
Dynamic Tool Dependency Retrieval for Efficient Function Calling

Bhrij Patel, Davide Belli, Amir Jalalirad, Maximilian Arnold, Aleksandr Ermolov, B. Major
arXiv.org | 2025
Dynamic Tool Dependency Retrieval (DTDR) improves LLM-based function calling agents by conditioning tool retrieval on both the initial query and the evolving execution context, capturing multi-step tool dependencies rather than relying on static inputs. This adaptive approach significantly reduces irrelevant tool selection and improves task completion success rates.

Problem Statement

Existing tool retrieval methods for LLM function-calling agents use static, query-only inputs that fail to capture how tool dependencies evolve across multi-step task execution. This leads to retrieval of irrelevant tools that pollute the agent's context, degrading both accuracy and efficiency. As on-device agents become more common, scalable and precise retrieval is critical to avoid context-length blowup and misleading tool suggestions.

Key Novelty

  • Dynamic retrieval that conditions on both the initial user query and the evolving execution context (previously called tools, intermediate results), rather than static query-only retrieval
  • Modeling of tool dependencies learned from function-calling demonstrations, enabling the retriever to understand which tools commonly co-occur or follow one another in multi-step plans
  • Systematic benchmarking of retrieval precision, downstream task accuracy, and computational efficiency across multiple datasets and LLM backbones, plus exploration of prompt integration strategies for retrieved tools

Evaluation Highlights

  • DTDR improves function calling success rates by 23% to 104% compared to state-of-the-art static retrieval methods across multiple datasets and LLM backbones
  • DTDR improves retrieval precision (the share of retrieved tools that are relevant) while keeping computational overhead light enough for on-device deployment

Breakthrough Assessment

6/10. DTDR addresses a real and underexplored bottleneck in agentic LLM pipelines with a well-motivated, practical solution, and it achieves large empirical gains. However, the core idea (dynamic retrieval conditioned on execution state) is a natural extension of existing RAG principles rather than a paradigm shift, making this a solid, impactful contribution.

Methodology

  1. Mine function-calling demonstrations to extract tool co-occurrence and sequential dependency patterns, building a tool dependency graph or embedding space that captures which tools typically follow or accompany others
  2. At inference time, re-query the retrieval module at each agent step using both the original user query and the current execution context (e.g., tools already called, intermediate outputs, partial plan state) to adaptively surface the most relevant next tools (a minimal loop sketch follows this list)
  3. Integrate the dynamically retrieved tools into the LLM prompt using optimized strategies (e.g., ordering, formatting) and evaluate the full pipeline on multi-step function-calling benchmarks measuring retrieval precision, task success rate, and latency
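
To make the per-step loop concrete, here is a minimal sketch assuming an embedding-based tool index. The interfaces here (embed, tool_index.search, agent_step, and the action dictionary) are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# Hypothetical per-step dynamic retrieval loop. `embed`, `tool_index`,
# and `agent_step` are assumed interfaces, not the paper's actual API.
from typing import Any, Callable

import numpy as np


def run_agent(query: str,
              embed: Callable[[str], np.ndarray],
              tool_index: Any,   # .search(vector, k) -> list of tool specs
              agent_step: Any,   # LLM call: (query, trace, tools) -> action
              k: int = 5,
              max_steps: int = 10):
    """Re-query the tool retriever at every step with the evolving context."""
    trace = []  # execution context: tools called so far and their outputs
    for _ in range(max_steps):
        # Dynamic retrieval: condition on the query AND the execution trace,
        # not on the static query alone.
        state_text = query + "\n" + "\n".join(
            f"{s['name']} -> {s['result']}" for s in trace)
        tools = tool_index.search(embed(state_text), k=k)

        action = agent_step(query, trace, tools)
        if action["type"] == "finish":
            return action["answer"]
        result = action["fn"](**action["args"])  # invoke the chosen tool
        trace.append({"name": action["name"], "result": result})
    return None  # step budget exhausted
```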

System Components

Dynamic Context Encoder

Encodes both the initial user query and the evolving execution context (previously invoked tools, intermediate results) into a unified representation for retrieval at each agent step
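
One plausible realization is sketched below: fuse the query embedding with recency-weighted embeddings of prior tool calls. The decay weighting and additive fusion are assumptions for illustration, not the paper's specified encoder.

```python
# Hedged sketch of a dynamic context encoder. The recency-decay weighting
# and additive fusion are assumptions, not the paper's prescribed design.
import numpy as np


def encode_state(query_emb: np.ndarray,
                 step_embs: list[np.ndarray],
                 decay: float = 0.8) -> np.ndarray:
    """Combine the query with recency-weighted execution-context embeddings."""
    state = query_emb.copy()
    for i, emb in enumerate(reversed(step_embs)):  # most recent step first
        state += (decay ** (i + 1)) * emb
    return state / np.linalg.norm(state)  # unit norm for cosine search
```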

Tool Dependency Modeler

Learns tool co-occurrence and sequential dependency patterns from function-calling demonstrations, enabling the retriever to anticipate which tools are likely needed given the current execution state
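
As an illustration of learning dependencies from demonstrations, the sketch below estimates a first-order transition distribution P(next tool | current tool) from demonstration traces. The counting scheme and add-alpha smoothing are assumptions; the paper may use a dependency graph or learned embeddings instead.

```python
# Illustrative sketch: first-order tool-transition statistics mined from
# function-calling demonstration traces. The counting scheme and smoothing
# are assumptions for illustration.
from collections import Counter, defaultdict


def fit_transitions(demos: list[list[str]], alpha: float = 1.0):
    """demos: tool-name sequences from demonstrations.
    Returns P(next_tool | current_tool) with add-alpha smoothing."""
    counts = defaultdict(Counter)
    tools = {t for demo in demos for t in demo}
    for demo in demos:
        for cur, nxt in zip(demo, demo[1:]):
            counts[cur][nxt] += 1
    probs = {}
    for cur in tools:
        total = sum(counts[cur].values()) + alpha * len(tools)
        probs[cur] = {nxt: (counts[cur][nxt] + alpha) / total for nxt in tools}
    return probs


# Example: after "search_flights", "book_flight" becomes a likely next tool.
demos = [["search_flights", "book_flight", "send_email"],
         ["search_flights", "book_flight"]]
P = fit_transitions(demos)
```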

Adaptive Retrieval Module

Lightweight retriever that re-queries the tool index at each agent step using the dynamic context representation, selecting a relevant subset of tools to include in the LLM prompt
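
A hedged sketch of how such a retriever might score candidate tools, combining embedding similarity with the learned dependency prior from the previous sketch. The additive combination and the mixing weight lam are assumptions; the paper's exact scoring function may differ.

```python
# Assumed scoring rule: embedding similarity plus a weighted dependency prior.
import numpy as np
from typing import Optional


def score_tools(state_emb: np.ndarray,
                tool_embs: dict[str, np.ndarray],
                trans_probs: dict[str, dict[str, float]],
                last_tool: Optional[str],
                lam: float = 0.5,
                k: int = 5) -> list[str]:
    """Rank tools by similarity to the dynamic state plus dependency prior."""
    scores = {}
    for name, emb in tool_embs.items():
        sim = float(state_emb @ emb)  # embeddings assumed unit-norm
        prior = (trans_probs.get(last_tool, {}).get(name, 0.0)
                 if last_tool else 0.0)
        scores[name] = sim + lam * prior
    return sorted(scores, key=scores.get, reverse=True)[:k]
```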

Prompt Integration Strategy

Techniques for inserting dynamically retrieved tools into the LLM context (ordering, formatting) to maximize agent performance while minimizing context length overhead
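
For concreteness, below is a minimal prompt-assembly helper that injects retrieved tool specs in retrieval-ranked order as compact JSON schemas. The exact format and ordering here are assumptions; the paper compares several such strategies.

```python
# Illustrative prompt assembly: retrieved tools injected as compact JSON,
# most relevant first. Format and ordering are assumptions for illustration.
import json


def build_prompt(query: str, tools: list[dict]) -> str:
    """tools: retrieval-ranked list of {"name", "description", "parameters"}."""
    lines = ["You may call the following tools (most relevant first):"]
    for t in tools:  # preserve retrieval order
        lines.append(json.dumps(
            {"name": t["name"],
             "description": t["description"],
             "parameters": t["parameters"]}))
    lines.append(f"User request: {query}")
    return "\n".join(lines)
```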

Results

  • Function Calling Success Rate (best dataset): +104% relative improvement over the static SOTA retriever baseline
  • Function Calling Success Rate (worst dataset): +23% relative improvement over the static SOTA retriever baseline
  • Retrieval Precision: higher than static query-only retrieval, with fewer irrelevant tools retrieved (qualitative improvement)
  • Computational Efficiency: lightweight and suitable for on-device use, with latency comparable to or better than static retrieval

Key Takeaways

  • For multi-step agentic pipelines, tool retrieval should be re-executed at each step using the current execution context, not just the initial query—this alone can more than double task success rates in some settings
  • Modeling tool dependencies from demonstrations is a practical and lightweight way to improve retrieval relevance without requiring expensive retraining of the LLM backbone, making it deployable on constrained devices
  • How retrieved tools are integrated into the prompt (ordering, formatting) meaningfully affects agent performance, so prompt engineering for tool injection deserves explicit attention when building function-calling systems

Abstract

Function calling agents powered by Large Language Models (LLMs) select external tools to automate complex tasks. On-device agents typically use a retrieval module to select relevant tools, improving performance and reducing context length. However, existing retrieval methods rely on static and limited inputs, failing to capture multi-step tool dependencies and evolving task context. This limitation often introduces irrelevant tools that mislead the agent, degrading efficiency and accuracy. We propose Dynamic Tool Dependency Retrieval (DTDR), a lightweight retrieval method that conditions on both the initial query and the evolving execution context. DTDR models tool dependencies from function calling demonstrations, enabling adaptive retrieval as plans unfold. We benchmark DTDR against state-of-the-art retrieval methods across multiple datasets and LLM backbones, evaluating retrieval precision, downstream task accuracy, and computational efficiency. Additionally, we explore strategies to integrate retrieved tools into prompts. Our results show that dynamic tool retrieval improves function calling success rates between $23\%$ and $104\%$ compared to state-of-the-art static retrievers.
