Empowering LLM-based Agents: Methods and Challenges in Tool Use

This paper surveys the methods and challenges of equipping LLM-based agents with tool-use capabilities, synthesizing the state of the art from foundational architectures to advanced strategies like dynamic tool retrieval and autonomous tool creation. It identifies critical unsolved challenges including knowledge conflicts, long-context degradation, non-monotonic scaling, and security vulnerabilities.

Problem Statement

LLM-based agents are constrained by static internal knowledge that cannot interact with live data or execute complex real-world computations. Tool use is the primary paradigm to overcome these limitations, but the field lacks a unified understanding of best practices and failure modes. Existing work is fragmented across invocation mechanisms, retrieval strategies, and security considerations, making it difficult for practitioners to build robust deployments.

Key Novelty

Systematic taxonomy of tool-use methods ranging from function calling and core invocation mechanisms to dynamic tool retrieval and autonomous tool creation
Identification and characterization of four critical challenge categories: knowledge conflicts between internal priors and external evidence, long-context performance degradation, non-monotonic scaling in compound systems, and novel security vulnerabilities
A structured research agenda mapping open problems and future directions for building more capable, secure, and reliable tool-augmented LLM agents

Evaluation Highlights

Qualitative synthesis across the existing literature on tool-augmented LLM agents, covering benchmarks and architectures from foundational to cutting-edge systems
Identification of non-monotonic scaling behaviors in compound agent systems, where adding more tools or agents does not guarantee monotonic performance improvement

Breakthrough Assessment

4/10 This is a well-scoped survey paper that provides valuable synthesis and a clear research agenda, but as a literature review it does not introduce new methods or empirical results. Its contribution is organizational and directional rather than technically novel.

Methodology

Systematic literature review: collected and categorized existing research on LLM-based agent tool use, covering architectures, invocation mechanisms, retrieval strategies, and creation paradigms
Challenge analysis: identified recurring failure modes across surveyed works, grouping them into knowledge conflicts, long-context issues, scaling anomalies, and security vulnerabilities
Research agenda formulation: mapped identified gaps to concrete future research directions to guide development of robust, secure tool-using agents

System Components

Foundational Agent Architecture

Core LLM-based agent design patterns including planning, memory, and action modules that underpin tool-use capability

Function Calling / Core Invocation

Mechanisms by which LLMs select, parameterize, and execute external tools or APIs in a structured, reliable manner

Dynamic Tool Retrieval

Strategies for selecting relevant tools at inference time from large tool libraries rather than relying on a fixed toolset

Autonomous Tool Creation

Methods enabling agents to generate or compose new tools on-the-fly to handle novel tasks not covered by existing tools

Knowledge Conflict Handling

Techniques to reconcile discrepancies between an LLM's internal parametric knowledge and up-to-date external tool outputs

Security Vulnerability Analysis

Characterization of attack surfaces introduced by tool use, including prompt injection, malicious tool responses, and adversarial tool selection

Results

Challenge/Dimension	Status Without Tool Use	Status With Tool Use	Remaining Gap
Dynamic knowledge access	Impossible (static parameters)	Enabled via API/search tools	Knowledge conflict resolution
Complex computation	Unreliable (hallucinated math)	Reliable via code/calculator tools	Long-context orchestration failures
Scalability with more tools	N/A	Non-monotonic improvement observed	Compound system scaling laws unresolved
Security posture	Standard LLM risks	New attack surfaces introduced	Tool-specific defenses immature

Key Takeaways

Practitioners should implement explicit knowledge conflict resolution strategies when tool outputs contradict the LLM's internal priors, as naive tool integration can degrade reliability rather than improve it
When designing compound agent systems, adding more tools or sub-agents does not reliably improve performance—non-monotonic scaling means careful ablation and tool selection policies are essential
Security must be a first-class concern in tool-augmented deployments: prompt injection via tool responses and adversarial tool selection are realistic threats that require sandboxing, output validation, and trust hierarchies

Abstract

The emergence of Large Language Model (LLM)-based agents marks a significant step towards more capable Artificial Intelligence. However, the effectiveness of these agents is fundamentally constrained by the static nature of their internal knowledge. Tool use has become a critical paradigm to overcome these limitations, enabling agents to interact with dynamic data, execute complex computations, and act upon the world. This paper provides a comprehensive survey of the methods, challenges, and future directions in empowering LLM-based agents with tool-use capabilities. Through a systematic literature review, we synthesized the current state of the art, charting the evolution from foundational agent architectures and core invocation mechanisms like function calling to advanced strategies such as dynamic tool retrieval and autonomous tool creation. Our analysis revealed several critical challenges that impede the deployment of robust agents, including knowledge conflicts between internal priors and external evidence, significant performance degradation in long-context scenarios, non-monotonic scaling behaviors in compound systems, and novel security vulnerabilities. By mapping the current research landscape and identifying these key obstacles, this survey proposes a research agenda to guide future efforts in building more capable, secure, and reliable AI agents.