AutoTool: Efficient Tool Selection for Large Language Model Agents
Problem Statement
Current LLM agent frameworks such as ReAct invoke the LLM at every step to decide which tool to use next, incurring inference costs that make them expensive and slow at scale. This repeated querying creates a bottleneck that limits practical deployment of agents in cost-sensitive or latency-sensitive environments, and existing frameworks offer no efficient mechanism for exploiting the predictable, sequential nature of tool usage observed in real agent trajectories.
Key Novelty
- Discovery and formalization of "tool usage inertia": the empirical observation that tool invocations follow predictable sequential patterns in agent trajectories
- A directed graph structure built from historical trajectories where nodes are tools and weighted edges capture transition probabilities, enabling statistical tool selection without LLM calls
- Integration of parameter-level information into the graph framework to refine tool input generation alongside tool selection
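The transition graph at the core of these ideas can be sketched in a few lines. The trajectory format below (a list of tool-name sequences) is an illustrative assumption, not the paper's actual log schema:

```python
from collections import defaultdict

def build_transition_graph(trajectories):
    """Count tool-to-tool transitions in historical trajectories and
    normalize the counts into empirical transition probabilities.

    `trajectories` is assumed to be a list of tool-name sequences,
    e.g. [["search", "parse", "summarize"], ...].
    """
    counts = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        # Each adjacent pair (prev_tool, next_tool) is one observed edge.
        for prev_tool, next_tool in zip(traj, traj[1:]):
            counts[prev_tool][next_tool] += 1

    # Normalize outgoing counts per node into transition probabilities.
    graph = {}
    for tool, successors in counts.items():
        total = sum(successors.values())
        graph[tool] = {nxt: c / total for nxt, c in successors.items()}
    return graph
```

The resulting mapping `{tool: {next_tool: probability}}` is a plain-dict stand-in for the paper's weighted directed graph.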
Evaluation Highlights
- AutoTool reduces LLM inference costs by up to 30% compared to inference-heavy baselines like ReAct across diverse agent tasks
- Task completion rates remain competitive with full LLM-based tool selection baselines, demonstrating the efficiency-performance tradeoff is favorable
Methodology
- Step 1 - Trajectory Mining: Collect historical agent execution trajectories and extract tool call sequences to identify sequential patterns and transition frequencies between tools
- Step 2 - Graph Construction: Build a directed weighted graph where each node is a tool, edges represent observed tool-to-tool transitions, and edge weights encode empirical transition probabilities; also attach parameter-level templates or constraints to nodes
- Step 3 - Inference-Time Tool Selection: At runtime, use the current tool context to traverse the graph probabilistically, selecting the next tool (and generating its parameters) based on transition probabilities rather than invoking the LLM at each step
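Step 3 could look roughly like the sketch below. The confidence threshold and the `llm_fallback` hook are illustrative assumptions; the paper only specifies that selection relies on transition probabilities with minimal LLM calls:

```python
def select_next_tool(graph, current_tool, threshold=0.6, llm_fallback=None):
    """Pick the next tool from the transition graph when the empirical
    evidence is strong enough; otherwise defer to the LLM.

    `graph` maps each tool to a dict of successor-tool probabilities,
    as produced by mining historical trajectories.
    """
    successors = graph.get(current_tool, {})
    if successors:
        best_tool = max(successors, key=successors.get)
        if successors[best_tool] >= threshold:
            return best_tool, "graph"  # zero-cost statistical choice
    # Sparse or ambiguous history: fall back to a full LLM decision.
    if llm_fallback is not None:
        return llm_fallback(current_tool), "llm"
    return None, "none"
```

Returning the decision source ("graph" vs. "llm") alongside the tool makes it easy to measure how often the statistical path actually avoids an LLM call.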
System Components
- Tool transition graph: a directed graph constructed from historical agent trajectories, where nodes are tools and edges with probabilistic weights capture how frequently one tool follows another, encoding tool usage inertia
- Parameter-level augmentation: augments graph nodes with parameter templates and contextual constraints to guide input generation for selected tools without requiring a full LLM call
- Graph-based tool selector: the runtime component that uses the current agent state and the last-used tool to traverse the transition graph and select the next tool with minimal or zero LLM invocations
- Trajectory mining pipeline: a preprocessing pipeline that parses past agent execution logs to extract the tool usage sequences used to build and weight the transition graph
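One way to realize the parameter-level augmentation is to mine the most frequent parameter signature per tool from the same logs; the `(tool, params)` record format here is hypothetical:

```python
from collections import Counter, defaultdict

def extract_parameter_templates(trajectories):
    """Derive a lightweight parameter template for each tool from
    historical calls, so a selected tool's inputs can be scaffolded
    without a full LLM call.

    `trajectories` is assumed to be a list of (tool_name, params_dict)
    sequences; the template is simply the most common parameter key set.
    """
    seen = defaultdict(Counter)
    for traj in trajectories:
        for tool, params in traj:
            # Use the sorted key set as a simple "template" signature.
            seen[tool][tuple(sorted(params))] += 1
    return {tool: list(c.most_common(1)[0][0]) for tool, c in seen.items()}
```

A real system would likely store richer constraints (types, value patterns) on each node; the key-set signature is the minimal version of the idea.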
Results
| Metric | Baseline (ReAct) | AutoTool | Delta |
|---|---|---|---|
| LLM Inference Cost | High (1 LLM call/step) | Reduced | Up to -30% |
| Task Completion Rate | Full LLM-based selection | Competitive | Minimal degradation |
| Tool Selection Accuracy | LLM-driven (high quality) | Graph-driven (near-competitive) | Slight tradeoff for efficiency |
| Scalability | Degrades with more tools | Scales via graph structure | Improved |
Key Takeaways
- Practitioners deploying ReAct-style agents at scale can integrate AutoTool as a drop-in efficiency layer by pre-mining historical trajectories, potentially cutting inference costs by ~30% with minimal impact on task performance
- Tool usage inertia is a practically useful inductive bias: if your agent tasks follow recurring procedural patterns, statistical modeling of tool transitions is a lightweight and effective alternative to repeated LLM reasoning
- The approach requires sufficient historical trajectory data to build a reliable transition graph, making it most suitable for mature agent deployments with stable tool ecosystems rather than highly dynamic or novel task environments
Abstract
Large Language Model (LLM) agents have emerged as powerful tools for automating complex tasks by leveraging the reasoning and decision-making abilities of LLMs. However, a major bottleneck in current agent frameworks lies in the high inference cost of tool selection, especially in approaches like ReAct that repeatedly invoke the LLM to determine which tool to use at each step. In this work, we propose AutoTool, a novel graph-based framework that bypasses repeated LLM inference by exploiting a key empirical observation: tool usage inertia—the tendency of tool invocations to follow predictable sequential patterns. AutoTool constructs a directed graph from historical agent trajectories, where nodes represent tools and edges capture transition probabilities, effectively modeling the inertia in tool selection. It further integrates parameter-level information to refine tool input generation. By traversing this structured representation, AutoTool efficiently selects tools and their parameters with minimal reliance on LLM inference. Extensive experiments across diverse agent tasks demonstrate that AutoTool reduces inference costs by up to 30% while maintaining competitive task completion rates, offering a practical and scalable enhancement for inference-heavy frameworks. Our work highlights the promise of integrating statistical structure into LLM agent design for greater efficiency without sacrificing performance.