ToolScope: Enhancing LLM Agent Tool Use through Tool Merging and Context-Aware Filtering

ToolScope is a two-component framework that reduces redundancy in large toolsets via automated tool merging and improves LLM agent tool selection accuracy via context-aware retrieval filtering.

Problem Statement

LLM agents operating over large, real-world toolsets suffer from ambiguity caused by redundant tools with overlapping names and descriptions, degrading selection accuracy. Additionally, LLMs face strict context window limits that prevent them from considering large toolsets in a single pass, forcing trade-offs between coverage and efficiency. Existing approaches lack automated mechanisms to both consolidate redundant tools and dynamically compress toolsets to fit within context constraints.

Key Novelty

ToolScopeMerger with Auto-Correction: an automated pipeline that audits tool definitions, identifies redundant/overlapping tools, merges them, and self-corrects merge errors to reduce ambiguity
ToolScopeRetriever: a context-aware ranking and filtering module that selects only the most query-relevant tools, compressing toolsets to fit within LLM context limits without sacrificing selection accuracy
Combined end-to-end system evaluated across three LLMs and three open-source tool-use benchmarks, showing tool selection accuracy gains of 8.38% to 38.6%

Evaluation Highlights

Tool selection accuracy improved by 8.38% to 38.6% over baselines across three state-of-the-art LLMs and three open-source tool-use benchmarks
Consistent gains across diverse LLM backbones and benchmark datasets, demonstrating robustness of the approach beyond a single model or task setting

Breakthrough Assessment

5/10. ToolScope is a solid, practical contribution to LLM tool-use pipelines: it combines tool deduplication with context-aware retrieval and achieves meaningful accuracy gains. However, its two components (merging and retrieval) are intuitive engineering solutions rather than fundamentally novel algorithmic advances, which makes it a strong applied contribution rather than a paradigm shift.

Methodology

Step 1 – Tool Auditing and Merging: ToolScopeMerger analyzes the full toolset to detect redundant or semantically overlapping tools, merges them into unified tool definitions, and applies an Auto-Correction mechanism to validate and fix incorrect merges
Step 2 – Context-Aware Retrieval and Filtering: ToolScopeRetriever encodes incoming queries and tool descriptions, ranks tools by relevance, and selects a compressed subset that fits within the LLM's context window for a given query
Step 3 – Agent Tool Use: The pruned, deduplicated toolset is passed to the LLM agent for tool selection and task execution, evaluated against ground-truth tool choices on benchmark datasets
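
The evaluation in Step 3 can be illustrated with a minimal sketch. It assumes that, once tools have been merged, both the predicted and the ground-truth tool names are mapped to their merged names before comparison. The function name `selection_accuracy`, the `merged_into` mapping, and the tool names are hypothetical and are not taken from the paper.

```python
def selection_accuracy(predictions, ground_truth, merged_into=None):
    """Fraction of queries where the predicted tool matches the ground truth.

    merged_into is a hypothetical mapping from an original tool name to the
    name of the merged tool that replaced it. Both sides are mapped through
    it, so choosing the merged tool counts as correct.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not ground_truth:
        return 0.0
    merged_into = merged_into or {}

    def canonical(name):
        return merged_into.get(name, name)

    correct = sum(
        canonical(p) == canonical(g) for p, g in zip(predictions, ground_truth)
    )
    return correct / len(ground_truth)


# Illustrative data only: "fetch_weather" was merged into "get_weather".
MERGED_INTO = {"fetch_weather": "get_weather"}
PREDICTIONS = ["get_weather", "send_email", "get_weather"]
GROUND_TRUTH = ["fetch_weather", "send_email", "search_flights"]

accuracy_with_mapping = selection_accuracy(PREDICTIONS, GROUND_TRUTH, MERGED_INTO)
accuracy_without_mapping = selection_accuracy(PREDICTIONS, GROUND_TRUTH)
```

Without the mapping, the first query would be scored as a miss even though the agent picked the merged equivalent of the labeled tool, which is why the sketch maps both sides to merged names.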

System Components

ToolScopeMerger with Auto-Correction

Automatically audits a toolset for redundant tools with overlapping names and descriptions, merges them into consolidated definitions, and applies a self-correction loop to detect and fix erroneous merges, reducing ambiguity in the tool selection space
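
The merge-then-correct idea can be sketched as follows. This is only an illustration under stated assumptions: Jaccard overlap of description words stands in for whatever similarity signal ToolScopeMerger actually uses, and a simple parameter-signature check stands in for its Auto-Correction step. The summary does not specify either mechanism, and all function names, the threshold, and the tool definitions below are hypothetical.

```python
import re


def tokens(text):
    """Set of lowercase word tokens in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_merges(tools, threshold=0.5):
    """Group tools whose descriptions overlap strongly (single-link clustering)."""
    names = list(tools)
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            sim = jaccard(
                tokens(tools[a]["description"]), tokens(tools[b]["description"])
            )
            if sim >= threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


def auto_correct(clusters, tools):
    """Assumed validation rule: split out any member whose parameter set
    differs from that of the first tool in its proposed cluster."""
    corrected = []
    for cluster in clusters:
        ref = set(tools[cluster[0]]["params"])
        keep = [n for n in cluster if set(tools[n]["params"]) == ref]
        corrected.append(keep)
        corrected.extend([n] for n in cluster if n not in keep)
    return corrected


# Illustrative toolset only.
TOOLS = {
    "get_weather": {
        "description": "Get the current weather for a city",
        "params": ["city"],
    },
    "fetch_weather": {
        "description": "Get the current weather for a given city",
        "params": ["city"],
    },
    "get_weather_history": {
        "description": "Get the past weather for a city",
        "params": ["city", "date"],
    },
    "send_email": {
        "description": "Send an email message to a recipient",
        "params": ["to", "body"],
    },
}

proposed = propose_merges(TOOLS)
validated = auto_correct(proposed, TOOLS)
```

In this toy data the similarity pass groups all three weather tools together, and the validation pass separates `get_weather_history` again because it takes an extra `date` parameter. That is the kind of erroneous merge a correction loop is meant to catch.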

ToolScopeRetriever

A context-aware ranking and filtering module that scores tools by relevance to each incoming query and selects only the top-k most relevant tools, compressing the effective toolset to fit within LLM context limits while preserving the tools needed for accurate task completion
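
A minimal sketch of top-k, budget-constrained tool retrieval is shown below. It assumes bag-of-words cosine similarity in place of the retriever's actual encoder and counts description words as a rough proxy for context tokens. The function names, the `k` and `token_budget` parameters, and the tools are hypothetical and only illustrate the ranking-and-filtering idea.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a learned text encoder."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query, tools, k=2, token_budget=50):
    """Return up to k tool names ranked by similarity to the query.

    A tool is skipped when its description would exceed the remaining
    budget, measured here in words rather than real model tokens.
    """
    q = Counter(tokenize(query))
    ranked = sorted(
        tools.items(),
        key=lambda item: cosine(q, Counter(tokenize(item[1]))),
        reverse=True,
    )
    selected, used = [], 0
    for name, description in ranked:
        cost = len(tokenize(description))
        if len(selected) < k and used + cost <= token_budget:
            selected.append(name)
            used += cost
    return selected


# Illustrative toolset only.
TOOLS = {
    "get_weather": "Return the current weather forecast for a city",
    "convert_currency": "Convert an amount of money between two currencies",
    "search_flights": "Search available flights between two airports on a date",
}

top_tool = retrieve_tools("what is the weather forecast in Paris", TOOLS, k=1)
```

Only the selected subset would then be placed in the agent's prompt. A production retriever would likely use dense embeddings and the target model's tokenizer for the budget, but the control flow would be similar.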

Results

Benchmark/Setting	Baseline Accuracy	ToolScope Accuracy	Delta
Best-case benchmark/LLM pair	Not specified	Not specified	+38.6%
Worst-case benchmark/LLM pair	Not specified	Not specified	+8.38%
Range across 3 LLMs × 3 benchmarks	Not specified	Not specified	+8.38% to +38.6%

Key Takeaways

Deduplicating and merging overlapping tools before LLM inference is a high-impact, low-cost preprocessing step that meaningfully reduces ambiguity and improves tool selection accuracy in real-world toolsets
Context-aware retrieval filtering is key to scaling LLM agents to large toolsets: selecting only query-relevant tools rather than passing the full toolset can yield large accuracy gains while respecting context window limits
The Auto-Correction mechanism in ToolScopeMerger highlights the importance of validation loops when using LLMs or automated systems to restructure tool definitions, as naive merging can introduce new errors

Abstract

Large language model (LLM) agents rely on external tools to solve complex tasks, but real-world toolsets often contain redundant tools with overlapping names and descriptions, introducing ambiguity and reducing selection accuracy. LLMs also face strict input context limits, preventing efficient consideration of large toolsets. To address these challenges, we propose ToolScope, which includes: (1) ToolScopeMerger with Auto-Correction to automatically audit and fix tool merges, reducing redundancy, and (2) ToolScopeRetriever to rank and select only the most relevant tools for each query, compressing toolsets to fit within context limits without sacrificing accuracy. Evaluations on three state-of-the-art LLMs and three open-source tool-use benchmarks show gains of 8.38% to 38.6% in tool selection accuracy, demonstrating ToolScope's effectiveness in enhancing LLM tool use.