Online-Optimized RAG for Tool Use and Function Calling
Problem Statement
RAG-based tool selection relies on embedding similarity between user queries and tool/function descriptions, but imperfect embedding models or noisy descriptions create misalignment that causes incorrect tool retrieval and downstream task failures. Existing solutions typically require offline retraining or fine-tuning, which is impractical in production environments with dynamic tool inventories. There is no established mechanism for RAG systems to self-correct and improve from live deployment feedback without significant infrastructure changes.
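For illustration, the minimal sketch below shows the standard static retrieval step this setting relies on: embed the query, embed the tool descriptions, and pick the highest-similarity tool. The embedding model name (`all-MiniLM-L6-v2` via sentence-transformers) is an example choice, not one prescribed by the paper.

```python
# Minimal sketch of static embedding-similarity tool selection (not the paper's code).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model choice

def select_tool(query: str, tool_descriptions: list[str]) -> int:
    """Return the index of the tool whose description best matches the query."""
    q = model.encode([query], normalize_embeddings=True)[0]      # query embedding
    d = model.encode(tool_descriptions, normalize_embeddings=True)  # one row per tool
    scores = d @ q  # cosine similarity, since embeddings are unit-normalized
    return int(np.argmax(scores))  # embedding misalignment here can pick the wrong tool
```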
Key Novelty
- Deployment-time online learning framework that adapts retrieval embeddings continuously using only binary task success signals, requiring no LLM modifications or offline retraining
- Theoretical analysis providing problem-dependent performance bounds that quantify how embedding initialization quality and related problem parameters affect convergence and retrieval accuracy
- Plug-and-play support for complex retrieval scenarios including single- and multi-hop tool use, dynamic tool inventories, and K-retrieval with re-ranking
Evaluation Highlights
- Consistent improvement in tool selection accuracy across diverse tool-use and document-retrieval benchmarks compared to static RAG baselines
- Negligible per-query latency overhead from lightweight online gradient updates, making the system practical for real-time deployment
Breakthrough Assessment
Methodology
- During deployment, queries are embedded and matched to tool/function descriptions using standard RAG retrieval; the selected tool is executed and a minimal feedback signal (e.g., binary task success) is collected
- Lightweight online gradient updates are applied to the retrieval embedding space (or a lightweight adapter on top of it) using the feedback signal to reduce misalignment between query embeddings and correct tool descriptions
- Updated embeddings are used for subsequent queries, enabling the system to self-improve over time while supporting dynamic tool inventories and multi-hop retrieval with re-ranking (a minimal sketch of this loop follows below)
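The sketch below is one plausible instantiation of this deployment loop, assuming a linear adapter `W` on query embeddings and a simple pull/push update from binary success feedback; the paper's exact loss, step size, and parameterization may differ.

```python
# Sketch of the online update loop (assumptions: linear adapter on query embeddings,
# pull/push update from binary success feedback; not the paper's exact algorithm).
import numpy as np

class OnlineAdapter:
    def __init__(self, dim: int, lr: float = 0.05):
        self.W = np.eye(dim)  # identity initialization: start from the base embedding space
        self.lr = lr

    def transform(self, q: np.ndarray) -> np.ndarray:
        z = self.W @ q
        return z / np.linalg.norm(z)  # adapted, unit-normalized query embedding

    def update(self, q: np.ndarray, tool_emb: np.ndarray, success: bool) -> None:
        """One lightweight gradient-style step on the score s = tool_emb . (W q)."""
        sign = 1.0 if success else -1.0      # pull toward the tool on success, push away on failure
        grad = np.outer(tool_emb, q)         # d s / d W for the unnormalized score
        self.W += self.lr * sign * grad      # normalization ignored for simplicity

def serve(adapter, query_emb, tool_embs, execute_tool):
    """One deployment interaction: retrieve, execute, collect feedback, update."""
    scores = tool_embs @ adapter.transform(query_emb)  # adapted retrieval scores
    chosen = int(np.argmax(scores))
    success = execute_tool(chosen)                     # binary task-level feedback
    adapter.update(query_emb, tool_embs[chosen], success)
    return chosen, success
```

In this sketch, success pulls the adapted query embedding toward the executed tool's description and failure pushes it away, which is one simple way to turn a coarse binary signal into an embedding-space correction; the underlying LLM and base embedding model are never modified.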
System Components
- Embedding adapter: a lightweight, trainable layer over the base embedding model, updated online to correct misalignment between query and tool/function description embeddings
- Online updater: performs minimal gradient steps using task-level feedback signals after each interaction, designed for negligible per-query computational overhead
- Feedback mapper: translates coarse deployment signals (e.g., task success/failure) into learning signals that guide embedding updates without requiring fine-grained supervision
- Multi-hop and K-retrieval extension: handles sequential tool calls across multiple retrieval hops and K-candidate re-ranking for complex agentic workflows
- Dynamic tool inventory support: accommodates tools being added or removed from the index at runtime without requiring full re-indexing or retraining (see the sketch after this list)
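As a sketch of how a runtime-mutable tool index might look, the `ToolIndex` class below keeps tool embeddings in a simple in-memory map so tools can be added or removed without retraining or rebuilding a full index; the class and its methods are hypothetical, not the paper's API.

```python
# Hypothetical runtime tool index: supports add/remove at deployment time and
# top-K candidate retrieval for re-ranking (illustrative, not the paper's interface).
import numpy as np

class ToolIndex:
    def __init__(self):
        self.embs: dict[str, np.ndarray] = {}

    def add_tool(self, name: str, description_emb: np.ndarray) -> None:
        self.embs[name] = description_emb / np.linalg.norm(description_emb)

    def remove_tool(self, name: str) -> None:
        self.embs.pop(name, None)

    def top_k(self, query_emb: np.ndarray, k: int = 3) -> list[str]:
        """Return the k tools most similar to the (adapted) query embedding, e.g. for re-ranking."""
        names = list(self.embs)
        scores = np.array([self.embs[n] @ query_emb for n in names])
        order = np.argsort(-scores)[:k]
        return [names[i] for i in order]
```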
Results
| Metric/Benchmark | Static RAG Baseline | Online-Optimized RAG | Delta |
|---|---|---|---|
| Tool Selection Accuracy (diverse tool-use scenarios) | Lower (misalignment-affected) | Consistently higher | Positive improvement across all settings |
| End-Task Success Rate | Degraded under noisy descriptions | Improved via adaptive embeddings | Notable gains especially in high-misalignment regimes |
| Per-Query Latency Overhead | Baseline latency | Negligible additional cost | ~0 additional latency from online updates |
| Multi-hop Tool Use Accuracy | Baseline multi-hop RAG | Improved sequential retrieval | Positive, compounding gains across hops |
Key Takeaways
- Production RAG systems for tool use can self-improve at deployment time using only task success signals—no labeled data collection pipelines or offline retraining cycles are required, significantly reducing maintenance burden
- The plug-and-play design means practitioners can layer Online-Optimized RAG on top of existing LLM + RAG infrastructure without modifying the underlying model, making adoption low-risk and straightforward
- The theoretical performance bounds tied to initialization quality imply that starting with a reasonably good embedding model accelerates adaptation—investing in embedding model selection upfront pays dividends in faster online convergence
Abstract
In many applications, retrieval-augmented generation (RAG) drives tool use and function calling by embedding the (user) queries and matching them to pre-specified tool/function descriptions. In this paper, we address an embedding misalignment issue that often arises in practical applications due to imperfect embedding models or noisy descriptions; such misalignment may lead to incorrect retrieval and task failure. We introduce Online-Optimized RAG, a deployment-time framework that continually adapts retrieval embeddings from live interactions using minimal feedback (e.g., task success). Online-Optimized RAG applies lightweight online gradient updates with negligible per-query latency and requires no changes to the underlying LLM. The method is plug-and-play: it supports both single- and multi-hop tool use, dynamic tool inventories, and $K$-retrieval with re-ranking. We provide a problem-dependent theoretical analysis that quantifies how the method's performance depends on the initialization quality of the embeddings and other related quantities. Across diverse tool-use and document-retrieval scenarios, our Online-Optimized RAG consistently improves tool selection accuracy and end-task success, thus providing a simple, practical path to robust, self-improving RAG systems.