
Online-Optimized RAG for Tool Use and Function Calling

Yu Pan, Xiaochen Li, Hanzhao Wang
arXiv.org | 2025
Online-Optimized RAG is a deployment-time framework that continuously adapts retrieval embeddings during live interactions using minimal feedback signals (e.g., task success/failure) to correct embedding misalignment in tool use and function calling pipelines. It requires no LLM modifications and applies lightweight online gradient updates with negligible latency overhead.

Problem Statement

RAG-based tool selection relies on embedding similarity between user queries and tool/function descriptions, but imperfect embedding models or noisy descriptions create misalignment that causes incorrect tool retrieval and downstream task failures. Existing solutions typically require offline retraining or fine-tuning, which is impractical in production environments with dynamic tool inventories. There is no established mechanism for RAG systems to self-correct and improve from live deployment feedback without significant infrastructure changes.

Key Novelty

  • Deployment-time online learning framework that adapts retrieval embeddings continuously using only binary task success signals, requiring no LLM modifications or offline retraining
  • Theoretical analysis providing problem-dependent performance bounds that quantify how the initialization quality of the embeddings and related problem parameters affect convergence and retrieval accuracy
  • Plug-and-play support for complex retrieval scenarios including single- and multi-hop tool use, dynamic tool inventories, and K-retrieval with re-ranking

Evaluation Highlights

  • Consistent improvement in tool selection accuracy across diverse tool-use and document-retrieval benchmarks compared to static RAG baselines
  • Negligible per-query latency overhead from lightweight online gradient updates, making the system practical for real-time deployment

Breakthrough Assessment

6/10 The paper addresses a real and underexplored practical gap—runtime adaptation of RAG embeddings without retraining—with a theoretically grounded and plug-and-play solution. However, the core idea of online gradient-based adaptation is incremental relative to existing online learning literature, and the scope is narrowly focused on embedding alignment rather than a broader architectural innovation.

Methodology

  1. During deployment, queries are embedded and matched to tool/function descriptions using standard RAG retrieval; the selected tool is executed and a minimal feedback signal (e.g., binary task success) is collected
  2. Lightweight online gradient updates are applied to the retrieval embedding space (or a lightweight adapter on top of it) using the feedback signal to reduce misalignment between query embeddings and correct tool descriptions
  3. Updated embeddings are used for subsequent queries, enabling the system to self-improve over time while supporting dynamic tool inventories and multi-hop retrieval with re-ranking
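The loop above can be sketched in a few lines. The linear adapter `W`, the dot-product scoring, and the signed update rule below are illustrative assumptions for the sketch, not the paper's exact formulation:

```python
import numpy as np

def retrieve(query_emb, tool_embs, W):
    """Score each tool by dot product with the adapter-transformed query."""
    adapted = W @ query_emb
    return int(np.argmax(tool_embs @ adapted))

def online_update(W, query_emb, tool_emb, success, lr=0.05):
    """One lightweight gradient step on the adapter: raise the selected
    tool's score on success, lower it on failure."""
    sign = 1.0 if success else -1.0
    # gradient of score = tool_emb @ W @ query_emb with respect to W
    W += lr * sign * np.outer(tool_emb, query_emb)
    return W

# Toy deployment loop: tool 0 is correct, but the query embedding is
# misaligned toward tool 1; binary feedback corrects the adapter online.
tool_embs = np.eye(2, 4)                  # two tools, 4-dim description embeddings
query = np.array([0.1, 0.9, 0.0, 0.0])   # misaligned toward tool 1
W = np.eye(4)                             # identity init: starts as static RAG
for _ in range(30):
    picked = retrieve(query, tool_embs, W)
    W = online_update(W, query, tool_embs[picked], success=(picked == 0))
```

Each update is a single outer product, which is why the per-query overhead stays negligible.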

System Components

Embedding Alignment Module

A lightweight, trainable layer or adapter over the base embedding model that is updated online to correct misalignment between query and tool/function description embeddings
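One plausible realization (an assumption for illustration, not the paper's stated architecture) is a linear map initialized at the identity, so retrieval is exactly the static baseline until feedback starts moving the weights — which also connects to the theory's emphasis on initialization quality:

```python
import numpy as np

class EmbeddingAdapter:
    """Illustrative trainable layer over a frozen base embedder.
    Identity initialization means the adapted retriever matches the
    static one until online feedback updates the weights."""

    def __init__(self, dim: int):
        self.W = np.eye(dim)

    def adapt(self, base_emb: np.ndarray) -> np.ndarray:
        return self.W @ base_emb

adapter = EmbeddingAdapter(4)
e = np.array([0.3, 0.1, 0.0, 0.6])
adapted = adapter.adapt(e)  # identical to e before any updates
```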

Online Gradient Update Engine

Performs minimal gradient steps using task-level feedback signals after each interaction, designed for negligible per-query computational overhead

Feedback Signal Processor

Translates coarse deployment signals (e.g., task success/failure) into learning signals that guide embedding updates without requiring fine-grained supervision
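A minimal sketch of this translation, with hypothetical outcome names and signal values of my own choosing:

```python
# Hypothetical mapping from coarse deployment outcomes to scalar learning
# signals; the outcome names and values are illustrative, not from the paper.
OUTCOME_SIGNALS = {
    "success": +1.0,      # reinforce the retrieved tool
    "failure": -1.0,      # push the query away from the retrieved tool
    "no_feedback": 0.0,   # skip the update entirely
}

def to_learning_signal(outcome: str) -> float:
    """Translate a coarse deployment signal into a signed step direction."""
    return OUTCOME_SIGNALS.get(outcome, 0.0)
```

Unknown outcomes default to zero, so the embedding is never updated on signals the processor does not recognize.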

Multi-hop & Re-ranking Support

Extends the framework to handle sequential tool calls across multiple retrieval hops and K-candidate re-ranking scenarios for complex agentic workflows
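The K-retrieval-with-re-ranking path might look like the following sketch, where `rerank_fn` stands in for an assumed (possibly more expensive) re-ranker:

```python
import numpy as np

def top_k_with_rerank(query_emb, tool_embs, k, rerank_fn):
    """Retrieve K candidates by embedding score, then let a re-ranker
    pick the final tool. `rerank_fn` is an assumed callable that maps
    a candidate index to a re-ranking score."""
    scores = tool_embs @ query_emb
    candidates = np.argsort(scores)[::-1][:k]          # top-K by similarity
    rerank_scores = [rerank_fn(int(i)) for i in candidates]
    return int(candidates[int(np.argmax(rerank_scores))])

# Embedding scores favor tools 0 and 1; the re-ranker prefers tool 1.
tool_embs = np.eye(3, 4)
query = np.array([0.9, 0.8, 0.1, 0.0])
chosen = top_k_with_rerank(query, tool_embs, k=2, rerank_fn=lambda i: float(i == 1))
```

The online update then applies to whichever candidate the re-ranker finally selects, so feedback still reaches the embedding space.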

Dynamic Tool Inventory Handler

Accommodates tools being added or removed from the index at runtime without requiring full re-indexing or retraining
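Because the learned adaptation sits on the query side in this sketch, inventory changes reduce to inserting or deleting description embeddings; a minimal dict-backed index (an illustrative design, not the paper's implementation) suffices:

```python
import numpy as np

class ToolIndex:
    """Minimal dict-backed index sketch: tools can be added or removed at
    runtime with no retraining or full re-indexing."""

    def __init__(self):
        self.tools = {}  # tool name -> description embedding

    def add(self, name, emb):
        self.tools[name] = np.asarray(emb, dtype=float)

    def remove(self, name):
        self.tools.pop(name, None)

    def best_match(self, adapted_query):
        return max(self.tools, key=lambda n: self.tools[n] @ adapted_query)

idx = ToolIndex()
idx.add("get_weather", [1.0, 0.0])
idx.add("create_event", [0.0, 1.0])
idx.remove("get_weather")  # runtime removal: remaining tools stay queryable
```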

Results

  • Tool Selection Accuracy (diverse tool-use scenarios): consistently higher than the static RAG baseline, with positive improvement across all settings
  • End-Task Success Rate: improved via adaptive embeddings where the static baseline degrades under noisy descriptions, with notable gains in high-misalignment regimes
  • Per-Query Latency Overhead: approximately zero additional latency from the online updates relative to the static baseline
  • Multi-hop Tool Use Accuracy: improved sequential retrieval over baseline multi-hop RAG, with positive, compounding gains across hops

Key Takeaways

  • Production RAG systems for tool use can self-improve at deployment time using only task success signals—no labeled data collection pipelines or offline retraining cycles are required, significantly reducing maintenance burden
  • The plug-and-play design means practitioners can layer Online-Optimized RAG on top of existing LLM + RAG infrastructure without modifying the underlying model, making adoption low-risk and straightforward
  • The theoretical performance bounds tied to initialization quality imply that starting with a reasonably good embedding model accelerates adaptation—investing in embedding model selection upfront pays dividends in faster online convergence

Abstract

In many applications, retrieval-augmented generation (RAG) drives tool use and function calling by embedding the (user) queries and matching them to pre-specified tool/function descriptions. In this paper, we address an embedding misalignment issue that often arises in practical applications due to imperfect embedding models or noisy descriptions; such misalignment may lead to incorrect retrieval and task failure. We introduce Online-Optimized RAG, a deployment-time framework that continually adapts retrieval embeddings from live interactions using minimal feedback (e.g., task success). Online-Optimized RAG applies lightweight online gradient updates with negligible per-query latency and requires no changes to the underlying LLM. The method is plug-and-play: it supports both single- and multi-hop tool use, dynamic tool inventories, and $K$-retrieval with re-ranking. We provide a problem-dependent theoretical analysis that quantifies how the method's performance depends on the initialization quality of the embeddings and other related quantities. Across diverse tool-use and document-retrieval scenarios, our Online-Optimized RAG consistently improves tool selection accuracy and end-task success, thus providing a simple, practical path to robust, self-improving RAG systems.

Generated on 2026-02-21 using Claude