A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces

Mingxuan Du, Benfeng Xu, Chiwei Zhu, Shaohan Wang, Pengyu Wang, Xiaorui Wang, Zhendong Mao
arXiv.org | 2026
A-RAG introduces an agentic RAG framework that exposes hierarchical retrieval interfaces (keyword search, semantic search, chunk read) directly to the model, enabling LLMs to autonomously drive retrieval decisions across multiple granularities rather than following predefined pipelines.

Problem Statement

Existing RAG systems either use single-shot retrieval with concatenated passages or predefined step-by-step workflows, neither of which allows the model to actively participate in retrieval decisions. This means frontier LLMs' strong reasoning and long-horizon tool-use capabilities are underutilized in RAG pipelines. As models improve, these static paradigms cannot efficiently scale to leverage those improvements.

Key Novelty

  • Hierarchical retrieval interface design exposing three distinct tools (keyword search, semantic search, chunk read) at different granularities directly to the agent
  • Fully agentic retrieval decision-making where the LLM autonomously determines what to retrieve, when, and at what granularity based on task context
  • Systematic empirical study of how A-RAG scales with both model size and test-time compute across open-domain QA benchmarks

Evaluation Highlights

  • A-RAG consistently outperforms existing single-shot and predefined-workflow RAG approaches across multiple open-domain QA benchmarks
  • Performance gains are achieved with comparable or lower retrieved token counts, demonstrating improved retrieval efficiency alongside accuracy improvements

Breakthrough Assessment

6/10 A-RAG makes a solid contribution by bridging agentic LLM capabilities with RAG through a well-designed hierarchical tool interface. The core idea of tool-augmented retrieval is not entirely novel, however; the value lies in the systematic framework, the scaling analysis, and the empirical validation rather than in a paradigm-shifting architectural innovation.

Methodology

  1. Define three hierarchical retrieval tools: keyword search (sparse/lexical retrieval), semantic search (dense embedding-based retrieval), and chunk read (direct document chunk access), each operating at different granularities
  2. Expose these tools as callable interfaces to the LLM agent, allowing it to issue multi-step retrieval calls autonomously based on its intermediate reasoning and information needs
  3. Evaluate the full agentic loop on open-domain QA benchmarks, measuring accuracy against token budget, and conduct scaling studies varying model size and test-time compute allocation
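Step 2 above can be made concrete with function-calling schemas. The sketch below shows one plausible way to declare the three tools in OpenAI-style JSON schemas; the paper's exact interface definitions, parameter names, and return formats are not reproduced here and are assumptions.

```python
# Illustrative tool declarations for the three hierarchical retrieval
# interfaces. Names mirror the paper's tools; the schema details are
# assumptions for the sake of the sketch.
TOOLS = [
    {"name": "keyword_search",
     "description": "Sparse lexical search; returns matching chunk IDs.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
    {"name": "semantic_search",
     "description": "Dense embedding search; returns semantically similar chunk IDs.",
     "parameters": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]}},
    {"name": "chunk_read",
     "description": "Read the raw text of a document chunk by ID.",
     "parameters": {"type": "object",
                    "properties": {"chunk_id": {"type": "integer"}},
                    "required": ["chunk_id"]}},
]
```

Declaring the tools this way lets any function-calling LLM decide per step which granularity to query, which is the core of the agentic loop described above.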

System Components

Keyword Search Tool

Sparse/lexical retrieval interface enabling the agent to search documents using exact or BM25-style keyword matching for precise term-based lookups
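As a minimal sketch of the lexical side, the function below ranks tokenized documents with a standard BM25 scoring formula. The paper does not specify its scorer or parameters, so `k1` and `b` defaults here are conventional assumptions, not the authors' settings.

```python
# Minimal BM25 ranking sketch for the keyword-search tool (illustrative only).
import math
from collections import Counter

def bm25_search(query, docs, k1=1.5, b=0.75, top_k=3):
    """Rank docs (lists of tokens) against query tokens; return matching indices."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # Document frequency: how many docs contain each term.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for t in query:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append((score, i))
    # Keep only docs with a positive score, best first.
    return [i for s, i in sorted(scores, reverse=True)[:top_k] if s > 0]
```

This gives the agent the precise term-based lookup described above: documents without the query term score zero and are dropped.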

Semantic Search Tool

Dense embedding-based retrieval interface allowing the agent to search by semantic similarity, capturing paraphrases and conceptual matches beyond surface-level keywords
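The dense side reduces to nearest-neighbor search over embedding vectors. The sketch below assumes query and chunk embeddings are already computed (the paper's embedding model is not stated in this summary) and ranks chunks by cosine similarity.

```python
# Cosine-similarity ranking sketch for the semantic-search tool.
# Embeddings are assumed precomputed by some unspecified encoder.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_search(query_vec, chunk_vecs, top_k=2):
    """Return indices of the top_k chunks most similar to the query vector."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:top_k]
```

Unlike the keyword tool, this returns the closest chunks even without lexical overlap, which is what lets the agent catch paraphrases and conceptual matches.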

Chunk Read Tool

Granular document access interface that lets the agent directly read specific chunks or passages, enabling targeted information extraction without broad search
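A chunk-read interface can be as simple as keyed access into a chunk store, optionally returning neighbor IDs so the agent can expand its reading window. This is a sketch under assumed data structures; the paper's chunking scheme is not specified here.

```python
# Minimal chunk-read sketch: the agent requests a chunk by ID and gets its
# text plus neighboring chunk IDs for targeted follow-up reads (illustrative).
def chunk_read(store, chunk_id):
    """store: list of chunk texts indexed by ID."""
    return {
        "id": chunk_id,
        "text": store[chunk_id],
        "prev": chunk_id - 1 if chunk_id > 0 else None,
        "next": chunk_id + 1 if chunk_id + 1 < len(store) else None,
    }
```

Returning neighbor IDs is one design choice that keeps reads targeted while still letting the agent walk outward from a hit instead of issuing a fresh broad search.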

Agentic Orchestration Layer

The LLM agent itself, which reasons over intermediate results and decides which retrieval tool to invoke, in what sequence, and with what queries to satisfy the information need
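The orchestration layer is essentially a dispatch loop: the model emits tool calls, results are appended to its context, and the loop ends when it produces an answer or exhausts a step budget. The skeleton below stubs the LLM with a scripted policy so it runs standalone; the message format and policy are illustrative assumptions, not the paper's implementation.

```python
# Skeletal agentic loop: a policy (the LLM, stubbed here) emits tool calls
# until it returns a final answer or the step budget runs out.
def run_agent(policy, tools, question, max_steps=8):
    history = [("question", question)]
    for _ in range(max_steps):
        action = policy(history)              # agent decides the next step
        if action["tool"] == "answer":
            return action["args"]
        result = tools[action["tool"]](**action["args"])
        history.append((action["tool"], result))
    return None  # budget exhausted without an answer

# Scripted stand-in policy: keyword-search, read the top hit, then answer.
def scripted_policy(history):
    last_tool = history[-1][0]
    if last_tool == "question":
        return {"tool": "keyword_search", "args": {"query": "capital France"}}
    if last_tool == "keyword_search":
        return {"tool": "chunk_read", "args": {"chunk_id": history[-1][1][0]}}
    return {"tool": "answer", "args": history[-1][1]}
```

Swapping `scripted_policy` for a real LLM call (with the tool schemas from the methodology) yields the fully agentic behavior described above, where the model chooses tool, sequence, and query itself.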

Evaluation Suite

Open-source benchmarking framework for evaluating agentic RAG systems across multiple open-domain QA tasks with token efficiency tracking
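Token-efficiency tracking can be sketched as bookkeeping over per-query records: exact-match accuracy alongside the retrieved-token budget each query consumed. The record format and metric names below are assumptions; this summary does not give the suite's actual definitions.

```python
# Token-efficiency bookkeeping sketch for an evaluation harness (illustrative).
def evaluate(records):
    """records: list of (predicted, gold, retrieved_tokens) tuples."""
    correct = sum(1 for pred, gold, _ in records
                  if pred.strip().lower() == gold.strip().lower())
    total_tokens = sum(tokens for _, _, tokens in records)
    return {
        "em": correct / len(records),                      # exact-match accuracy
        "avg_retrieved_tokens": total_tokens / len(records),
    }
```

Reporting both numbers together is what supports the paper's central efficiency claim: higher accuracy at a comparable or lower retrieved-token budget.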

Results

| Dimension | Best Existing RAG Baseline | A-RAG | Delta |
| --- | --- | --- | --- |
| Open-domain QA accuracy (avg.) | Competitive baseline | Consistently higher | Positive across all tasks |
| Retrieved token count | Comparable or higher | Comparable or lower | More efficient retrieval |
| Model scaling | Fixed performance | Improves with model size | Better utilization of frontier models |
| Test-time compute scaling | Fixed performance | Improves with more compute | Favorable scaling behavior |

Key Takeaways

  • Giving LLMs direct, hierarchical retrieval tool access—rather than predefining retrieval workflows—unlocks better use of their reasoning capabilities and yields higher QA accuracy with fewer retrieved tokens
  • The three-tool hierarchy (keyword, semantic, chunk read) is a practical design pattern for RAG systems: it covers the spectrum from broad semantic search to precise lexical lookup to targeted reading, giving agents flexibility to match retrieval strategy to query type
  • A-RAG's performance scales with both model size and test-time compute, meaning investment in stronger base models or inference-time scaling (e.g., more reasoning steps) directly translates to better retrieval-augmented performance—making it a forward-compatible architecture

Abstract

Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage these capabilities. They still rely on two paradigms: (1) designing an algorithm that retrieves passages in a single shot and concatenates them into the model's input, or (2) predefining a workflow and prompting the model to execute it step-by-step. Neither paradigm allows the model to participate in retrieval decisions, preventing efficient scaling with model improvements. In this paper, we introduce A-RAG, an Agentic RAG framework that exposes hierarchical retrieval interfaces directly to the model. A-RAG provides three retrieval tools: keyword search, semantic search, and chunk read, enabling the agent to adaptively search and retrieve information across multiple granularities. Experiments on multiple open-domain QA benchmarks show that A-RAG consistently outperforms existing approaches with comparable or lower retrieved tokens, demonstrating that A-RAG effectively leverages model capabilities and dynamically adapts to different RAG tasks. We further systematically study how A-RAG scales with model size and test-time compute. We will release our code and evaluation suite to facilitate future research. Code and evaluation suite are available at https://github.com/Ayanami0730/arag.

Generated on 2026-04-01 using Claude