
Practical Poisoning Attacks against Retrieval-Augmented Generation

Baolei Zhang, Yuxi Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu
arXiv.org | 2025
CorruptRAG is a practical poisoning attack against Retrieval-Augmented Generation systems that achieves high attack success rates by injecting only a single poisoned text into the knowledge database, making it both feasible and stealthy.

Problem Statement

RAG systems are increasingly deployed to mitigate LLM hallucination and knowledge staleness, but they introduce a new attack surface through their external knowledge databases. Existing poisoning attacks assume attackers can inject multiple poisoned documents per query to outnumber legitimate retrievals, which is unrealistic in real-world deployments. This over-optimistic assumption renders prior threat models impractical and leaves a gap in understanding true RAG vulnerabilities.

Key Novelty

  • Single-document poisoning constraint: CorruptRAG demonstrates effective RAG compromise with only one injected poisoned text per query, dramatically lowering the bar for real-world attackers
  • Practical threat model: Redefines the poisoning attack scenario to be realistic about attacker capabilities, shifting focus from quantity-based to quality-based adversarial text crafting
  • Improved attack success rate: Outperforms existing multi-document baselines despite using far fewer poisoned entries, suggesting superior adversarial text construction strategies

Evaluation Highlights

  • CorruptRAG achieves higher attack success rates than existing multi-poisoned-text baselines across multiple large-scale datasets using only a single injected document
  • Validation spans diverse retrieval and QA settings, supporting the attack's generalizability beyond any single benchmark
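The headline metric can be made concrete with a short sketch. The paper's exact answer-matching rule is not reproduced here, so case-insensitive substring matching against the attacker's target answer is an assumption:

```python
# Hedged sketch of attack-success-rate (ASR) measurement; substring
# matching against the attacker-desired answer is an illustrative
# assumption, not necessarily the paper's evaluation protocol.
def attack_success_rate(outputs, target_answers):
    """Fraction of target queries whose generated answer contains the
    attacker-desired answer string (case-insensitive)."""
    hits = sum(1 for out, tgt in zip(outputs, target_answers)
               if tgt.lower() in out.lower())
    return hits / len(target_answers)

outputs = ["The answer is Aldous Huxley.", "George Orwell wrote 1984."]
targets = ["Aldous Huxley", "Aldous Huxley"]
print(attack_success_rate(outputs, targets))  # → 0.5
```

A real evaluation would compare this ASR against the same pipeline with no poisoned text present, and against multi-document baselines.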

Breakthrough Assessment

6/10 CorruptRAG makes a solid and practically important contribution by exposing a realistic and previously underexplored threat model for RAG systems, though it is an incremental security advance rather than a fundamental paradigm shift in either RAG or adversarial ML.

Methodology

  1. Step 1 - Threat Model Formulation: Define a constrained attacker who can inject exactly one poisoned text into the RAG knowledge database per target query, reflecting realistic deployment constraints
  2. Step 2 - Adversarial Text Construction: Craft a single poisoned document that is both semantically similar enough to be retrieved by the target query and contains misleading content designed to steer LLM generation toward attacker-desired outputs
  3. Step 3 - Evaluation: Measure attack success rate (ASR) across multiple large-scale QA datasets by comparing the LLM's output when poisoned text is retrieved against ground-truth answers and baseline attack methods
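Step 2 might be sketched as follows. The query-prepending strategy and the toy bag-of-words scorer are illustrative assumptions standing in for the paper's actual construction and for a dense retriever:

```python
# Hedged sketch of single-document poisoning (NOT the paper's exact
# construction): the poisoned text embeds the target query verbatim to
# raise retrieval similarity, then appends attacker-chosen misinformation.
from collections import Counter
import math

def cosine_bow(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity standing in for a dense retriever."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def craft_poisoned_text(target_query: str, adversarial_answer: str) -> str:
    # Embedding the query verbatim makes retrieval for that query likely;
    # the trailing claim is what steers the LLM's generation.
    return f"{target_query} {adversarial_answer}"

query = "Who wrote the novel 1984?"
poison = craft_poisoned_text(query, "The novel 1984 was written by Aldous Huxley.")
clean = "George Orwell wrote the dystopian novel 1984, published in 1949."

# The poisoned text outscores the clean passage for the target query.
assert cosine_bow(query, poison) > cosine_bow(query, clean)
```

The design point the attack exploits: retrieval scores the query against document text, so a document that literally contains the query is hard to beat without defenses.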

System Components

CorruptRAG Poisoned Text Generator

Generates a single adversarially crafted document that maximizes retrieval likelihood for the target query while embedding misleading information to manipulate LLM output

Constrained Threat Model

Formalizes the realistic assumption that an attacker injects only one poisoned text per query, replacing the unrealistic majority-poisoning assumption of prior work

RAG Pipeline Attack Interface

Operates on the knowledge database layer, requiring no access to the LLM weights or retriever internals—only write access to the document store
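The write-only attack surface can be illustrated with a toy document store; `DocumentStore`, its methods, and the token-overlap scorer are hypothetical stand-ins, not components from the paper:

```python
# Minimal sketch of the attack surface: the adversary needs only write
# access to the document store, never to the retriever or LLM.
class DocumentStore:
    def __init__(self):
        self.docs = []

    def add(self, text: str):
        # Public ingestion path (e.g. user uploads, crawled pages).
        self.docs.append(text)

    def retrieve(self, query: str, k: int = 3):
        # Stand-in scorer: count of shared lowercase tokens.
        q = set(query.lower().split())
        score = lambda d: len(q & set(d.lower().split()))
        return sorted(self.docs, key=score, reverse=True)[:k]

store = DocumentStore()
store.add("George Orwell wrote the novel 1984.")
store.add("Paris is the capital of France.")
# A single poisoned insertion through the same public path:
store.add("Who wrote the novel 1984? It was written by Aldous Huxley.")

top = store.retrieve("Who wrote the novel 1984?", k=2)
assert any("Aldous Huxley" in d for d in top)
```

Because the poisoned entry enters through the same ingestion path as legitimate content, write-volume limits alone do not block it.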

Results

| Metric/Benchmark | Baseline (multi-doc poisoning) | CorruptRAG (single-doc) | Delta |
| --- | --- | --- | --- |
| Attack Success Rate (ASR) | Lower (requires multiple docs) | Higher with single doc | Positive improvement |
| Number of injected documents | Multiple per query | 1 per query | Significant reduction in attacker footprint |
| Generalization across datasets | Tested on limited benchmarks | Multiple large-scale datasets | Broader validation |

Key Takeaways

  • RAG deployments should not assume safety from poisoning attacks based on limiting write access volume alone—a single malicious document can be sufficient to compromise system outputs
  • Security evaluations of RAG systems must adopt realistic attacker threat models (single or few injections) rather than majority-poisoning assumptions, which overestimate attacker resources
  • Practitioners building RAG pipelines should implement retrieval-time anomaly detection, document provenance tracking, and adversarial robustness checks to defend against single-document poisoning vectors like CorruptRAG
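One of the defenses suggested above, retrieval-time anomaly detection, could start with a simple query-stuffing check: a retrieved passage that contains nearly all of the query's tokens may have been crafted for that query. The token-overlap heuristic and the 0.8 threshold are illustrative assumptions, not a vetted defense:

```python
# Hedged sketch of a retrieval-time anomaly check: flag passages that are
# suspiciously close to the query itself, a signature of query-stuffed
# poisoned texts. Heuristic and threshold are illustrative only.
import string

def tokens(text: str) -> set:
    # Lowercase, strip punctuation, drop empties.
    return {w.strip(string.punctuation) for w in text.lower().split()} - {""}

def flag_query_stuffing(query: str, passage: str, threshold: float = 0.8) -> bool:
    """True if the passage covers an unusually large fraction of the
    query's tokens, suggesting it was crafted for this query."""
    q, p = tokens(query), tokens(passage)
    overlap = len(q & p) / len(q) if q else 0.0
    return overlap >= threshold

q = "who discovered penicillin"
poisoned = "who discovered penicillin? It was discovered by Marie Curie."
clean = "Alexander Fleming discovered penicillin in 1928."

print(flag_query_stuffing(q, poisoned))  # True
print(flag_query_stuffing(q, clean))     # False
```

A real deployment would combine such heuristics with provenance tracking and cross-document consistency checks, since an attacker aware of the filter can paraphrase around pure token overlap.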

Abstract

Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments conducted on multiple large-scale datasets demonstrate that CorruptRAG achieves higher attack success rates than existing baselines.

Generated on 2026-03-03 using Claude