
Practical Poisoning Attacks against Retrieval-Augmented Generation

Baolei Zhang, Yuxi Chen, Minghong Fang, Zhuqing Liu, Lihai Nie, Tong Li, Zheli Liu
arXiv.org | 2025
CorruptRAG is a practical poisoning attack against Retrieval-Augmented Generation systems that achieves high attack success rates by injecting only a single poisoned text into the knowledge database, making it both feasible and stealthy.

Problem Statement

RAG systems are increasingly deployed to mitigate LLM hallucination and knowledge staleness, but they introduce a new attack surface through their external knowledge databases. Existing poisoning attacks assume attackers can inject multiple poisoned documents per query to outnumber legitimate retrievals, which is unrealistic in real-world deployments. This over-optimistic assumption renders prior threat models impractical and leaves a gap in understanding true RAG vulnerabilities.

Key Novelty

  • Single-document poisoning constraint: CorruptRAG demonstrates effective RAG compromise with only one injected poisoned text per query, dramatically lowering the bar for real-world attackers
  • Practical threat model: Redefines the poisoning attack scenario to be realistic about attacker capabilities, shifting focus from quantity-based to quality-based adversarial text crafting
  • Improved attack success rate: Outperforms existing multi-document baselines despite using far fewer poisoned entries, suggesting superior adversarial text construction strategies

Evaluation Highlights

  • CorruptRAG achieves higher attack success rates than existing multi-poisoned-text baselines across multiple large-scale datasets using only a single injected document
  • Validation spans diverse retrieval and QA settings, supporting the attack's generalizability beyond any single benchmark
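The headline metric can be made concrete with a short sketch. The paper's exact answer-matching rule is not reproduced here, so case-insensitive substring matching against the attacker's target answer is an assumption:

```python
# Hedged sketch of attack-success-rate (ASR) measurement; substring
# matching against the attacker-desired answer is an illustrative
# assumption, not necessarily the paper's evaluation protocol.
def attack_success_rate(outputs, target_answers):
    """Fraction of target queries whose generated answer contains the
    attacker-desired answer string (case-insensitive)."""
    hits = sum(1 for out, tgt in zip(outputs, target_answers)
               if tgt.lower() in out.lower())
    return hits / len(target_answers)

outputs = ["The answer is Aldous Huxley.", "George Orwell wrote 1984."]
targets = ["Aldous Huxley", "Aldous Huxley"]
print(attack_success_rate(outputs, targets))  # → 0.5
```

A real evaluation would compare this ASR against the same pipeline with no poisoned text present, and against multi-document baselines.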

Breakthrough Assessment

6/10 CorruptRAG makes a solid and practically important contribution by exposing a realistic and previously underexplored threat model for RAG systems, though it is an incremental security advance rather than a fundamental paradigm shift in either RAG or adversarial ML.

Methodology

  1. Step 1 - Threat Model Formulation: Define a constrained attacker who can inject exactly one poisoned text into the RAG knowledge database per target query, reflecting realistic deployment constraints
  2. Step 2 - Adversarial Text Construction: Craft a single poisoned document that is both semantically similar enough to be retrieved by the target query and contains misleading content designed to steer LLM generation toward attacker-desired outputs
  3. Step 3 - Evaluation: Measure attack success rate (ASR) across multiple large-scale QA datasets by comparing the LLM's output when poisoned text is retrieved against ground-truth answers and baseline attack methods
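Step 2 might be sketched as follows. The query-prepending strategy and the toy bag-of-words scorer are illustrative assumptions standing in for the paper's actual construction and for a dense retriever:

```python
# Hedged sketch of single-document poisoning (NOT the paper's exact
# construction): the poisoned text embeds the target query verbatim to
# raise retrieval similarity, then appends attacker-chosen misinformation.
from collections import Counter
import math

def cosine_bow(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity standing in for a dense retriever."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def craft_poisoned_text(target_query: str, adversarial_answer: str) -> str:
    # Embedding the query verbatim makes retrieval for that query likely;
    # the trailing claim is what steers the LLM's generation.
    return f"{target_query} {adversarial_answer}"

query = "Who wrote the novel 1984?"
poison = craft_poisoned_text(query, "The novel 1984 was written by Aldous Huxley.")
clean = "George Orwell wrote the dystopian novel 1984, published in 1949."

# The poisoned text outscores the clean passage for the target query.
assert cosine_bow(query, poison) > cosine_bow(query, clean)
```

The design point the attack exploits: retrieval scores the query against document text, so a document that literally contains the query is hard to beat without defenses.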

System Components

CorruptRAG Poisoned Text Generator

Generates a single adversarially crafted document that maximizes retrieval likelihood for the target query while embedding misleading information to manipulate LLM output

Constrained Threat Model

Formalizes the realistic assumption that an attacker injects only one poisoned text per query, replacing the unrealistic majority-poisoning assumption of prior work

RAG Pipeline Attack Interface

Operates on the knowledge database layer, requiring no access to the LLM weights or retriever internals—only write access to the document store
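The write-only attack surface can be illustrated with a toy document store; `DocumentStore`, its methods, and the token-overlap scorer are hypothetical stand-ins, not components from the paper:

```python
# Minimal sketch of the attack surface: the adversary needs only write
# access to the document store, never to the retriever or LLM.
class DocumentStore:
    def __init__(self):
        self.docs = []

    def add(self, text: str):
        # Public ingestion path (e.g. user uploads, crawled pages).
        self.docs.append(text)

    def retrieve(self, query: str, k: int = 3):
        # Stand-in scorer: count of shared lowercase tokens.
        q = set(query.lower().split())
        score = lambda d: len(q & set(d.lower().split()))
        return sorted(self.docs, key=score, reverse=True)[:k]

store = DocumentStore()
store.add("George Orwell wrote the novel 1984.")
store.add("Paris is the capital of France.")
# A single poisoned insertion through the same public path:
store.add("Who wrote the novel 1984? It was written by Aldous Huxley.")

top = store.retrieve("Who wrote the novel 1984?", k=2)
assert any("Aldous Huxley" in d for d in top)
```

Because the poisoned entry enters through the same ingestion path as legitimate content, write-volume limits alone do not block it.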

Results

| Metric/Benchmark | Baseline (multi-doc poisoning) | CorruptRAG (single-doc) | Delta |
| --- | --- | --- | --- |
| Attack Success Rate (ASR) | Lower (requires multiple docs) | Higher with single doc | Positive improvement |
| Number of injected documents | Multiple per query | 1 per query | Significant reduction in attacker footprint |
| Generalization across datasets | Tested on limited benchmarks | Multiple large-scale datasets | Broader validation |

Key Takeaways

  • RAG deployments should not assume safety from poisoning attacks based on limiting write access volume alone—a single malicious document can be sufficient to compromise system outputs
  • Security evaluations of RAG systems must adopt realistic attacker threat models (single or few injections) rather than majority-poisoning assumptions, which overestimate attacker resources
  • Practitioners building RAG pipelines should implement retrieval-time anomaly detection, document provenance tracking, and adversarial robustness checks to defend against single-document poisoning vectors like CorruptRAG
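One of the defenses suggested above, retrieval-time anomaly detection, could start with a simple query-stuffing check: a retrieved passage that contains nearly all of the query's tokens may have been crafted for that query. The token-overlap heuristic and the 0.8 threshold are illustrative assumptions, not a vetted defense:

```python
# Hedged sketch of a retrieval-time anomaly check: flag passages that are
# suspiciously close to the query itself, a signature of query-stuffed
# poisoned texts. Heuristic and threshold are illustrative only.
import string

def tokens(text: str) -> set:
    # Lowercase, strip punctuation, drop empties.
    return {w.strip(string.punctuation) for w in text.lower().split()} - {""}

def flag_query_stuffing(query: str, passage: str, threshold: float = 0.8) -> bool:
    """True if the passage covers an unusually large fraction of the
    query's tokens, suggesting it was crafted for this query."""
    q, p = tokens(query), tokens(passage)
    overlap = len(q & p) / len(q) if q else 0.0
    return overlap >= threshold

q = "who discovered penicillin"
poisoned = "who discovered penicillin? It was discovered by Marie Curie."
clean = "Alexander Fleming discovered penicillin in 1928."

print(flag_query_stuffing(q, poisoned))  # True
print(flag_query_stuffing(q, clean))     # False
```

A real deployment would combine such heuristics with provenance tracking and cross-document consistency checks, since an attacker aware of the filter can paraphrase around pure token overlap.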

Abstract

Large language models (LLMs) have demonstrated impressive natural language processing abilities but face challenges such as hallucination and outdated knowledge. Retrieval-Augmented Generation (RAG) has emerged as a state-of-the-art approach to mitigate these issues. While RAG enhances LLM outputs, it remains vulnerable to poisoning attacks. Recent studies show that injecting poisoned text into the knowledge database can compromise RAG systems, but most existing attacks assume that the attacker can insert a sufficient number of poisoned texts per query to outnumber correct-answer texts in retrieval, an assumption that is often unrealistic. To address this limitation, we propose CorruptRAG, a practical poisoning attack against RAG systems in which the attacker injects only a single poisoned text, enhancing both feasibility and stealth. Extensive experiments conducted on multiple large-scale datasets demonstrate that CorruptRAG achieves higher attack success rates than existing baselines.

Generated on 2026-03-03 using Claude