Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

This SoK paper presents the first comprehensive taxonomy and evaluation of Indirect Prompt Injection (IPI) defense frameworks for LLM agents, revealing fundamental flaws in existing defenses and introducing novel adaptive attacks that expose their vulnerabilities.

Problem Statement

LLM agents with function-calling capabilities are increasingly susceptible to IPI attacks that hijack tool calls, yet the landscape of defenses is fragmented and lacks unified evaluation criteria. Existing defenses have been developed in isolation, without cross-framework comparison, which makes it hard for practitioners to make informed security decisions. There is no systematic understanding of why defenses fail or how to build more robust alternatives.

Key Novelty

First comprehensive taxonomy of IPI-centric defense frameworks classifying them across five dimensions, providing a unified conceptual framework for the field
Systematic security and usability assessment of representative IPI defense frameworks, identifying six root causes of defense circumvention
Design of three novel adaptive attacks that significantly improve attack success rates against specific frameworks, demonstrating exploitable weaknesses

Evaluation Highlights

Three adaptive attacks designed from identified root causes significantly improve attack success rates over baseline attacks against targeted defense frameworks
Security-usability trade-off analysis across representative defense frameworks reveals that no existing framework achieves both high security and minimal usability degradation
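
One way to make the trade-off claim above concrete is a Pareto check over two scores per framework, where lower is better for both. The sketch below is illustrative only: the function name `pareto_front`, the framework names, and every number are hypothetical placeholders, not results or methods from the paper.

```python
def pareto_front(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Return frameworks that no other framework beats on both axes.

    Each score is (attack_success_rate, utility_drop); lower is better for both.
    """
    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a dominates b if it is at least as good on both axes and not identical.
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return sorted(
        name
        for name, score in scores.items()
        if not any(dominates(other, score) for other in scores.values())
    )


# Placeholder values for illustration; these are NOT measurements from the paper.
scores = {
    "defense_a": (0.05, 0.40),  # strong security, heavy usability cost
    "defense_b": (0.30, 0.05),  # light usability cost, weaker security
    "defense_c": (0.35, 0.45),  # worse than defense_a on both axes
}
print(pareto_front(scores))
```

In this toy input, the front contains two frameworks that each win on a different axis, which is the shape of result the trade-off finding describes: no single entry is best on both.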

Signal Assessment

6/10. This is a solid and timely SoK contribution that systematizes a fragmented but critical area of LLM agent security; while it does not introduce a new defense mechanism, its taxonomic rigor, root-cause analysis, and adaptive attack demonstrations provide valuable groundwork for the community.

Methodology

Survey and classify existing IPI-centric defense frameworks into a five-dimensional taxonomy covering detection, isolation, verification, filtering, and architectural approaches
Empirically evaluate representative defenses on both security (resistance to IPI attacks) and usability (task completion rate, latency, false positive rate) metrics
Analyze defensive failures to identify six root causes of circumvention, then engineer three novel adaptive attacks targeting specific framework weaknesses to validate findings
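
The security and usability metrics named in the steps above can be sketched as simple ratios over per-run records. This is a minimal sketch under stated assumptions: the `Trial` record layout, its field names, and the `summarize` helper are hypothetical, the paper's exact metric definitions are not given here, and latency is left out because it would need timing data.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One agent run under a given defense (hypothetical record layout)."""
    injected: bool           # the tool output carried an IPI payload
    attacker_goal_met: bool  # the agent executed the attacker's intended tool call
    task_completed: bool     # the user's original task succeeded
    blocked: bool            # the defense blocked or aborted the run


def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there are no samples."""
    return numerator / denominator if denominator else 0.0


def summarize(trials: list[Trial]) -> dict[str, float]:
    attacked = [t for t in trials if t.injected]
    benign = [t for t in trials if not t.injected]
    return {
        # Security: share of injected runs where the attacker's goal was achieved.
        "attack_success_rate": rate(sum(t.attacker_goal_met for t in attacked), len(attacked)),
        # Usability: share of benign runs where the user task still completed.
        "task_completion_rate": rate(sum(t.task_completed for t in benign), len(benign)),
        # Usability: share of benign runs the defense wrongly blocked.
        "false_positive_rate": rate(sum(t.blocked for t in benign), len(benign)),
    }


# Toy records for illustration only; not data from the paper.
trials = [
    Trial(injected=True, attacker_goal_met=True, task_completed=False, blocked=False),
    Trial(injected=True, attacker_goal_met=False, task_completed=True, blocked=True),
    Trial(injected=False, attacker_goal_met=False, task_completed=True, blocked=False),
    Trial(injected=False, attacker_goal_met=False, task_completed=False, blocked=True),
]
print(summarize(trials))
```

Splitting runs into injected and benign sets keeps the security metric from being diluted by clean runs and keeps the false positive rate from counting correct blocks of real attacks.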

System Components

Five-Dimensional Taxonomy

Classifies IPI defenses along five axes (e.g., detection strategy, enforcement point, granularity) to unify fragmented literature

Security Assessment Module

Evaluates defense frameworks against IPI attack success rates using standardized attack benchmarks

Usability Assessment Module

Measures functional degradation introduced by defenses, including task success rate and false positive blocking

Root Cause Analysis

Identifies six categories of fundamental weaknesses in defense logic that allow attacker circumvention

Adaptive Attack Suite

Three novel attacks crafted to exploit identified root causes, used to stress-test and expose framework-specific vulnerabilities

Results

Aspect	Before (Baseline)	This Paper	Finding
Attack Success Rate	Low under baseline attacks (defenses appear effective)	Significantly higher under adaptive attacks	Defenses are brittle against adaptive adversaries
Security-Usability Balance	Typically optimized for one	No framework achieves both	Fundamental trade-off unresolved
Root Causes Identified	Not systematized	6 root causes documented	First structured failure analysis
Taxonomy Coverage	No unified framework	5-dimensional taxonomy	New organizational structure

Key Takeaways

Practitioners should not treat IPI defenses as security guarantees: existing frameworks fail under adaptive attacks that target their specific design assumptions, so defense-in-depth strategies are required
When selecting or building IPI defenses, developers should evaluate the security-usability trade-off explicitly and use the five-dimensional taxonomy to understand which threat models a defense actually covers
The six identified root causes of defense circumvention (e.g., reliance on syntactic patterns, incomplete context isolation) should serve as a design checklist for future defense frameworks targeting robustness against adaptive adversaries

Abstract

Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (SoK), we present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks. Through analysis of defensive failures in the assessment, we identify six root causes of defense circumvention. Based on these findings, we design three novel adaptive attacks that significantly improve attack success rates targeting specific frameworks, demonstrating the severity of the flaws in these defenses. Our paper provides a foundation and critical insights for the future development of more secure and usable IPI-centric agent defense frameworks.