Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks
Problem Statement
LLM agents with function-calling capabilities are increasingly deployed but remain vulnerable to indirect prompt injection (IPI) attacks that hijack their tool calls. The defense landscape is fragmented and lacks unified evaluation criteria. Existing defenses have largely been developed in isolation, without cross-framework comparison, which makes it difficult for practitioners to make informed security decisions. There is also no systematic understanding of why defenses fail or how to build more robust alternatives.
Key Novelty
- First comprehensive taxonomy of IPI-centric defense frameworks classifying them across five dimensions, providing a unified conceptual framework for the field
- Systematic security and usability assessment of representative IPI defense frameworks, identifying six root causes of defense circumvention
- Design of three novel adaptive attacks that significantly improve attack success rates against specific frameworks, demonstrating exploitable weaknesses
Evaluation Highlights
- Three adaptive attacks designed from identified root causes significantly improve attack success rates over baseline attacks against targeted defense frameworks
- Security-usability trade-off analysis across representative defense frameworks reveals that no existing framework achieves both high security and minimal usability degradation
Methodology
- Survey and classify existing IPI-centric defense frameworks into a five-dimensional taxonomy covering detection, isolation, verification, filtering, and architectural approaches
- Empirically evaluate representative defenses on both security (resistance to IPI attacks) and usability (task completion rate, latency, false positive rate) metrics
- Analyze defensive failures to identify six root causes of circumvention, then engineer three novel adaptive attacks targeting specific framework weaknesses to validate findings
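The security and usability metrics named above can be sketched as simple rates over per-run records. This is a minimal illustration, not the paper's evaluation harness: the `Trial` schema, its field names, and the choice to measure task completion and false positives on benign runs only are all assumptions made here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One agent run under a defense (hypothetical record schema)."""
    under_attack: bool      # True if a tool output carried an injected instruction
    attack_succeeded: bool  # True if the agent issued the attacker's tool call
    task_completed: bool    # True if the user's original task was finished
    blocked: bool           # True if the defense blocked or aborted the run


def rate(hits: int, total: int) -> float:
    """Return hits / total, or 0.0 when there are no trials of that kind."""
    return hits / total if total else 0.0


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Compute security and usability rates (assumed metric definitions)."""
    attacked = [t for t in trials if t.under_attack]
    benign = [t for t in trials if not t.under_attack]
    return {
        # Security: share of attacked runs in which the injection took effect.
        "attack_success_rate": rate(sum(t.attack_succeeded for t in attacked), len(attacked)),
        # Usability: share of benign runs that still finish the user's task.
        "benign_task_completion_rate": rate(sum(t.task_completed for t in benign), len(benign)),
        # Usability: share of benign runs the defense wrongly blocked.
        "false_positive_rate": rate(sum(t.blocked for t in benign), len(benign)),
    }
```

Latency is omitted from the sketch because it needs timing data rather than boolean outcomes; it could be added as one more field on the record.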
System Components
- Taxonomy: classifies IPI defenses along five axes (e.g., detection strategy, enforcement point, granularity) to unify the fragmented literature
- Security evaluation: measures IPI attack success rates against each defense framework using standardized attack benchmarks
- Usability evaluation: measures the functional degradation that defenses introduce, including task success rate and false positive blocking
- Root cause analysis: identifies six categories of fundamental weaknesses in defense logic that allow attackers to circumvent a defense
- Adaptive attacks: three novel attacks crafted to exploit the identified root causes, used to stress-test frameworks and expose framework-specific vulnerabilities
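One way to picture the taxonomy component is as a per-framework profile that can be grouped along any axis. The sketch below uses only the three axes given as examples in the text; the other two are not named there and are left out rather than guessed. The framework names and axis values are hypothetical placeholders, not classifications from the paper.

```python
from collections import defaultdict

# Axis names taken from the examples in the text. The remaining two axes are
# not named there, so they are deliberately omitted.
KNOWN_AXES = ("detection_strategy", "enforcement_point", "granularity")

# Hypothetical frameworks and axis values, for illustration only.
PROFILES = {
    "framework_a": {
        "detection_strategy": "model-based",
        "enforcement_point": "before tool call",
        "granularity": "per call",
    },
    "framework_b": {
        "detection_strategy": "rule-based",
        "enforcement_point": "after tool output",
        "granularity": "per call",
    },
    "framework_c": {
        "detection_strategy": "rule-based",
        "enforcement_point": "before tool call",
        "granularity": "per session",
    },
}


def group_by_axis(profiles: dict[str, dict[str, str]], axis: str) -> dict[str, list[str]]:
    """Group framework names by their value on one taxonomy axis."""
    if axis not in KNOWN_AXES:
        raise ValueError(f"unknown axis: {axis!r}")
    groups: dict[str, list[str]] = defaultdict(list)
    for name, profile in profiles.items():
        groups[profile[axis]].append(name)
    # Sort names so the grouping is stable regardless of input order.
    return {value: sorted(names) for value, names in groups.items()}
```

Grouping this way makes it easy to see which frameworks share a design assumption, which is where a shared weakness would be expected to show up.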
Results
| Aspect | Prior State (Baseline) | This Work | Finding |
|---|---|---|---|
| Attack Success Rate | Low under baseline attacks (defenses appear effective) | Significantly higher under adaptive attacks | Defenses are brittle to adaptive adversaries |
| Security-Usability Balance | Typically optimized for one | No framework achieves both | Fundamental trade-off unresolved |
| Root Causes Identified | Not systematized | 6 root causes documented | First structured failure analysis |
| Taxonomy Coverage | No unified framework | 5-dimensional taxonomy | New organizational structure |
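The security-usability trade-off in the table can be made concrete with a Pareto comparison: a framework is worth considering only if no other framework is at least as good on both axes and strictly better on one. This is an illustrative sketch, not the paper's analysis. The `Score` type, the choice of the two metrics, and the example numbers are assumptions, and the numbers are placeholders rather than reported results.

```python
from typing import NamedTuple


class Score(NamedTuple):
    attack_success_rate: float   # security axis, lower is better
    task_completion_rate: float  # usability axis, higher is better


def dominates(a: Score, b: Score) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = (a.attack_success_rate <= b.attack_success_rate
                and a.task_completion_rate >= b.task_completion_rate)
    strictly_better = (a.attack_success_rate < b.attack_success_rate
                       or a.task_completion_rate > b.task_completion_rate)
    return no_worse and strictly_better


def pareto_front(scores: dict[str, Score]) -> list[str]:
    """Return the names of frameworks that no other framework dominates."""
    return sorted(
        name
        for name, score in scores.items()
        if not any(dominates(other, score)
                   for other_name, other in scores.items()
                   if other_name != name)
    )


# Placeholder values for three hypothetical frameworks (not measured data).
example = {
    "framework_a": Score(attack_success_rate=0.05, task_completion_rate=0.60),
    "framework_b": Score(attack_success_rate=0.30, task_completion_rate=0.95),
    "framework_c": Score(attack_success_rate=0.35, task_completion_rate=0.70),
}
front = pareto_front(example)  # framework_c drops out: framework_b beats it on both axes
```

A front with more than one member, as in this example, is what an unresolved trade-off looks like: each remaining framework gives up security or usability relative to another.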
Key Takeaways
- Practitioners should not treat IPI defenses as security guarantees: existing frameworks fail under adaptive attacks that target their specific design assumptions, so defense-in-depth strategies are needed
- When selecting or building IPI defenses, developers should explicitly evaluate the security-usability trade-off and use the five-dimensional taxonomy to understand which threat models a defense actually covers
- The six identified root causes of defense circumvention (e.g., reliance on syntactic patterns, incomplete context isolation) should serve as a design checklist for future defense frameworks targeting robustness against adaptive adversaries
Abstract
Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (SoK), we present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks. Through analysis of defensive failures in the assessment, we identify six root causes of defense circumvention. Based on these findings, we design three novel adaptive attacks that significantly improve attack success rates targeting specific frameworks, demonstrating the severity of the flaws in these defenses. Our paper provides a foundation and critical insights for the future development of more secure and usable IPI-centric agent defense frameworks.