← Back to Papers

Taxonomy, Evaluation and Exploitation of IPI-Centric LLM Agent Defense Frameworks

Zimo Ji, Xunguang Wang, Zongjie Li, Pingchuan Ma, Yudong Gao, Daoyuan Wu, Xincheng Yan, Tian Tian, Shuai Wang
arXiv.org | 2025
This SoK paper presents the first comprehensive taxonomy and evaluation of Indirect Prompt Injection (IPI) defense frameworks for LLM agents, revealing fundamental flaws in existing defenses and introducing novel adaptive attacks that expose their vulnerabilities.

Problem Statement

LLM agents with function-calling capabilities are increasingly susceptible to IPI attacks that hijack tool calls, yet the landscape of defenses is fragmented and lacks unified evaluation criteria. Existing defenses have been developed in isolation without cross-framework comparison, making it impossible for practitioners to make informed security decisions. There is no systematic understanding of why defenses fail or how to build more robust alternatives.

Key Novelty

  • First comprehensive taxonomy of IPI-centric defense frameworks classifying them across five dimensions, providing a unified conceptual framework for the field
  • Systematic security and usability assessment of representative IPI defense frameworks, identifying six root causes of defense circumvention
  • Design of three novel adaptive attacks that significantly improve attack success rates against specific frameworks, demonstrating exploitable weaknesses

Evaluation Highlights

  • Three adaptive attacks designed from identified root causes significantly improve attack success rates over baseline attacks against targeted defense frameworks
  • Security-usability trade-off analysis across representative defense frameworks reveals that no existing framework achieves both high security and minimal usability degradation

Breakthrough Assessment

6/10 This is a solid and timely SoK contribution that systematizes a fragmented but critical area of LLM agent security; while it does not introduce a new defense mechanism, its taxonomic rigor, root-cause analysis, and adaptive attack demonstrations provide valuable groundwork for the community.

Methodology

  1. Survey and classify existing IPI-centric defense frameworks into a five-dimensional taxonomy covering detection, isolation, verification, filtering, and architectural approaches
  2. Empirically evaluate representative defenses on both security (resistance to IPI attacks) and usability (task completion rate, latency, false positive rate) metrics
  3. Analyze defensive failures to identify six root causes of circumvention, then engineer three novel adaptive attacks targeting specific framework weaknesses to validate findings

System Components

Five-Dimensional Taxonomy

Classifies IPI defenses along five axes (e.g., detection strategy, enforcement point, granularity) to unify fragmented literature

Security Assessment Module

Evaluates defense frameworks against IPI attack success rates using standardized attack benchmarks

Usability Assessment Module

Measures functional degradation introduced by defenses, including task success rate and false positive blocking

Root Cause Analysis

Identifies six categories of fundamental weaknesses in defense logic that allow attacker circumvention

Adaptive Attack Suite

Three novel attacks crafted to exploit identified root causes, used to stress-test and expose framework-specific vulnerabilities

Results

Aspect Existing Defenses (Baseline) After Adaptive Attacks Finding
Attack Success Rate Low (defenses appear effective) Significantly higher Defenses are brittle to adaptive adversaries
Security-Usability Balance Typically optimized for one No framework achieves both Fundamental trade-off unresolved
Root Causes Identified Not systematized 6 root causes documented First structured failure analysis
Taxonomy Coverage No unified framework 5-dimensional taxonomy New organizational structure

Key Takeaways

  • Practitioners should not treat IPI defenses as security guarantees — existing frameworks fail under adaptive attacks targeting their specific design assumptions, requiring defense-in-depth strategies
  • When selecting or building IPI defenses, developers must explicitly evaluate the security-usability trade-off using the five-dimensional taxonomy to understand what threat models a defense actually covers
  • The six identified root causes of defense circumvention (e.g., reliance on syntactic patterns, incomplete context isolation) should serve as a design checklist for future defense frameworks targeting robustness against adaptive adversaries

Abstract

Large Language Model (LLM)-based agents with function-calling capabilities are increasingly deployed, but remain vulnerable to Indirect Prompt Injection (IPI) attacks that hijack their tool calls. In response, numerous IPI-centric defense frameworks have emerged. However, these defenses are fragmented, lacking a unified taxonomy and comprehensive evaluation. In this Systematization of Knowledge (SoK), we present the first comprehensive analysis of IPI-centric defense frameworks. We introduce a comprehensive taxonomy of these defenses, classifying them along five dimensions. We then thoroughly assess the security and usability of representative defense frameworks. Through analysis of defensive failures in the assessment, we identify six root causes of defense circumvention. Based on these findings, we design three novel adaptive attacks that significantly improve attack success rates targeting specific frameworks, demonstrating the severity of the flaws in these defenses. Our paper provides a foundation and critical insights for the future development of more secure and usable IPI-centric agent defense frameworks.

Generated on 2026-03-02 using Claude