
From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows

M. Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros A. Maglaras, Mérouane Debbah
ICT Express | 2025
This survey introduces a unified end-to-end threat model for LLM-agent ecosystems, systematically categorizing over thirty attack techniques spanning input manipulation, model compromise, system/privacy attacks, and protocol-level vulnerabilities. It is the first integrated taxonomy bridging input-level exploits and protocol-layer vulnerabilities in agentic AI systems.

Problem Statement

The rapid proliferation of plugins, connectors, and inter-agent protocols in LLM-powered systems has outpaced security practices, creating brittle integrations with ad-hoc authentication and weak validation. Existing security frameworks treat input-level attacks and protocol-layer vulnerabilities in isolation, leaving a critical gap in holistic threat understanding. This work addresses the lack of a comprehensive, formally grounded security taxonomy covering the full attack surface of multi-agent LLM workflows.

Key Novelty

  • First unified end-to-end threat model covering both host-to-tool and agent-to-agent communication layers in LLM-agent ecosystems
  • Systematic taxonomy of 30+ attack techniques with formal threat formulations defining attacker capabilities, objectives, and affected system layers for each category
  • Cross-mapping of the taxonomy with real-world incidents and public vulnerability repositories (CVE, NIST NVD), including novel examples like Toxic Agent Flow exploits in GitHub MCP servers

Evaluation Highlights

  • Framework validated through expert review and cross-mapping with real-world incidents and public vulnerability repositories including CVE and NIST NVD
  • Coverage of 30+ distinct attack techniques across four major categories: input manipulation, model compromise, system/privacy attacks, and protocol-level vulnerabilities

Breakthrough Assessment

6/10. This is a solid and timely contribution that fills a genuine gap by unifying input-level and protocol-layer threat modeling for agentic AI, but as a survey/taxonomy paper it is primarily organizational rather than algorithmically novel, limiting its breakthrough score.

Methodology

  1. Systematically survey and categorize existing attack techniques across the LLM-agent attack surface, organizing them into a four-category taxonomy with formal attacker capability and objective definitions
  2. Develop a unified end-to-end threat model covering both host-to-tool and agent-to-agent communication channels, illustrated with concrete examples such as Prompt-to-SQL injections and Toxic Agent Flow exploits
  3. Validate the framework through expert review and cross-referencing with real-world vulnerability disclosures (CVE, NIST NVD), then derive mitigation strategies including dynamic trust management, cryptographic provenance tracking, and sandboxed agent interfaces

System Components

Unified Threat Model

An end-to-end security framework covering the full LLM-agent ecosystem including host-to-tool and agent-to-agent communication channels

Attack Taxonomy (30+ techniques)

Categorized classification of attacks spanning input manipulation, model compromise, system/privacy attacks, and protocol-level vulnerabilities, each with formal threat formulations
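The four-category structure lends itself to a simple machine-readable encoding. The sketch below is illustrative, not the paper's artifact: the category and technique names follow the survey, but the field names and registry layout are assumptions.

```python
from dataclasses import dataclass
from enum import Enum

# The survey's four top-level attack categories.
class AttackCategory(Enum):
    INPUT_MANIPULATION = "input manipulation"
    MODEL_COMPROMISE = "model compromise"
    SYSTEM_PRIVACY = "system/privacy attacks"
    PROTOCOL_LEVEL = "protocol-level vulnerabilities"

@dataclass(frozen=True)
class AttackTechnique:
    name: str
    category: AttackCategory
    affected_layer: str  # e.g. "host-to-tool" or "agent-to-agent"

# Two representative techniques named in the paper.
techniques = [
    AttackTechnique("Prompt-to-SQL injection",
                    AttackCategory.INPUT_MANIPULATION, "host-to-tool"),
    AttackTechnique("Toxic Agent Flow",
                    AttackCategory.PROTOCOL_LEVEL, "agent-to-agent"),
]

# Group techniques by category for reporting or coverage checks.
by_category: dict[AttackCategory, list[str]] = {}
for t in techniques:
    by_category.setdefault(t.category, []).append(t.name)
```

A registry like this makes it easy to audit which categories and layers a given deployment's defenses actually cover.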

Formal Threat Formulations

Per-category formal definitions specifying attacker capabilities, objectives, and affected system layers to enable rigorous security analysis
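The paper's exact notation is not reproduced here, but a per-category threat formulation of this kind can be written as a tuple over the three elements listed above (capabilities, objectives, affected layers); the symbols below are an illustrative rendering, not the authors' own:

```latex
% Illustrative threat formulation for attack category c:
% \mathcal{C}_c = attacker capabilities, \mathcal{O}_c = objectives,
% \mathcal{L}_c = affected system layers.
T_c = \langle \mathcal{C}_c,\; \mathcal{O}_c,\; \mathcal{L}_c \rangle,
\qquad
\mathcal{L}_c \subseteq \{\text{input},\, \text{model},\, \text{system},\, \text{protocol}\}
```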

Mitigation Framework

Proposed defenses including dynamic trust management, cryptographic provenance tracking, and sandboxed agent interfaces mapped to specific threat categories
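Of these defenses, cryptographic provenance tracking is the most mechanical to sketch: each agent-to-agent hop carries a MAC over the message plus the previous hop's tag, forming a verifiable chain. This is a minimal illustration under assumed key handling (a single shared key; real deployments would use per-agent keys), not the paper's design.

```python
import hashlib
import hmac
import json

SECRET = b"shared-agent-key"  # assumption: in practice, per-agent keys from a KMS

def sign_hop(message: dict, prev_tag: str, agent_id: str) -> dict:
    """Attach an HMAC over (message, previous tag, sender) to one hop."""
    payload = json.dumps(
        {"msg": message, "prev": prev_tag, "agent": agent_id},
        sort_keys=True,
    ).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return {"msg": message, "prev": prev_tag, "agent": agent_id, "tag": tag}

def verify_hop(hop: dict) -> bool:
    """Recompute the MAC and compare in constant time."""
    payload = json.dumps(
        {"msg": hop["msg"], "prev": hop["prev"], "agent": hop["agent"]},
        sort_keys=True,
    ).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, hop["tag"])

# Chain two hops: a planner agent hands off to a tool-calling agent.
hop1 = sign_hop({"task": "fetch issue list"}, prev_tag="", agent_id="planner")
hop2 = sign_hop({"result": "ok"}, prev_tag=hop1["tag"], agent_id="tool-agent")
```

Any tampering with a hop's message, sender, or chain link invalidates its tag, which is what lets a receiving agent reject injected or replayed handoffs.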

CVE/NVD Cross-Mapping

Validation layer that maps the taxonomy to real-world vulnerability databases and documented incidents to ground the framework empirically
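In practice, this kind of cross-mapping can start from keyword queries against NVD's public CVE API. The sketch below only builds the request URL; the endpoint and parameter names reflect the documented NVD API v2.0, but confirm them against current NVD documentation before relying on this.

```python
from urllib.parse import urlencode

# Public NVD CVE API v2.0 endpoint (per NIST documentation).
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"

def nvd_search_url(keywords: str, results_per_page: int = 20) -> str:
    """Build a keyword-search URL for CVE records relevant to a threat category."""
    query = urlencode({
        "keywordSearch": keywords,
        "resultsPerPage": results_per_page,
    })
    return f"{NVD_API}?{query}"

url = nvd_search_url("prompt injection")
# Fetch with any HTTP client, then inspect vulnerabilities[*].cve.id
# in the JSON response to link records back to taxonomy entries.
```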

Results

| Aspect | Prior Surveys | This Paper | Improvement |
| --- | --- | --- | --- |
| Threat scope | Input-level OR protocol-level (siloed) | Unified input + protocol taxonomy | First integrated coverage |
| Attack techniques catalogued | Partial/fragmented coverage | 30+ formally defined techniques | More comprehensive |
| Formal threat modeling | Informal or absent | Formal attacker capability/objective definitions per category | Rigorous grounding |
| Validation method | Literature review only | Expert review + CVE/NIST NVD cross-mapping | Empirically grounded |
| Agent-to-agent protocol threats | Largely unaddressed | Explicitly modeled (e.g., Toxic Agent Flow, MCP exploits) | Novel coverage |

Key Takeaways

  • When building LLM-agent systems with tool use or multi-agent orchestration, treat protocol-layer security (authentication, schema validation, inter-agent trust) as equally critical as prompt-level input sanitization
  • The Toxic Agent Flow pattern and Prompt-to-SQL injections demonstrate that agentic pipelines introduce compounding attack surfaces — each tool call or agent handoff is a potential injection or privilege escalation point requiring explicit validation
  • Mitigation strategies like cryptographic provenance tracking and dynamic trust management should be designed in from the start, as retrofitting security onto ad-hoc agentic integrations (plugins, MCP servers) is a primary source of real-world vulnerabilities
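The second takeaway, validating every tool call or handoff explicitly, can be as simple as gating execution behind a registry check. The tool name, required-argument sets, and length cap below are illustrative assumptions, not a prescribed schema:

```python
# Hypothetical tool registry: allowed tools plus crude per-argument rules.
TOOL_REGISTRY = {
    "search_issues": {"required": {"repo", "query"}, "max_arg_len": 256},
}

def validate_tool_call(name: str, args: dict) -> bool:
    """Reject a tool call unless it matches a registered tool and its rules."""
    spec = TOOL_REGISTRY.get(name)
    if spec is None:                        # unknown tool: fail closed
        return False
    if not spec["required"] <= set(args):   # missing required arguments
        return False
    return all(                             # limit the injection surface
        isinstance(v, str) and len(v) <= spec["max_arg_len"]
        for v in args.values()
    )
```

Running this check at every handoff point, rather than only at the user-facing prompt, is the practical consequence of treating protocol-layer security as a first-class concern.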

Abstract

Autonomous AI agents powered by large language models (LLMs) with structured function-calling interfaces enable real-time data retrieval, computation, and multi-step orchestration. However, the rapid growth of plugins, connectors, and inter-agent protocols has outpaced security practices, leading to brittle integrations that rely on ad-hoc authentication, inconsistent schemas, and weak validation. This survey introduces a unified end-to-end threat model for LLM-agent ecosystems, covering host-to-tool and agent-to-agent communications. We systematically categorize more than thirty attack techniques spanning input manipulation, model compromise, system and privacy attacks, and protocol-level vulnerabilities. For each category, we provide a formal threat formulation defining attacker capabilities, objectives, and affected system layers. Representative examples include Prompt-to-SQL injections and the Toxic Agent Flow exploit in GitHub MCP servers. We analyze attack feasibility, review existing defenses, and discuss mitigation strategies such as dynamic trust management, cryptographic provenance tracking, and sandboxed agent interfaces. The framework is validated through expert review and cross-mapping with real-world incidents and public vulnerability repositories, including CVE and NIST NVD. Compared to prior surveys, this work presents the first integrated taxonomy bridging input-level exploits and protocol-layer vulnerabilities in LLM-agent ecosystems, offering actionable guidance for designing secure and resilient agentic AI systems.

Generated on 2026-03-02 using Claude