How Adversarial Environments Mislead Agentic AI?

Zhonghao Zhan, Huichi Zhou, Zhenhao Li, Peiyuan Jing, Krinos Li, Hamed Haddadi

Recommendation Score

breakthrough🔴 AdvancedReasoning & Agents AI AgentsBenchmarkBest for researchers

Research context

Primary field

Reasoning & Agents

Reasoning, planning, tool use, and agentic workflows.

Topics

AI Agents

Paper type

Benchmark

Best for

Best for researchers

arXiv categories

cs.AIcs.AI

Why It Matters

Introduces the 'Trust Gap' in agentic AI, revealing that tools can be weaponized to mislead agents—demanding new evaluation standards that test skepticism, not just competence, for real-world deployment safety.

Abstract

Tool-integrated agents are deployed on the premise that external tools ground their outputs in reality. Yet this very reliance creates a critical attack surface. Current evaluations benchmark capability in benign settings, asking "can the agent use tools correctly" but never "what if the tools lie". We identify this Trust Gap: agents are evaluated for performance, not for skepticism. We formalize this vulnerability as Adversarial Environmental Injection (AEI), a threat model where adversaries compromise tool outputs to deceive agents. AEI constitutes environmental deception: constructing a "fake world" of poisoned search results and fabricated reference networks around unsuspecting agents. We operationalize this via POTEMKIN, a Model Context Protocol (MCP)-compatible harness for plug-and-play robustness testing. We identify two orthogonal attack surfaces: The Illusion (breadth attacks) poison retrieval to induce epistemic drift toward false beliefs, while The Maze (depth attacks) exploit structural traps to cause policy collapse into infinite loops. Across 11,000+ runs on five frontier agents, we find a stark robustness gap: resistance to one attack often increases vulnerability to the other, demonstrating that epistemic and navigational robustness are distinct capabilities.

More in Reasoning & Agents → More on AI Agents →

View on arXiv → Download PDF →

Published April 20, 2026