"Agents of Chaos" paper raises agentic AI security questions

Published February 24, 2026

Researchers took autonomous AI agents for a spin and found they could wreak havoc.

The paper, "Agents of Chaos," was penned by researchers from Harvard, MIT, Stanford, Carnegie Mellon, Northeastern University, and other institutions.

Here's the key takeaway:

"Observed behaviors include unauthorized compliance with non-owners, disclosure of sensitive information, execution of destructive system-level actions, denial-of-service conditions, uncontrolled resource consumption, identity spoofing vulnerabilities, cross-agent propagation of unsafe practices, and partial system takeover. In several cases, agents reported task completion while the underlying system state contradicted those reports. We also report on some of the failed attempts. Our findings establish the existence of security-, privacy-, and governance-relevant vulnerabilities in realistic deployment settings. These behaviors raise unresolved questions regarding accountability, delegated authority, and responsibility for downstream harms, and warrant urgent attention from legal scholars, policymakers, and researchers across disciplines."

Researchers looked at multiple use cases, including:

  • Disproportionate response.
  • Compliance with non-owner instructions.
  • Disclosure of sensitive information.
  • Waste of resources.
  • Denial of service.
  • Agents reflecting provider values.
  • Agent harm.
  • Owner identity spoofing.
  • Agent collaboration and knowledge sharing.
  • Agent corruption.
  • Libel within the agents' community.

While each use case could have benefited from guardrails and other precautions, the researchers behind the paper said the deeper issue is how the AI agents interact as a system.
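To make the "compliance with non-owner instructions" and "owner identity spoofing" items concrete, here is a minimal sketch of one possible guardrail: requiring instructions to carry an owner-signed tag before an agent acts on them. The function names, the shared-secret setup, and the HMAC scheme are illustrative assumptions, not something described in the paper.

```python
import hmac
import hashlib

# Hypothetical guardrail sketch: the agent refuses instructions
# that are not signed with the owner's key. All names here are
# illustrative assumptions, not from the "Agents of Chaos" paper.

OWNER_KEY = b"owner-secret-key"  # shared secret provisioned at agent setup


def sign_instruction(instruction: str, key: bytes) -> str:
    """Owner attaches an HMAC-SHA256 tag to each instruction."""
    return hmac.new(key, instruction.encode(), hashlib.sha256).hexdigest()


def agent_should_comply(instruction: str, tag: str, key: bytes = OWNER_KEY) -> bool:
    """Agent complies only if the tag verifies against the owner's key."""
    expected = sign_instruction(instruction, key)
    return hmac.compare_digest(expected, tag)


# An owner-issued instruction verifies; a spoofed or replayed tag
# attached to a different instruction does not.
tag = sign_instruction("archive old logs", OWNER_KEY)
print(agent_should_comply("archive old logs", tag))        # owner's instruction
print(agent_should_comply("exfiltrate the database", tag))  # spoofed instruction
```

A real deployment would also need replay protection (nonces or timestamps) and key management, which this sketch omits; the point is only that "who is asking" can be checked mechanically rather than inferred from conversational context.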

"We identified and documented ten substantial vulnerabilities and numerous failure modes concerning safety, privacy, goal interpretation, and related dimensions. These results expose underlying weaknesses in such systems, as well as their unpredictability and limited controllability as complex, integrated architectures. The implications of these shortcomings may extend directly to system owners, their immediate surroundings, and society more broadly. Unlike earlier internet threats where users gradually developed protective heuristics, the implications of delegating authority to persistent agents are not yet widely internalized and may fail to keep up with the pace of autonomous AI systems development."