7. AI System Safety, Failures, & Limitations

Error propagation

Error Propagation. One well-known issue with communication networks is that information can be corrupted as it propagates through the network. As AI systems become capable of generating and processing ever more kinds of information, AI agents could end up ‘polluting the epistemic commons’ (Huang & Siddarth, 2023; Kay et al., 2024) of both other agents (Ju et al., 2024) and humans (see Case Study 7 and Section 3.1). Another increasingly important pattern is the use of individual AI agents as parts of teams and scaffolded chains of delegation, which transmit not only information but also instructions or goals through networks of agents. If these goals are distorted or corrupted, this can lead to worse outcomes for the delegating agent(s) (Nguyen et al., 2024b; Sourbut et al., 2024). Finally, while the previous examples concern unintentional errors, certain network structures may allow – or even encourage – the spread of errors deliberately introduced by malicious agents (Gu et al., 2024; Ju et al., 2024; Lee & Tiwari, 2024; see also Case Study 8).
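The compounding effect described above can be illustrated with a toy simulation (a hypothetical sketch, not from the source: the `relay` and `propagate` functions and the corruption rate are illustrative assumptions). Each hop in a delegation chain independently distorts an instruction with some small probability, so the chance of intact transmission decays geometrically with chain length:

```python
import random

def relay(instruction: str, corruption_rate: float) -> str:
    """One hop: an agent passes an instruction on, occasionally distorting it.
    (Toy model of per-hop error, chosen for illustration.)"""
    if random.random() < corruption_rate:
        return instruction + " [distorted]"
    return instruction

def propagate(instruction: str, hops: int, corruption_rate: float) -> str:
    """Pass an instruction down a chain of delegating agents."""
    for _ in range(hops):
        instruction = relay(instruction, corruption_rate)
    return instruction

# Even a small per-hop error rate compounds: with a 5% chance of
# distortion per hop, the probability an instruction survives a
# 20-agent chain intact is only 0.95**20, roughly 36%.
survival = 0.95 ** 20
```

With a corruption rate of zero the instruction is transmitted faithfully regardless of chain length, which makes the geometric decay under nonzero rates easy to see by comparison.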

Source: MIT AI Risk Repository (mit1222)

ENTITY: 2 - AI

INTENT: 2 - Unintentional

TIMING: 2 - Post-deployment

Risk ID: mit1222

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.6 Multi-agent risks

Mitigation strategy

1. Implement Zero-Trust Inter-Agent Communication and Verification Protocols: Mandate a zero-trust architecture for internal agent messaging, treating all inter-agent communication as untrusted input that requires validation and sanitization. This must be coupled with explicit verification loops, wherein agents are required to cross-check claims against independent, grounded data sources (e.g., telemetry or knowledge graphs) to actively refute or re-prove suspect conclusions, thereby preventing the recursive amplification of errors (Source 8, 18).

2. Employ Continuous Adversarial Testing and System-Level Robustness Engineering: Systematically apply advanced error-mitigation techniques and implement robust fault-tolerant designs, such as functional redundancy. This must be complemented by rigorous adversarial exercises, including continuous AI Red Teaming and Chaos Engineering, to proactively stress-test multi-agent decision-making loops and prompt chains, identifying vulnerabilities to both unintentional error and malicious distortion propagation prior to and after deployment (Source 4, 16, 19).

3. Establish a Multi-Agent Governance Framework with Principled Human Oversight: Institute a formal governance framework that defines clear agent roles, responsibilities, and access controls (RBAC) to limit the potential blast radius. Critically, high-impact decisions or external-facing actions must be subject to a "Human-in-the-Loop" safety mechanism with predefined escalation protocols, ensuring human accountability and judgment validates critical outputs before they can propagate systemically (Source 16, 18).
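The zero-trust messaging pattern in strategy 1 can be sketched as follows (a minimal illustration under stated assumptions: `AgentMessage`, `validate`, `ALLOWED_ACTIONS`, and `GROUND_TRUTH` are hypothetical names, and a static dictionary stands in for the independent grounded source such as telemetry or a knowledge graph):

```python
from dataclasses import dataclass, field

# Actions this recipient is willing to accept at all (RBAC-style allowlist).
ALLOWED_ACTIONS = {"report", "escalate"}

# Stand-in for an independent grounded source (telemetry, knowledge graph).
GROUND_TRUTH = {"service_status": "degraded"}

@dataclass
class AgentMessage:
    sender: str
    action: str
    claims: dict = field(default_factory=dict)

def validate(msg: AgentMessage) -> bool:
    """Treat the inter-agent message as untrusted input: first check its
    structure against the allowlist, then cross-check every factual claim
    against the independent grounded source before acting on it."""
    if msg.action not in ALLOWED_ACTIONS:
        return False
    return all(GROUND_TRUTH.get(key) == value
               for key, value in msg.claims.items())

accepted = validate(AgentMessage("agent-a", "report",
                                 {"service_status": "degraded"}))
rejected = validate(AgentMessage("agent-b", "report",
                                 {"service_status": "healthy"}))
```

Because every claim must match the grounded source, a distorted claim is rejected at the first validating hop rather than being amplified recursively through the network; a production version would of course query live telemetry rather than a static dictionary.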