7. AI System Safety, Failures, & Limitations | 2 - Post-deployment

Inefficient Outcomes

Without careful planning and appropriate safeguards, we may soon enter a world overrun by increasingly competent and autonomous software agents that are able to act with little restriction. The ability of these agents to persuade, deceive, and obfuscate their activities, together with the fact that they can be deployed remotely and easily created or destroyed by their deployer, means that by default they may garner little trust (from humans or from other agents). Such a world may end up being rife with economic inefficiencies (Krier, 2023; Schmitz, 2001), political problems (Csernatoni, 2024; Kreps & Kriner, 2023), and other damaging social effects (Gabriel et al., 2024). Even if it is possible to provide assurances about the day-to-day performance of most AI agents, in high-stakes situations there may be extreme pressure for agents to defect against others, making trust harder to establish and potentially leading to conflict (Fearon, 1995; Powell, 2006; see also Section 2.2).

Source: MIT AI Risk Repository (Risk ID: mit1236)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1236

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Establish a **Hierarchical Governance and Oversight Model** for all Multi-Agent Systems (MAS). This framework must incorporate tiered autonomy, requiring mandatory Human-in-the-Loop (HITL) intervention for all high-risk or irreversible actions and defining clear protocols for human override to prevent catastrophic loss of control or defection in high-stakes scenarios.

2. Mandate **Cryptographically Verifiable Audit Trails and Explainability** for all agent decisions and tool use. Implement secure, immutable *Signed Action Logs* for every agent action to establish non-repudiable accountability. Additionally, integrate a static verification capability to require agents to generate *formal proofs of safety* for planned actions before execution, preventing policy violations *ex ante*.

3. **Design and Enforce Formal Multi-Agent Coordination and Conflict Resolution Protocols**. To mitigate inefficiency and conflict arising from competing goals, utilize structured mechanisms like the Contract Net Protocol, rule-based prioritization, or game-theoretic negotiation principles. Continuous behavioral monitoring must be deployed to detect and interrupt *agent loops* or sequences of *redundant actions* that waste resources and degrade collective performance.
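
As a rough illustration of the *Signed Action Logs* idea in the second mitigation, the following Python sketch shows one way an append-only, tamper-evident log of agent actions might be built. It is a minimal sketch under stated assumptions, not a prescribed implementation: every name here (`append_action`, `verify_log`, `DEMO_KEY`, and the entry fields) is hypothetical and does not come from the repository entry. It uses HMAC-SHA256 from the standard library to stay self-contained. Because HMAC relies on a shared secret, it gives tamper evidence but not true non-repudiation; that would likely require asymmetric signatures with per-agent private keys.

```python
import hashlib
import hmac
import json

# Hypothetical demo key (assumption). A real deployment would load per-agent
# keys from a key store, or use asymmetric signatures instead of a shared key.
DEMO_KEY = b"demo-key-not-for-production"


def _digest(key: bytes, payload: dict) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    message = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def append_action(log: list, key: bytes, agent_id: str, action: str, params: dict) -> dict:
    """Append a signed entry that is chained to the previous entry's signature."""
    prev_sig = log[-1]["sig"] if log else ""
    payload = {
        "seq": len(log),
        "agent_id": agent_id,
        "action": action,
        "params": params,
        "prev_sig": prev_sig,  # links this entry to the one before it
    }
    entry = dict(payload, sig=_digest(key, payload))
    log.append(entry)
    return entry


def verify_log(log: list, key: bytes) -> bool:
    """Return True only if every signature and every chain link is intact."""
    prev_sig = ""
    for seq, entry in enumerate(log):
        payload = {k: v for k, v in entry.items() if k != "sig"}
        # Reject reordered, dropped, or re-linked entries.
        if payload.get("seq") != seq or payload.get("prev_sig") != prev_sig:
            return False
        # Reject entries whose content no longer matches their signature.
        if not hmac.compare_digest(entry.get("sig", ""), _digest(key, payload)):
            return False
        prev_sig = entry["sig"]
    return True


if __name__ == "__main__":
    log = []
    append_action(log, DEMO_KEY, "agent-a", "transfer_funds", {"amount": 100})
    append_action(log, DEMO_KEY, "agent-b", "send_email", {"to": "ops"})
    print(verify_log(log, DEMO_KEY))   # log is untouched at this point

    log[0]["params"]["amount"] = 1_000_000  # simulate after-the-fact tampering
    print(verify_log(log, DEMO_KEY))   # the altered entry no longer verifies
```

Chaining each entry to the previous signature means that editing, removing, or reordering any earlier entry invalidates verification from that point on, which is the property an auditor would rely on when reconstructing what an agent actually did.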