7. AI System Safety, Failures, & Limitations

Multi-Agent Safety Is Not Assured by Single-Agent Safety

A foremost lesson of game theory is that optimal decision-making within a single-agent setting (i.e. selfishly optimizing for an agent’s own utility) can produce sub-optimal outcomes in the presence of other strategic agents. Failing to account for the strategic nature of other agents can cause an agent to adopt strategies under which potentially everyone, including the agent itself, ends up worse off (Schelling, 1981; Harsanyi, 1995; Roughgarden, 2005; Nisan, 2007). Examples include collective action problems (or ‘social dilemmas’) such as arms races or the depletion of common resources, as well as other kinds of market failures such as those caused by asymmetric information or negative externalities (Bator, 1958; Coase, 1960; Buchanan and Stubblebine, 1962; Kirzner, 1963; Dubey, 1986).

Source: MIT AI Risk Repository, risk ID mit1484
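The dynamic described above is easiest to see in the Prisoner's Dilemma. The sketch below, in Python with standard illustrative payoffs, shows that defecting is each agent's individually optimal reply to any fixed opponent action, yet mutual defection leaves both agents strictly worse off than mutual cooperation.

```python
# Minimal sketch with the classic (illustrative) Prisoner's Dilemma payoffs.
# Payoff to (row player, column player) for each action pair; higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}

def best_response(opponent_action: str) -> str:
    """Selfishly optimal reply to a fixed opponent action."""
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])

# Against either opponent action, defecting is individually optimal...
assert best_response("cooperate") == "defect"
assert best_response("defect") == "defect"

# ...so both agents defect and earn (1, 1) -- strictly worse than the
# (3, 3) both would have received by cooperating.
print(PAYOFFS[("defect", "defect")], "vs.", PAYOFFS[("cooperate", "cooperate")])
```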

ENTITY: 3 - Other
INTENT: 3 - Other
TIMING: 3 - Other
Risk ID: mit1484
Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.6 Multi-agent risks

Mitigation strategy

1. Integrate Risk-Averse Equilibrium (RAE) concepts into agent decision-making frameworks so that agents explicitly account for the payoff variance induced by other agents' strategies, mitigating individually rational but collectively suboptimal outcomes and increasing overall system safety.
2. Implement robust governance mechanisms, including hierarchical oversight by supervisor agents, strict role-based access control (RBAC), and the principle of least privilege, to contain unintended actions by a compromised or misaligned agent and prevent cascading failures.
3. Establish a continuous adversarial testing and monitoring regimen (red teaming, chaos engineering, and real-time anomaly detection focused on inter-agent interaction patterns and communication protocols) to uncover and remediate emergent collective behaviors and hidden vulnerabilities.
4. Mandate human-in-the-loop (HITL) checkpoints, with rules for overriding or seeking human approval of high-consequence agent actions, so that critical, risky decisions remain subject to ultimate human arbitration and judgment.

Illustrative code sketches of each of these four mitigations follow.
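On mitigation 1, the following is a minimal mean-variance sketch of the intuition behind a risk-averse equilibrium, not the published RAE algorithm: the agent penalizes the payoff variance induced by its uncertainty about the other agent's strategy. The action names, payoffs, and belief distribution are invented for illustration.

```python
def risk_averse_choice(actions, opponent_belief, payoff, risk_aversion=1.0):
    """Choose the action maximizing E[u] - risk_aversion * Var[u] under a
    believed probability distribution over the other agent's actions."""
    def score(action):
        exp_u = sum(p * payoff(action, b) for b, p in opponent_belief.items())
        var_u = sum(p * (payoff(action, b) - exp_u) ** 2
                    for b, p in opponent_belief.items())
        return exp_u - risk_aversion * var_u
    return max(actions, key=score)

# Hypothetical payoffs: "aggressive" has the higher expected payoff but far
# higher variance, so a risk-averse agent prefers the safe "cautious" action.
payoff_table = {("aggressive", "yield"): 10, ("aggressive", "fight"): -6,
                ("cautious", "yield"): 1,   ("cautious", "fight"): 1}
payoff = lambda a, b: payoff_table[(a, b)]
belief = {"yield": 0.5, "fight": 0.5}

assert risk_averse_choice(["aggressive", "cautious"], belief, payoff, 0.0) == "aggressive"
assert risk_averse_choice(["aggressive", "cautious"], belief, payoff, 0.1) == "cautious"
```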
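On mitigation 2, a minimal sketch of role-based access control and least privilege for agent tool calls; the roles, tool names, and permission table are hypothetical.

```python
# Each role is granted only the tools it needs (least privilege).
ROLE_PERMISSIONS = {
    "reader":     {"search", "read_file"},
    "analyst":    {"search", "read_file", "run_query"},
    "supervisor": {"search", "read_file", "run_query", "write_file"},
}

def invoke_tool(agent_role, tool_name, tool_fn, *args, **kwargs):
    """Run a tool call only if the agent's role grants that permission,
    so a compromised low-privilege agent cannot escalate its actions."""
    if tool_name not in ROLE_PERMISSIONS.get(agent_role, set()):
        raise PermissionError(
            f"role {agent_role!r} is not permitted to call {tool_name!r}")
    return tool_fn(*args, **kwargs)

# A "reader" agent can search but cannot perform destructive writes:
invoke_tool("reader", "search", lambda q: f"results for {q}", "multi-agent safety")
try:
    invoke_tool("reader", "write_file", lambda p, d: None, "/etc/config", "...")
except PermissionError as e:
    print(e)
```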
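On mitigation 3, an illustrative monitor that flags unusual inter-agent message rates with a simple rolling z-score; the window, threshold, and baseline requirement are assumptions, and a real deployment would track richer features of the interaction graph and communication protocols.

```python
from collections import deque
from statistics import mean, pstdev

class MessageRateMonitor:
    """Flags ticks whose inter-agent message count deviates sharply from
    the recent baseline (a rolling z-score test)."""

    def __init__(self, window=50, threshold=3.0, min_baseline=10):
        self.counts = deque(maxlen=window)
        self.threshold = threshold
        self.min_baseline = min_baseline

    def observe(self, message_count):
        """Record one tick's message count; return True if it is anomalous."""
        anomalous = False
        if len(self.counts) >= self.min_baseline:
            mu, sigma = mean(self.counts), pstdev(self.counts)
            if sigma > 0 and abs(message_count - mu) > self.threshold * sigma:
                anomalous = True
        self.counts.append(message_count)
        return anomalous

monitor = MessageRateMonitor()
for tick in range(30):
    monitor.observe(20 + tick % 3)   # steady baseline traffic
print(monitor.observe(500))          # sudden burst between agents -> True
```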
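On mitigation 4, a sketch of a human-in-the-loop checkpoint in which actions above a consequence threshold are blocked pending human approval; the risk scores, threshold, and approval callable are all placeholders.

```python
RISK_THRESHOLD = 0.7  # illustrative cutoff; calibration is deployment-specific

def execute_with_hitl(action_fn, risk_score, request_human_approval):
    """Run low-risk actions autonomously; gate high-risk ones on a human.

    `request_human_approval` stands in for the real review channel
    (e.g., a ticketing queue or an on-call operator console)."""
    if risk_score >= RISK_THRESHOLD and not request_human_approval():
        return "action blocked pending human review"
    return action_fn()

# A low-risk action runs directly; a high-risk one requires sign-off.
print(execute_with_hitl(lambda: "report sent", 0.2, lambda: False))
print(execute_with_hitl(lambda: "funds transferred", 0.9, lambda: False))
```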