7. AI System Safety, Failures, & Limitations | 2 - Post-deployment

Limited Interactions

Sometimes agents cannot learn from historical interactions with the relevant agents, or can learn only from a limited number of interactions. In such cases, some other form of information exchange is required for agents to coordinate their actions reliably, such as communication (Crawford & Sobel, 1982; Farrell & Rabin, 1996a) or a correlation device (Aumann, 1974, 1987). Advances in language modelling mean that there are likely to be fewer settings in which advanced AI systems miscoordinate because they cannot communicate, but situations that require split-second decisions, or in which communication is too costly, could still produce failures. In these settings, AI agents must solve the problem of ‘zero-shot’ (or, more generally, ‘few-shot’) coordination (Emmons et al., 2022; Hu et al., 2020; Stone et al., 2010; Treutlein et al., 2021; Zhu et al., 2021).
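As a rough illustration of why a correlation device helps agents that have never interacted, the sketch below uses a two-action pure coordination game. The game, the action names, and the functions `payoff`, `expected_payoff_independent`, and `play_with_device` are all assumptions made for this example and do not come from the cited works. Without any shared information, two agents that randomize independently and uniformly match half the time; if both follow the same public signal, they always match.

```python
import random

ACTIONS = ("left", "right")  # hypothetical action labels


def payoff(a: str, b: str) -> int:
    """Pure coordination game: the agents score 1 only if their actions match."""
    return 1 if a == b else 0


def expected_payoff_independent(p_a: float, p_b: float) -> float:
    """Expected payoff when the agents randomize independently.

    p_a and p_b are the probabilities that each agent plays "left".
    """
    return p_a * p_b + (1 - p_a) * (1 - p_b)


def play_with_device(rng: random.Random) -> int:
    """Both agents observe one public signal and simply follow it."""
    signal = rng.choice(ACTIONS)  # stand-in for a shared correlation device
    action_a, action_b = signal, signal
    return payoff(action_a, action_b)


rng = random.Random(0)
no_device = expected_payoff_independent(0.5, 0.5)
with_device = sum(play_with_device(rng) for _ in range(1000)) / 1000
print(no_device, with_device)  # prints: 0.5 1.0
```

The signal carries no information about payoffs; it only gives the agents a common reference point, which is the sense in which a correlation device can substitute for prior interaction.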

Source: MIT AI Risk Repository (mit1209)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1209

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

- **Implement Meta-Learning and Environment Diversification for Generalization**
  - **Prioritization:** High. This directly addresses the "zero-shot coordination" problem by training agents to develop generalized cooperative norms.
  - **Action:** Employ **Noisy Zero-Shot Coordination (NZSC) training** and **Curriculum-based Environment Diversity** across a wide distribution of coordination problems and environments. This enhances the agents' ability to reliably coordinate with novel partners in complex, non-common-knowledge settings by fostering robust, generalized cooperative policies.
- **Establish Conservative Architectural Safety Guardrails and Fallback Protocols**
  - **Prioritization:** Medium. This mitigates the risk of failure in time-critical/ill-equipped situations by ensuring system stability when coordination breaks down.
  - **Action:** Design and enforce a **hierarchical safety framework** with **explicit, conservative guardrails** that halt or modify execution (e.g., invoking a predetermined 'fail-safe' state) when planning fails or if execution conditions are deemed uncertain or ill-equipped. This prevents 'blind execution' or over-reliance on downstream agents for critical safety interventions.
- **Design Explicit Correlation and Context-Sharing Mechanisms**
  - **Prioritization:** Medium. This addresses the root cause of miscoordination, which is "limited interactions" and "information asymmetries."
  - **Action:** Institute **formal communication protocols or a correlation device** to ensure common knowledge of the overall task objective and context. In multi-agent architectures, this necessitates overcoming **context fragmentation** by providing sub-agents with sufficient high-level intent, thereby enabling collective safety judgment and proper alignment.
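
The conservative guardrail described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference design: the `Plan` type, the `confidence` field, the `FAIL_SAFE` state, and the `0.9` threshold are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional

FAIL_SAFE = "fail_safe"  # hypothetical predetermined safe state


@dataclass
class Plan:
    action: str
    confidence: float  # assumed: agent's own estimate in [0, 1] that coordination will succeed


def guarded_execute(plan: Optional[Plan], min_confidence: float = 0.9) -> str:
    """Return the action to take, falling back to the fail-safe state.

    The guardrail triggers when planning produced nothing, or when the
    plan's confidence is below the (assumed) conservative threshold.
    """
    if plan is None:
        return FAIL_SAFE  # planning failed: do not execute blindly
    if plan.confidence < min_confidence:
        return FAIL_SAFE  # conditions too uncertain: prefer the safe state
    return plan.action


print(guarded_execute(None))                 # prints: fail_safe
print(guarded_execute(Plan("merge", 0.95)))  # prints: merge
print(guarded_execute(Plan("merge", 0.40)))  # prints: fail_safe
```

Placing the check inside the executing agent, rather than relying on a downstream agent to catch the problem, reflects the point above about avoiding over-reliance on other agents for critical safety interventions.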