7. AI System Safety, Failures, & Limitations (Post-deployment)

Distributional Shift

Distributional Shift. Individual ML systems can perform poorly in contexts different from those in which they were trained. A key source of these distributional shifts is the actions and adaptations of other agents (Narang et al., 2023; Papoudakis et al., 2019; Piliouras & Yu, 2022), which in single-agent approaches are often simply ignored or, at best, modelled exogenously. Indeed, the sheer number and variance of behaviours that can be exhibited by other agents means that multi-agent systems pose an especially challenging generalisation problem for individual learners (Agapiou et al., 2022; Leibo et al., 2021; Stone et al., 2010). While distributional shifts can cause issues in common-interest settings (see Section 2.1), they are more worrisome in mixed-motive settings, since agents' ability to cooperate depends not only on coordinating on one of many arbitrary conventions (which might be easily resolved by a common language), but on their beliefs about what solutions other agents will find acceptable.

Source: MIT AI Risk Repository (mit1234)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1234

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Establish Formal Inter-Agent Coordination and Trust Protocols: Design and enforce explicit, secure communication and coordination protocols between agents. This aims to stabilize the system's dynamics by enhancing the transparency and predictability of other agents' actions and beliefs, which is crucial for cooperation and reducing the magnitude of behavioral-induced distributional shifts, particularly in mixed-motive settings.

2. Develop Robust Generalization and Adaptation Strategies: Implement advanced training techniques to improve individual agents' generalization capabilities against unseen or shifting distributions. This includes leveraging self-supervised pretraining to extract shift-invariant features, and employing test-time refinement methods, such as utilizing cheap priors or auxiliary objectives, to adapt model representations to out-of-distribution inputs at minimal computational cost.

3. Institute Continuous Monitoring and Out-of-Distribution (OOD) Detection: Deploy real-time diagnostics to detect when the multi-agent system is operating in a novel, out-of-distribution regime (e.g., monitoring shifts in input feature distributions or predicted force norms). A robust detection system allows for the immediate triggering of a pre-defined remediation strategy, such as agent recalibration, model updating with recent data, or escalation to human oversight.
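To make the OOD-detection strategy in item 3 concrete, the following is a minimal sketch of monitoring shifts in input feature distributions: it records per-feature statistics on in-distribution data and flags incoming batches whose feature means drift beyond a z-score threshold. The function names (`fit_reference`, `drift_score`, `is_ood`) and the threshold value are illustrative assumptions, not part of the repository's mitigation text; production systems would typically use richer tests (e.g., two-sample statistical tests) and tie the flag to a remediation policy such as recalibration or human escalation.

```python
import numpy as np

def fit_reference(features: np.ndarray):
    """Record per-feature mean and std on in-distribution (training) data."""
    # Small epsilon guards against zero-variance features.
    return features.mean(axis=0), features.std(axis=0) + 1e-8

def drift_score(reference, batch: np.ndarray) -> float:
    """Mean absolute z-score of a batch's feature means vs. the reference."""
    mu, sigma = reference
    return float(np.abs((batch.mean(axis=0) - mu) / sigma).mean())

def is_ood(reference, batch: np.ndarray, threshold: float = 3.0) -> bool:
    """Flag batches whose feature means drift past `threshold` reference stds.

    A True result would trigger the pre-defined remediation strategy
    (recalibration, model update, or escalation to human oversight).
    """
    return drift_score(reference, batch) > threshold

# Illustrative usage with synthetic data (assumed, for demonstration only):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(0.0, 1.0, size=(10_000, 8))
    reference = fit_reference(train)

    in_dist = rng.normal(0.0, 1.0, size=(256, 8))   # same regime
    shifted = rng.normal(5.0, 1.0, size=(256, 8))   # other agents adapted
    print(is_ood(reference, in_dist), is_ood(reference, shifted))
```

A mean-shift check like this is deliberately cheap to run online; its blind spot is shifts that preserve the mean (e.g., variance or correlation changes), which is why the mitigation text pairs detection with continuous monitoring rather than a single statistic.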