Emergent Agency
Emergent agency (Section 3.6): qualitatively different goals or capabilities can emerge from the composition of innocuous independent systems or behaviours;
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1239
Domain lineage
7. AI System Safety, Failures, & Limitations
7.6 > Multi-agent risks
Mitigation strategy
1. Implement continuous adversarial stress-testing and red teaming. Systematically conduct post-deployment red teaming and chaos-engineering exercises to rigorously stress-test multi-agent decision-making loops and probe for unintended emergent behaviors, such as collusion or reward hacking, and develop pre-emptive containment protocols for identified novel failure modes.
2. Establish immutable audit trails with continuous behavioral monitoring. Mandate end-to-end logging to create immutable audit trails recording all agent actions and system interactions. Deploy continuous behavior analytics to monitor for anomalous deviations from intended output patterns or goal alignment, enabling real-time detection of emergent capabilities and post-incident causality reconstruction.
3. Enforce mandatory human-in-the-loop (HITL) oversight. Maintain layered human oversight, particularly for high-impact or external-facing actions, by implementing clear escalation protocols that require human confirmation for critical decisions. This preserves human agency and ensures emergent capabilities are validated before execution, preventing unintended consequences.
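The immutable audit trail in mitigation 2 can be sketched with hash chaining, one common way to make an append-only log tamper-evident: each record embeds the hash of its predecessor, so altering any past record breaks the chain. This is an illustrative sketch, not a prescribed design; the class and field names (`AuditTrail`, `agent`, `payload`) are assumptions.

```python
import hashlib
import json


class AuditTrail:
    """Append-only, tamper-evident log of agent actions.

    Each record stores the SHA-256 hash of the previous record,
    so modifying or deleting any earlier entry invalidates the chain.
    """

    GENESIS = "0" * 64  # placeholder hash for the first record

    def __init__(self):
        self.records = []
        self._last_hash = self.GENESIS

    def log(self, agent_id: str, action: str, payload: dict) -> str:
        """Append a record and return its hash."""
        record = {
            "agent": agent_id,
            "action": action,
            "payload": payload,
            "prev_hash": self._last_hash,
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        record["hash"] = digest
        self.records.append(record)
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; return False if any record was altered."""
        prev = self.GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            recomputed = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if rec["prev_hash"] != prev or recomputed != rec["hash"]:
                return False
            prev = rec["hash"]
        return True
```

In practice the chain head would be anchored in external write-once storage, and the behavioral-analytics layer would consume the same records to flag anomalous deviations from expected action patterns.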
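The escalation protocol in mitigation 3 amounts to a gate in front of the execution path: high-impact or external-facing actions are held for human confirmation, everything else proceeds. A minimal sketch, assuming a simple impact-level tag on each action (the `Action` type and impact labels are illustrative, not from the source):

```python
from dataclasses import dataclass
from typing import Callable

# Impact levels that trigger mandatory human confirmation
ESCALATION_LEVELS = {"high", "external"}


@dataclass
class Action:
    name: str
    impact: str  # e.g. "low", "high", "external"


def execute_with_oversight(
    action: Action,
    confirm: Callable[[Action], bool],  # human reviewer decision
    run: Callable[[Action], str],       # the agent's execution path
) -> str:
    """Gate high-impact or external-facing actions behind human sign-off."""
    if action.impact in ESCALATION_LEVELS and not confirm(action):
        return f"escalated-and-blocked:{action.name}"
    return run(action)
```

The design choice here is that the gate sits outside the agent: the agent cannot reclassify its own actions to bypass review, which matters precisely when emergent goals are the risk being mitigated.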