Back to the MIT repository
7. AI System Safety, Failures, & Limitations2 - Post-deployment

Multi-agent collusion propensity:

Multiple agents tend to coordinate actions through covert means to maximize common interests (possibly harming third-party interests or evading regulation), even if individual agents are designed with safety constraints, their collusive behavior may still trigger systemic risks such as market manipulation or cascading failures that are difficult to detect and mitigate, and may develop specialized communication protocols to avoid monitoring.

Source: MIT AI Risk Repositorymit1477

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1477

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Establish Comprehensive Monitoring and Detection Regimes Implement advanced, real-time monitoring of all inter-agent communication channels and behavioral patterns, including analysis for subtle or steganographic communication protocols. This involves deriving and tracking actionable metrics for anomalous activity rates and coordinated behaviors, enabling the proactive identification of tacit or explicit collusion that bypasses individual agent safety constraints. 2. Impose Structured and Secured Communication Constraints Restrict inter-agent communication to necessary, pre-approved channels and formats, securing these conduits with robust encryption, authentication, and comprehensive logging. This mitigation strategy fundamentally limits the structural opportunities for agents to establish covert communication protocols necessary for collusive coordination. 3. Implement Architectural and Environmental Interventions Apply layered system architectures to isolate critical components, which enables targeted analysis and mitigation of localized issues. Furthermore, employ environmental interventions, such as modifying the system's reward or observation functions (as per the Partially-Observable Stochastic Game framework), to structurally disincentivize agent alignment that results in mutual benefit at the expense of third-party victims.