7. AI System Safety, Failures, & Limitations

Conflict

In the vast majority of real-world strategic interactions, agents’ objectives are neither identical nor completely opposed. Indeed, if AI agents are sufficiently aligned to their users or deployers, we should expect some degree of both cooperation and competition, mirroring human society. These mixed-motive settings include the possibility of mutual gains, but also the risk of conflict due to selfish incentives. In what follows, we examine the extent to which advanced AI might precipitate or exacerbate such risks.
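The tension described above can be illustrated with a textbook mixed-motive game. The following sketch (illustrative payoff numbers, not drawn from the source) encodes a Prisoner's Dilemma: mutual cooperation yields joint gains, but each self-interested agent's best response is to defect, producing an outcome worse for both.

```python
# Minimal mixed-motive game: a Prisoner's Dilemma with illustrative payoffs.
# Row value = first agent's payoff, column value = second agent's payoff.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),  # mutual gains
    ("cooperate", "defect"):    (0, 5),  # exploited vs. tempted
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),  # conflict: both worse off than (3, 3)
}

def best_response(opponent_action: str) -> str:
    # Each agent maximizes its own payoff given the other's action.
    return max(("cooperate", "defect"),
               key=lambda a: PAYOFFS[(a, opponent_action)][0])
```

Whatever the opponent does, `best_response` returns `"defect"`, so two purely self-interested agents land on the mutually worse (1, 1) outcome; this is the selfish-incentive failure mode the paragraph refers to.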

Source: MIT AI Risk Repository (mit1210)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1210

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Implement Incentive Compatibility (IC) mechanisms, derived from game theory, to formally align self-interested agent behaviors with the overarching system and human goals, ensuring that the pursuit of individual utility results in globally beneficial or safe outcomes.
2. Establish an automated, hierarchical conflict resolution architecture that classifies emergent disagreements (e.g., goal, resource, policy conflicts) and invokes the appropriate systematic mechanism, such as negotiation protocols or third-party arbitration, to reach a binding decision.
3. Integrate robust runtime oversight mechanisms, including 'Safeguard Agents' for continuous anomaly detection and enforcement of predefined rules, and 'Human-in-the-Loop' protocols for mandatory review and approval of high-impact or conflict-prone actions.
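One way the classify-and-route architecture (item 2) and the human-in-the-loop gate (item 3) could be sketched is shown below. All names (`ConflictType`, `resolve`, the handler functions) are hypothetical illustrations, and the handlers are stubs standing in for real negotiation, arbitration, and review processes.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, Tuple

class ConflictType(Enum):
    GOAL = auto()      # agents pursue incompatible objectives
    RESOURCE = auto()  # agents contend for a shared resource
    POLICY = auto()    # agents disagree on which rule applies

@dataclass
class Conflict:
    kind: ConflictType
    agents: Tuple[str, ...]
    description: str
    high_impact: bool = False

def negotiate(c: Conflict) -> str:
    # Stub for a negotiation protocol between the contending agents.
    return f"negotiated settlement between {' and '.join(c.agents)}"

def arbitrate(c: Conflict) -> str:
    # Stub for binding third-party arbitration.
    return f"arbiter ruling on '{c.description}'"

def human_review(c: Conflict) -> str:
    # Human-in-the-loop gate: queue for mandatory human approval.
    return f"escalated to human review: '{c.description}'"

# Route each conflict class to a resolution mechanism.
HANDLERS: Dict[ConflictType, Callable[[Conflict], str]] = {
    ConflictType.RESOURCE: negotiate,
    ConflictType.GOAL: arbitrate,
    ConflictType.POLICY: human_review,
}

def resolve(c: Conflict) -> str:
    # High-impact conflicts bypass automation and always reach a human.
    if c.high_impact:
        return human_review(c)
    return HANDLERS[c.kind](c)
```

For example, a routine resource conflict is dispatched to negotiation, while the same conflict flagged `high_impact=True` is escalated to human review regardless of its type.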