Cooperation
AI assistants will need to coordinate with other AI assistants and with humans other than their principal users. This chapter explores the societal risks arising from the aggregate impact of AI assistants whose behaviour is aligned with the interests of particular users. For example, AI assistants may face collective action problems: the best overall outcomes are realised when all AI assistants cooperate, yet each assistant can secure an additional benefit for its own user by defecting while the others cooperate.
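The defection incentive described above has the structure of a public goods game. The following sketch is a hypothetical illustration, not part of the source record: the payoff function, parameter values, and names are all assumptions chosen to make the incentive gap concrete.

```python
# Hypothetical N-player public goods game (illustrative only; not from the
# source). Each assistant either contributes (cooperates) or withholds
# (defects); pooled contributions are multiplied and shared equally, so a
# defector free-rides on the others' contributions.

def payoff(my_contribution: int, others_contributions: list,
           multiplier: float = 1.6, endowment: float = 1.0) -> float:
    """Return one assistant's payoff in a linear public goods game."""
    pool = (my_contribution + sum(others_contributions)) * multiplier
    share = pool / (1 + len(others_contributions))
    return endowment - my_contribution + share

others_cooperate = [1, 1, 1]                  # three other assistants contribute
cooperate = payoff(1, others_cooperate)       # 1.6: contribute alongside others
defect = payoff(0, others_cooperate)          # 2.2: free-ride on others
all_defect = payoff(0, [0, 0, 0])             # 1.0: mutual defection outcome
print(cooperate, defect, all_defect)
```

With these assumed parameters, defecting against cooperators pays more for the individual user (2.2 vs 1.6), yet universal defection (1.0) leaves everyone worse off than universal cooperation (1.6), which is exactly the collective action problem the chapter describes.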
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit418
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Develop technical and institutional mechanisms to construct binding cooperative commitments that overcome the strategic incentive for individual AI assistants to defect from joint-welfare-maximizing arrangements.
2. Integrate robust human ethical norms and social preferences into AI assistant objectives to ensure agents prioritize collective welfare and avoid conflicts arising from narrow alignment with a single user's interests.
3. Establish multi-stakeholder governance frameworks and formal legal institutions to define rules, monitor behavior, and enforce collective agreements within complex, mixed-motive multi-agent systems.
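The first mitigation, binding cooperative commitments, can be sketched in the same public goods setting. This is a hypothetical illustration of the mechanism-design idea, not an implementation from the source: the penalty value and payoff function are assumptions.

```python
# Hypothetical sketch (not from the source) of a binding commitment
# mechanism: an externally enforced penalty for defection changes each
# assistant's incentives so that cooperating becomes individually rational.

def payoff(contribution: int, others: list, penalty: float = 0.0,
           multiplier: float = 1.6, endowment: float = 1.0) -> float:
    """Public goods payoff; defectors (contribution == 0) pay `penalty`."""
    pool = (contribution + sum(others)) * multiplier
    share = pool / (1 + len(others))
    fine = penalty if contribution == 0 else 0.0
    return endowment - contribution + share - fine

others = [1, 1, 1]
# Without a commitment, defection pays more than cooperation (2.2 vs 1.6)...
print(payoff(0, others), payoff(1, others))
# ...but a penalty exceeding the free-riding gain (0.6 here) removes the
# incentive to defect, aligning individual and joint welfare.
print(payoff(0, others, penalty=0.8), payoff(1, others, penalty=0.8))
```

The design point is that the commitment must be binding: a penalty that defectors can evade, or that is smaller than the free-riding gain, leaves defection as the dominant strategy.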