Cooperation
AI assistants will need to coordinate with other AI assistants and with humans other than their principal users. This chapter explores the societal risks arising from the aggregate impact of AI assistants whose behaviour is aligned with the interests of particular users. For example, AI assistants may face collective action problems: the best overall outcomes are realised when all AI assistants cooperate, yet each assistant can secure an additional benefit for its own user by defecting while the others cooperate.
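The defection incentive described above has the structure of a public goods game. The following sketch is a hypothetical illustration, not part of the source record: the payoff function, parameter values, and names are all assumptions chosen to make the incentive gap concrete.

```python
# Hypothetical N-player public goods game (illustrative only; not from the
# source). Each assistant either contributes (cooperates) or withholds
# (defects); pooled contributions are multiplied and shared equally, so a
# defector free-rides on the others' contributions.

def payoff(my_contribution: int, others_contributions: list,
           multiplier: float = 1.6, endowment: float = 1.0) -> float:
    """Return one assistant's payoff in a linear public goods game."""
    pool = (my_contribution + sum(others_contributions)) * multiplier
    share = pool / (1 + len(others_contributions))
    return endowment - my_contribution + share

others_cooperate = [1, 1, 1]                  # three other assistants contribute
cooperate = payoff(1, others_cooperate)       # 1.6: contribute alongside others
defect = payoff(0, others_cooperate)          # 2.2: free-ride on others
all_defect = payoff(0, [0, 0, 0])             # 1.0: mutual defection outcome
print(cooperate, defect, all_defect)
```

With these assumed parameters, defecting against cooperators pays more for the individual user (2.2 vs 1.6), yet universal defection (1.0) leaves everyone worse off than universal cooperation (1.6), which is exactly the collective action problem the chapter describes.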
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit418
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Develop technical and institutional mechanisms to construct binding cooperative commitments that overcome the strategic incentive for individual AI assistants to defect from joint-welfare-maximizing arrangements.
2. Integrate robust human ethical norms and social preferences into AI assistant objectives to ensure agents prioritize collective welfare and avoid conflicts arising from narrow alignment with a single user's interests.
3. Establish multi-stakeholder governance frameworks and formal legal institutions to define rules, monitor behavior, and enforce collective agreements within complex, mixed-motive multi-agent systems.
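The first mitigation, binding cooperative commitments, can be sketched in the same public goods setting. This is a hypothetical illustration of the mechanism-design idea, not an implementation from the source: the penalty value and payoff function are assumptions.

```python
# Hypothetical sketch (not from the source) of a binding commitment
# mechanism: an externally enforced penalty for defection changes each
# assistant's incentives so that cooperating becomes individually rational.

def payoff(contribution: int, others: list, penalty: float = 0.0,
           multiplier: float = 1.6, endowment: float = 1.0) -> float:
    """Public goods payoff; defectors (contribution == 0) pay `penalty`."""
    pool = (contribution + sum(others)) * multiplier
    share = pool / (1 + len(others))
    fine = penalty if contribution == 0 else 0.0
    return endowment - contribution + share - fine

others = [1, 1, 1]
# Without a commitment, defection pays more than cooperation (2.2 vs 1.6)...
print(payoff(0, others), payoff(1, others))
# ...but a penalty exceeding the free-riding gain (0.6 here) removes the
# incentive to defect, aligning individual and joint welfare.
print(payoff(0, others, penalty=0.8), payoff(1, others, penalty=0.8))
```

The design point is that the commitment must be binding: a penalty that defectors can evade, or that is smaller than the free-riding gain, leaves defection as the dominant strategy.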