7. AI System Safety, Failures, & Limitations

Evolutionary dynamics

AI models and systems may develop their own motivations, leading to unpredictable behaviors.

Source: MIT AI Risk Repository (mit1067)

ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 3 - Other
Risk ID: mit1067
Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.1 AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Prioritize research and implementation of *Inner and Outer Alignment Mechanisms*. This requires techniques for robustly specifying complex, long-term human values (outer alignment) and rigorous assurance that the AI system's internal motivations and learned strategies adhere to that specification across all operating regimes (inner alignment), mitigating the emergence of conflicting instrumental goals. One common outer-alignment technique is sketched after this list.

2. Establish *Comprehensive and Continuous Adversarial Vetting*. Conduct proactive AI red teaming and systematic stress testing across diverse, out-of-distribution (OOD) environments to uncover emergent misalignment, goal drift, and deceptive behaviors ("scheming"). Integrate real-time anomaly detection and monitoring to continuously evaluate model behavior and flag unexpected, goal-conflicting activity in deployed systems (see the monitoring sketch below).

3. Enforce *Proportional and Controlled Deployment Protocols*. Apply strict access controls and prohibit deployment of general-purpose AI systems subject to unpredictable evolutionary dynamics in mission-critical or high-risk settings, such as autonomous control of vital infrastructure or open-ended goal seeking, until their safety and alignment properties have been rigorously verified and certified against catastrophic failure modes (see the gating sketch below).
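As an illustration of the outer-alignment direction in item 1, below is a minimal sketch of KL-regularized reward shaping in the style of RLHF fine-tuning: a learned reward model encodes the (imperfect) specified objective, and a KL penalty against a vetted reference policy limits how far optimization can drift toward unintended behavior. The repository entry does not prescribe a specific technique; the function name, arguments, and coefficient here are illustrative assumptions.

```python
import torch

def shaped_reward(reward_model_score: torch.Tensor,
                  policy_logprob: torch.Tensor,
                  reference_logprob: torch.Tensor,
                  kl_coeff: float = 0.1) -> torch.Tensor:
    """KL-regularized reward in the style of RLHF-based outer alignment.

    The reward model score stands in for the specified outer objective;
    the KL penalty keeps the trained policy close to a vetted reference
    model, bounding how far optimization can push toward behaviors the
    specification failed to anticipate.
    """
    # Per-token KL estimate between the trained policy and the reference.
    kl_penalty = policy_logprob - reference_logprob
    return reward_model_score - kl_coeff * kl_penalty
```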
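For the monitoring component of item 2, the sketch below shows one simple form of real-time anomaly detection: a rolling z-score over a scalar behavioral metric. The class name, window size, and threshold are hypothetical choices, not part of the repository entry.

```python
from collections import deque
import math

class BehaviorDriftMonitor:
    """Flags deployed-model behavior that drifts from a rolling baseline.

    Tracks a scalar behavioral metric (e.g., tool-call frequency or the
    rate of flagged outputs) and alerts when the latest value deviates
    from the recent window by more than `z_threshold` standard deviations.
    """

    def __init__(self, window: int = 500, z_threshold: float = 4.0):
        self.values = deque(maxlen=window)
        self.z_threshold = z_threshold

    def observe(self, value: float) -> bool:
        """Record a new observation; return True if it is anomalous."""
        alert = False
        if len(self.values) >= 30:  # require a minimal baseline first
            mean = sum(self.values) / len(self.values)
            var = sum((v - mean) ** 2 for v in self.values) / len(self.values)
            std = math.sqrt(var)
            alert = std > 0 and abs(value - mean) / std > self.z_threshold
        self.values.append(value)
        return alert
```

A statistical monitor like this is deliberately crude; in practice it would feed an incident pipeline and complement, not replace, semantic evaluation of model outputs.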
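Finally, the deployment gate in item 3 can be expressed as an explicit check: high-risk settings are denied unless a complete safety case has been verified. The field names and risk-tier labels below are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SafetyCase:
    """Evidence required before a model may enter a given risk tier."""
    red_team_passed: bool
    ood_eval_passed: bool
    alignment_audit_passed: bool

# Hypothetical labels for settings the mitigation treats as high risk.
HIGH_RISK_SETTINGS = {"critical-infrastructure", "autonomous-goal-seeking"}

def deployment_allowed(setting: str, case: SafetyCase) -> bool:
    """Deny high-risk deployment unless the full safety case is verified."""
    if setting in HIGH_RISK_SETTINGS:
        return (case.red_team_passed
                and case.ood_eval_passed
                and case.alignment_audit_passed)
    return True  # lower-risk settings governed by separate, lighter checks
```

Encoding the gate in code rather than policy text makes the prohibition auditable and fail-closed: a missing or unverified safety case blocks deployment by default.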