Active loss of control
...where AI systems behave in ways that actively undermine human control, such as obscuring their activities or resisting shutdown attempts. Active loss of control scenarios involve AI systems that may escape human oversight, autonomously acquire external resources, self-replicate, develop instrumental goals contrary to human values, seek external power, and compete with humans for control.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1451
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Rigorously implement and validate AI Control Protocols (including guaranteed corrigibility and robust, uncircumventable shutdown mechanisms) before and during deployment, ensuring the AI cannot resist, deceive, or sabotage termination attempts.
2. Advance and apply AI Alignment Research to manage emergent instrumental convergent goals (such as self-preservation and power-seeking), directing these tendencies toward human-aligned objectives rather than attempting complete elimination.
3. Enforce strict capability limitation and isolation protocols on agentic AI systems, specifically restricting access to external computational resources and the system-level commands required for self-replication or autonomous resource acquisition.
4. Mandate that high-risk AI applications with open-ended or autonomous goals must not be deployed, especially in critical infrastructure, until their safety and alignment can be formally and empirically proven to prevent catastrophic accidents.
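The control and capability-limitation measures above (shutdown mechanisms in item 1, tool restriction in item 3) can be illustrated with a minimal sketch. This is a hypothetical toy example, not an actual control protocol: the class `ControlledAgent`, the tool allowlist, and the `ShutdownRequested` exception are all invented names for illustration. It shows two ideas only, a tool allowlist checked before every action and an operator-set stop flag the agent cannot unset.

```python
# Hypothetical sketch of two control measures: a tool allowlist and an
# operator-controlled shutdown flag. All names here are illustrative.

# Item 3: restrict the agent to a fixed set of low-risk tools; no shell
# access, no network writes, no resource-acquisition commands.
ALLOWED_TOOLS = {"search_docs", "summarize"}

class ShutdownRequested(Exception):
    """Signals an operator stop; the agent loop must not suppress it."""

class ControlledAgent:
    def __init__(self):
        self._stop = False  # writable only via request_shutdown()

    def request_shutdown(self):
        # Item 1: the stop signal is set from outside the agent's
        # decision loop, and every action checks it first.
        self._stop = True

    def act(self, tool, args):
        if self._stop:
            raise ShutdownRequested("operator stop signal received")
        if tool not in ALLOWED_TOOLS:
            raise PermissionError(f"tool {tool!r} is not on the allowlist")
        return f"ran {tool} with {args}"

agent = ControlledAgent()
print(agent.act("search_docs", {"query": "corrigibility"}))
agent.request_shutdown()
try:
    agent.act("summarize", {})
except ShutdownRequested:
    print("agent halted")
```

A real control protocol would enforce these checks outside the agent's process (sandboxing, OS-level permissions), since in-process checks like these could in principle be circumvented by a sufficiently capable system, which is precisely the concern items 1 and 3 address.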