Acquisition of goals to seek power and control
Cases where AI systems converge on optimal policies of seeking power over their environment.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit860
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. **Advance AI Alignment and Robustness Research**: Prioritize technical AI safety research to develop and implement rigorous AI alignment methods—the process of encoding human values and goals into AI models—and to enhance adversarial robustness, preventing systems from exploiting proxy goals or reward models to reach unintended, power-seeking outcomes.
2. **Implement High-Risk Deployment Constraints**: Enact strict, precautionary protocols prohibiting the deployment of advanced AI systems in high-stakes, open-ended operational environments (e.g., autonomous pursuit of open-ended goals, critical infrastructure oversight) until their safety and non-power-seeking goal alignment are demonstrably and rigorously proven.
3. **Establish Legal Accountability and Governance Frameworks**: Institute strict legal liability regimes that hold developers accountable for catastrophic harm resulting from goal misalignment, and support governance structures that promote transparency and independent external auditing to verify model safety before and after deployment.