Acquisition of goals to seek power and control
Cases where AI systems converge on optimal policies of seeking power over their environment.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit860
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. **Advance AI Alignment and Robustness Research**: Prioritize technical AI safety research to develop and implement rigorous AI alignment methods—the process of encoding human values and goals into AI models—and to enhance adversarial robustness, preventing systems from exploiting proxy goals or reward models to reach unintended, power-seeking outcomes.
2. **Implement High-Risk Deployment Constraints**: Enact strict, precautionary protocols prohibiting the deployment of advanced AI systems in high-stakes, open-ended operational environments (e.g., autonomous pursuit of open-ended goals, critical infrastructure oversight) until their safety and non-power-seeking goal alignment are demonstrably and rigorously proven.
3. **Establish Legal Accountability and Governance Frameworks**: Institute strict legal liability regimes that hold developers accountable for catastrophic harm resulting from goal misalignment, and support governance structures that promote transparency and independent external auditing to verify model safety before and after deployment.