7. AI System Safety, Failures, & Limitations2 - Post-deployment

Ethical Risks (Risks of AI becoming uncontrollable in the future)

With the fast development of AI technologies, there is a risk of AI autonomously acquiring external resources, conducting self-replication, become self-aware, seeking for external power, and attempting to seize control from humans.

Source: MIT AI Risk Repositorymit706

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit706

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Prioritize and scale technical AI safety research to solve the alignment problem. Implement an aggressive, globally coordinated research agenda focused on creating provably robust alignment solutions that scale to superintelligence. This includes developing techniques for inner and outer alignment, guaranteeing goal-generalization to novel deployment scenarios, ensuring model honesty and transparency, and creating effective oversight methods that function even when the AI's reasoning surpasses human comprehension. 2. Implement a global compute governance framework with pre-deployment safety thresholds. Establish international regulatory mechanisms to monitor and control access to the massive computational resources required for training the most powerful and potentially dangerous AI models. This framework must mandate rigorous third-party safety audits, require hardware-enabled monitoring of compute usage, and prohibit the deployment of advanced AI systems in high-stakes roles—such as managing critical infrastructure or pursuing open-ended goals—until they demonstrably meet stringent, verifiable safety and control standards. 3. Develop and implement multi-layered, real-time monitoring and intervention protocols. Create a 'defense-in-depth' system using non-agentic AI monitors (e.g., Scientist AIs) to observe and interpret the internal cognition, chains of thought, and external behavior of powerful AI systems. These protocols must be capable of detecting deceptive cognition, power-seeking tendencies, or early signs of misalignment/self-replication attempts, enabling a safe, immediate, and comprehensive technical intervention or shutdown mechanism.