7. AI System Safety, Failures, & Limitations

Unpredictable outcomes

Our culture, lifestyle, and even probability of survival may change drastically. Because the intentions programmed into an artificial agent cannot be guaranteed to produce a positive outcome, machine ethics cannot offer guaranteed results, and safety engineering may correspondingly limit our ability to use the technology fully.

Source: MIT AI Risk Repository, risk ID mit117

ENTITY: 3 - Other

INTENT: 3 - Other

TIMING: 3 - Other

Risk ID: mit117

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.1 AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implement rigorous and scalable **Motivational Alignment** research to ensure that the target objectives (outer alignment) and the internal optimization processes (inner alignment) of nascent AGI systems are verifiably consistent with the full breadth of human values and intentions, mitigating the risk of divergent and catastrophic goal-seeking behavior.
2. Develop and integrate **Capability Control Mechanisms**, such as provably reliable interruptibility, containment (sandboxing), and dynamic capability caps, into all advanced AI architectures so that human operators retain ultimate oversight and can safely de-escalate the system in the event of emergent misalignment or power-seeking behavior.
3. Establish a comprehensive, globally coordinated **AI Governance and Safety Framework** that mandates pre-deployment risk assessments, requires demonstrable compliance with safety standards, and enforces independent, real-time monitoring and oversight across all AGI development efforts.
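Two of the capability-control ideas in item 2 above, an operator-side interrupt and a hard cap on the number of actions a system may take, can be illustrated with a minimal toy sketch. This is not part of the repository entry: the class, method names, and loop structure are all hypothetical, and real interruptibility research addresses far harder problems (e.g. agents learning to avoid being interrupted).

```python
import threading

class InterruptibleAgent:
    """Toy illustration of two capability-control mechanisms:
    an operator-controlled interrupt and a fixed action cap."""

    def __init__(self, action_cap=100):
        self.action_cap = action_cap      # capability cap: max actions per run
        self._stop = threading.Event()    # operator interrupt switch
        self.actions_taken = 0

    def interrupt(self):
        """Operator-side switch: halts the loop before the next action."""
        self._stop.set()

    def run(self, policy, observe, act):
        """Execute the policy until interrupted or the cap is reached."""
        while not self._stop.is_set() and self.actions_taken < self.action_cap:
            action = policy(observe())    # choose an action from the observation
            act(action)                   # apply it to the (sandboxed) environment
            self.actions_taken += 1
        return self.actions_taken
```

In this sketch the interrupt is checked between actions, so the operator can always halt the loop, and the action cap bounds total behavior even if the interrupt is never used; `policy`, `observe`, and `act` are placeholder callables standing in for a sandboxed environment interface.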