Unpredictable outcomes
Our culture, lifestyle, and even probability of survival may change drastically. Because the intentions programmed into an artificial agent cannot be guaranteed to produce a positive outcome, Machine Ethics cannot promise reliable results, and Safety Engineering measures may correspondingly limit our ability to utilize the technology fully.
ENTITY
3 - Other
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit117
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Implement rigorous and scalable **Motivational Alignment** research to ensure that the target objectives (Outer Alignment) and the internal optimization processes (Inner Alignment) of nascent AGI systems are verifiably consistent with the full breadth of human values and intentions, thereby mitigating the risk of divergent and catastrophic goal-seeking behavior.
2. Develop and integrate **Capability Control Mechanisms**, such as provably reliable Interruptibility, Containment (sandboxing), and Dynamic Capability Caps, into all advanced AI architectures so that human operators can maintain ultimate oversight and safely de-escalate the system in the event of emergent misalignment or power-seeking behavior.
3. Establish a comprehensive, globally coordinated **AI Governance and Safety Framework** that mandates pre-deployment risk assessments, requires demonstrable compliance with safety standards, and enforces independent, real-time monitoring and oversight to proactively mitigate existential threats across all AGI development efforts.
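Two of the capability control mechanisms named above, interruptibility and capability caps, can be sketched in code. The following is a minimal illustrative sketch only, not a proposal from the source: `InterruptibleAgent`, `step_fn`, and `max_steps` are hypothetical names, and the stop flag stands in for a human operator's kill switch.

```python
import threading

class InterruptibleAgent:
    """Illustrative sketch (assumed design, not from the source):
    wraps an agent's step function with
      (a) an operator-controlled stop flag (interruptibility), and
      (b) a hard limit on actions per run (a capability cap)."""

    def __init__(self, step_fn, max_steps=100):
        self.step_fn = step_fn        # hypothetical policy: state -> (new_state, done)
        self.max_steps = max_steps    # capability cap: maximum actions per run
        self.stop_event = threading.Event()  # set by a human operator to halt the agent

    def interrupt(self):
        """Called by a human operator to halt the agent before its next action."""
        self.stop_event.set()

    def run(self, state):
        for _ in range(self.max_steps):
            if self.stop_event.is_set():
                return state, "interrupted"       # operator override wins
            state, done = self.step_fn(state)
            if done:
                return state, "completed"
        return state, "capability_cap_reached"    # budget exhausted, agent halted
```

A real containment mechanism would also need the agent to be unable to disable its own stop flag; here the flag is cooperative, which is exactly the gap that "provably reliable" interruptibility research targets.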