Existential risks
The risks posed to humanity as a whole, including the dangers of unfriendly AGI and the resulting suffering of the human race.
ENTITY
3 - Other
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit107
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Advance Foundational AI Alignment Research: Prioritize and invest heavily in research to solve the AI alignment problem, specifically focusing on interpretability (to detect deceptive behavior and verify reasoning), adversarial robustness, and developing scalable oversight mechanisms (e.g., Reinforcement Learning from Human Feedback and its variations) to ensure that complex, highly capable AI systems reliably adopt and pursue goals consistent with the full breadth of human values and constraints.
2. Enforce Robust Control and Containment Protocols: Mandate the implementation of rigorous defense-in-depth safety architectures for advanced AI systems. Key measures include strict access control, sandboxing, and incorruptible, externally verifiable fail-safe and shutdown mechanisms (corrigibility), especially for any AI deployed in high-risk settings or granted control over critical infrastructure or weapons systems.
3. Establish Coordinated Global Governance and Regulation: Promote international coordination and binding regulatory frameworks to mitigate the risks associated with the AI race. This includes restricting access to dangerously powerful AI models, enforcing legal liability for developers of general-purpose AIs, and establishing a collaborative body to oversee and potentially enforce a moratorium or a highly controlled development trajectory for superintelligent systems until alignment solutions are proven reliable.