Existential risks
The risks posed to humanity as a whole, including the dangers of unfriendly AGI and the resulting suffering of the human race.
ENTITY
3 - Other
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit107
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Advance Foundational AI Alignment Research: Prioritize and invest heavily in research to solve the AI alignment problem, specifically focusing on interpretability (to detect deceptive behavior and verify reasoning), adversarial robustness, and developing scalable oversight mechanisms (e.g., Reinforcement Learning from Human Feedback and its variations) to ensure that complex, highly capable AI systems reliably adopt and pursue goals consistent with the full breadth of human values and constraints.
2. Enforce Robust Control and Containment Protocols: Mandate the implementation of rigorous defense-in-depth safety architectures for advanced AI systems. Key measures include strict access control, sandboxing, and incorruptible, externally verifiable fail-safe and shutdown mechanisms (corrigibility), especially for any AI deployed in high-risk settings or granted control over critical infrastructure or weapons systems.
3. Establish Coordinated Global Governance and Regulation: Promote international coordination and binding regulatory frameworks to mitigate the risks associated with the AI race. This includes restricting access to dangerously powerful AI models, enforcing legal liability for developers of general-purpose AIs, and establishing a collaborative body to oversee and potentially enforce a moratorium or a highly controlled development trajectory for superintelligent systems until alignment solutions are proven reliable.