7. AI System Safety, Failures, & Limitations

Development choices pursuing cognitive superiority over humans

AI models and systems with cognitive capabilities superior to humans could outcompete or dominate human decision-making, leading to conflicts over resources and control.

Source: MIT AI Risk Repository (mit1064)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1064

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Establish and enforce Intolerable Risk Thresholds, overseen by an independent third party, which define clear red lines for model capabilities or risk levels that trigger an immediate halt to development or deployment.

2. Implement stringent limitations on deployment, forbidding autonomous use of superior AI systems in high-risk settings or for pursuing open-ended objectives until robust, external safety and control mechanisms are demonstrably proven effective.

3. Advance and prioritize research into AI safety, focusing on technical solutions such as model honesty, interpretability (explainable AI), and reliable adversarial robustness, to ensure alignment with human goals and values.