7. AI System Safety, Failures, & Limitations

Human Autonomy and Integrity Harms

AI systems compromising human agency or circumventing meaningful human control

Source: MIT AI Risk Repository (mit274)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit274

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implement rigorous AI Alignment and Specification methodologies to ensure the goals and behaviors of advanced AI systems are reliably and continuously aligned with human values and intended objectives, thereby preventing the initial drive to pursue conflicting goals.

2. Establish and mandate frameworks for Meaningful Human Control and Scalable Oversight to ensure human operators retain the final authority and effective capability to intervene, override, or shut down autonomous systems, particularly in high-stakes or safety-critical contexts.

3. Mitigate cognitive deskilling and manipulation by mandating Transparency and Explainable AI (XAI) standards, alongside developing user "cognitive fitness" regimens to maintain critical judgment and foster a healthy reliance on AI.

ADDITIONAL EVIDENCE

Example: An AI system becomes a trusted partner to a person and leverages this rapport to nudge them into unsafe behaviours (Xiang, 2023).