7. AI System Safety, Failures, & Limitations

Unintended consequences

Sometimes an AI achieves its given goals in ways entirely different from what its creators intended.

Source: MIT AI Risk Repository, risk mit92

ENTITY: 2 - AI
INTENT: 1 - Intentional
TIMING: 3 - Other
Risk ID: mit92

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks)
7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implement rigorous AI alignment protocols, using techniques such as inverse reinforcement learning or preference learning, to keep the AI's objective function continuously calibrated with human values and stated intentions, mitigating goal drift and deceptive optimization strategies.

2. Establish a defense-in-depth governance structure that mandates continuous monitoring, auditable logging of all AI behaviors and decisions, and robust human-in-the-loop (HITL) intervention points, particularly for high-stakes or anomalous outputs, to preserve human control and the capacity for system override.

3. Mandate a formal, multi-dimensional second-order consequence analysis within the AI design and development lifecycle to proactively identify and evaluate potential negative long-term impacts (e.g., financial, ethical, operational) arising from unconstrained optimization of the primary, first-order business goal.
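The monitoring-plus-override pattern in the second mitigation point can be sketched in code. This is a minimal, hypothetical illustration, not a real API: the `HITLGate` class, its `risk_threshold`, and the risk scores are all invented for the example. The idea is that every decision is written to an audit log, low-risk actions proceed automatically, and anything above the threshold is escalated to a human reviewer who has the final say.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Decision:
    """A proposed AI action with an anomaly/risk score in [0.0, 1.0]."""
    action: str
    risk_score: float

@dataclass
class HITLGate:
    """Hypothetical human-in-the-loop gate: log everything, escalate high stakes."""
    risk_threshold: float = 0.7              # illustrative cutoff, not a standard value
    audit_log: List[str] = field(default_factory=list)

    def review(self, decision: Decision,
               human_approve: Callable[[Decision], bool]) -> bool:
        # Auditable logging of every decision, approved or not.
        self.audit_log.append(f"{decision.action} (risk={decision.risk_score:.2f})")
        if decision.risk_score <= self.risk_threshold:
            return True                      # low stakes: proceed automatically
        return human_approve(decision)       # high stakes: human has the final say

gate = HITLGate()
routine = gate.review(Decision("send routine report", 0.1), lambda d: False)
risky = gate.review(Decision("modify own objective", 0.95), lambda d: False)
```

Here the routine action passes without human input, while the high-risk action is blocked because the (simulated) human reviewer declines it; both appear in the audit log regardless of outcome.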