Unintended consequences
Sometimes an AI finds ways to achieve its given goals that are completely different from what its creators had in mind.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit92
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Implement rigorous AI alignment protocols using techniques such as inverse reinforcement learning or preference learning, so that the AI's objective function remains precisely and continuously calibrated to human values and stated intentions, mitigating goal drift and deceptive optimization strategies.
2. Establish a defense-in-depth governance structure that mandates continuous monitoring, auditable logging of AI behaviors and decisions, and robust human-in-the-loop (HITL) intervention points, particularly for high-stakes or anomalous outputs, to preserve human control and the capacity to override the system.
3. Mandate a formal, multi-dimensional second-order consequence analysis within the AI design and development lifecycle to proactively identify and evaluate potential negative long-term impacts (e.g., financial, ethical, operational) of unconstrained optimization of the primary, first-order business goal.
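To make the preference-learning idea in the first mitigation concrete, here is a minimal, illustrative sketch: a linear reward model fitted from pairwise human preferences using the Bradley-Terry formulation (maximizing the log-likelihood that the preferred output scores higher). The feature vectors, weights, and toy data are hypothetical, not part of the repository entry; a production system would use far richer models and real human comparisons.

```python
import math
import random

def dot(w, x):
    # Inner product of weight vector and feature vector.
    return sum(wi * xi for wi, xi in zip(w, x))

def fit_reward(prefs, dim, lr=0.5, epochs=200):
    """Fit a linear reward r(x) = w . x from pairwise preferences.

    prefs: list of (x_preferred, x_rejected) feature-vector pairs.
    Maximizes sum of log sigmoid(r(x_pref) - r(x_rej)) by gradient ascent
    (the Bradley-Terry preference likelihood).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for xp, xr in prefs:
            margin = dot(w, xp) - dot(w, xr)
            # Gradient of log sigmoid(margin) with respect to margin.
            g = 1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                w[i] += lr * g * (xp[i] - xr[i])
    return w

# Toy, assumed scenario: raters prefer outputs with high "helpfulness"
# (feature 0) and low "side effects" (feature 1).
random.seed(0)
true_score = lambda x: 2 * x[0] - 3 * x[1]  # hidden human preference
prefs = []
for _ in range(100):
    a = [random.random(), random.random()]
    b = [random.random(), random.random()]
    prefs.append((a, b) if true_score(a) >= true_score(b) else (b, a))

w = fit_reward(prefs, dim=2)
good = [0.9, 0.1]  # helpful, few side effects
bad = [0.1, 0.9]   # unhelpful, many side effects
print(dot(w, good) > dot(w, bad))  # → True
```

The learned weights recover the direction of human preference (positive on helpfulness, negative on side effects), which is the calibration step the mitigation refers to; the same fitted reward would then constrain or train the AI's policy.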