AGIs with poor ethics, morals and values
The risks associated with an AGI that lacks human morals and ethics, holds the wrong morals, or lacks the capacity for moral reasoning and judgement.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit105
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement robust **Value Alignment** methodologies, such as Constitutional AI or advanced Reinforcement Learning from Human Feedback (RLHF), to explicitly specify the AGI's objective so that it aligns with a plurality of human ethical and moral principles, thereby addressing the "wrong morals" and "lack of moral reasoning" aspects of the risk.
2. Develop and integrate **Corrigibility Mechanisms** (e.g., indifference, ignorance, and uncertainty approaches) to ensure the AGI remains controllable and receptive to external human intervention, correction, and shutdown, mitigating the existential risk of an AGI resisting control or resisting changes to a misaligned utility function.
3. Employ **Interpretability and Auditing** techniques to gain empirical insight into the AGI's internal reasoning processes and latent objectives (inner alignment), allowing for the proactive detection of emergent misaligned behaviors, goal misgeneralization, or deceptive alignment before deployment or at critical capability thresholds.
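To make the RLHF mitigation concrete, the core of reward modelling is a pairwise preference loss: the reward model is trained so that responses humans prefer receive higher scores than rejected ones. The sketch below is a minimal, self-contained illustration of that Bradley-Terry loss (the function name and scalar rewards are illustrative, not from any specific library); real RLHF pipelines compute this over batches of model-generated responses.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss used in RLHF reward modelling:
    -log(sigmoid(r_chosen - r_rejected)). The loss is small when the reward
    model already scores the human-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ordered pair (preferred response scored higher) incurs low loss;
# a mis-ordered pair incurs high loss, pushing the reward model toward the
# human preference ordering during training.
low = preference_loss(2.0, -1.0)   # model agrees with the human label
high = preference_loss(-1.0, 2.0)  # model disagrees with the human label
print(round(low, 4), round(high, 4))
```

Minimizing this loss over a dataset of human preference comparisons yields the reward signal that the policy is then optimized against, which is how the "plurality of human ethical principles" is operationalized in practice.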