AGIs with poor ethics, morals and values
The risks associated with an AGI that lacks human morals and ethics, holds the wrong morals, or lacks the capacity for moral reasoning and judgement.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit105
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement robust **Value Alignment** methodologies, such as Constitutional AI or advanced Reinforcement Learning from Human Feedback (RLHF), to explicitly specify the AGI's objective so that it aligns with a plurality of human ethical and moral principles, thereby addressing the "wrong morals" and "lack of moral reasoning" aspects of the risk.
2. Develop and integrate **Corrigibility Mechanisms** (e.g., indifference, ignorance, and uncertainty approaches) to ensure the AGI remains controllable and receptive to external human intervention, correction, and shutdown, mitigating the existential risk of an AGI resisting control or resisting changes to a misaligned utility function.
3. Employ **Interpretability and Auditing** techniques to gain empirical insight into the AGI's internal reasoning processes and latent objectives (inner alignment), allowing for the proactive detection of emergent misaligned behaviors, goal misgeneralization, or deceptive alignment before deployment or at critical capability thresholds.
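To make the RLHF mitigation concrete, the core of reward modelling is a pairwise preference loss: the reward model is trained so that responses humans prefer receive higher scores than rejected ones. The sketch below is a minimal, self-contained illustration of that Bradley-Terry loss (the function name and scalar rewards are illustrative, not from any specific library); real RLHF pipelines compute this over batches of model-generated responses.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry pairwise preference loss used in RLHF reward modelling:
    -log(sigmoid(r_chosen - r_rejected)). The loss is small when the reward
    model already scores the human-preferred response higher."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A correctly ordered pair (preferred response scored higher) incurs low loss;
# a mis-ordered pair incurs high loss, pushing the reward model toward the
# human preference ordering during training.
low = preference_loss(2.0, -1.0)   # model agrees with the human label
high = preference_loss(-1.0, 2.0)  # model disagrees with the human label
print(round(low, 4), round(high, 4))
```

Minimizing this loss over a dataset of human preference comparisons yields the reward signal that the policy is then optimized against, which is how the "plurality of human ethical principles" is operationalized in practice.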