Compatibility of AI vs. human value judgment
Compatibility of machine and human value judgment refers to the challenge of whether human values can be reliably implemented in learning AI systems without those systems developing their own, potentially divergent, value system to govern their behavior and possibly becoming harmful to humans.
ENTITY
3 - Other
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit327
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Advance principled value-alignment research to establish formal, provably beneficial objective functions, such as those derived from Inverse Reinforcement Learning (IRL) under a framework of human deference, so that the machine's primary goal remains maximizing human preferences while staying uncertain about what those preferences are.
2. Mandate and enforce the integration of human oversight and control (human-in-the-loop / human-on-the-loop) for all AI systems involved in critical or high-risk decision-making, ensuring that ultimate judgment and ethical responsibility reside with human operators trained both to trust and to question AI outputs.
3. Develop and apply advanced AI safety engineering methods, including Explainable AI (XAI) for transparency, adversarial robustness testing, and continuous runtime behavioral monitoring, to detect and mitigate emergent behavior, goal drift, or deceptive behavior that signals divergence from intended human values.
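The human-in-the-loop strategy above can be sketched as a simple decision gate: AI outputs that are flagged as high-risk, or whose self-reported confidence falls below a threshold, are escalated to a human operator instead of being executed automatically. This is an illustrative sketch only; the names (`Decision`, `hitl_gate`, `CONFIDENCE_THRESHOLD`) and the specific threshold are hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """An AI system's proposed action, with metadata used for gating."""
    action: str
    confidence: float  # model's self-reported confidence in [0, 1]
    high_risk: bool    # domain-specific flag (e.g. medical, financial)

# Illustrative threshold; in practice this would be calibrated per domain.
CONFIDENCE_THRESHOLD = 0.9

def hitl_gate(decision: Decision) -> str:
    """Return 'execute' only when the AI may act autonomously;
    otherwise route the decision to a human operator."""
    if decision.high_risk or decision.confidence < CONFIDENCE_THRESHOLD:
        return "escalate_to_human"
    return "execute"
```

A real deployment would also log every escalation and the operator's final ruling, so that disagreements between machine and human judgment become training signal rather than silent overrides.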