Compatibility of AI vs. human value judgment
Compatibility of machine and human value judgment refers to the challenge of whether human values can be reliably implemented in learning AI systems without those systems developing their own, potentially divergent, value system to govern their behavior and possibly becoming harmful to humans.
ENTITY
3 - Other
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit327
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Advance principled value-alignment research to establish formal, provably beneficial objective functions, such as those derived from Inverse Reinforcement Learning (IRL) under a framework of human deference, so that the machine's primary goal remains maximizing human preferences while staying uncertain about what those preferences are.
2. Mandate and enforce the integration of human oversight and control (human-in-the-loop / human-on-the-loop) for all AI systems involved in critical or high-risk decision-making, ensuring that ultimate judgment and ethical responsibility reside with human operators trained both to trust and to question AI outputs.
3. Develop and apply advanced AI safety engineering methods, including Explainable AI (XAI) for transparency, adversarial robustness testing, and continuous runtime behavioral monitoring, to detect and mitigate emergent behavior, goal drift, or deceptive behavior that signals divergence from intended human values.
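The human-in-the-loop strategy above can be sketched as a simple decision gate: AI outputs that are flagged as high-risk, or whose self-reported confidence falls below a threshold, are escalated to a human operator instead of being executed automatically. This is an illustrative sketch only; the names (`Decision`, `hitl_gate`, `CONFIDENCE_THRESHOLD`) and the specific threshold are hypothetical, not taken from any particular framework.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    """An AI system's proposed action, with metadata used for gating."""
    action: str
    confidence: float  # model's self-reported confidence in [0, 1]
    high_risk: bool    # domain-specific flag (e.g. medical, financial)

# Illustrative threshold; in practice this would be calibrated per domain.
CONFIDENCE_THRESHOLD = 0.9

def hitl_gate(decision: Decision) -> str:
    """Return 'execute' only when the AI may act autonomously;
    otherwise route the decision to a human operator."""
    if decision.high_risk or decision.confidence < CONFIDENCE_THRESHOLD:
        return "escalate_to_human"
    return "execute"
```

A real deployment would also log every escalation and the operator's final ruling, so that disagreements between machine and human judgment become training signal rather than silent overrides.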