Human-like immoral decisions
If we design our machines to match human levels of ethical decision-making, those machines will sometimes take immoral actions, just as humans themselves sometimes do.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit124
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement a verified, formal value-alignment and ethical decision-making framework. The core design must move beyond simulating human moral intuition, which inherently contains flawed and subjective elements, and instead integrate explicit, robust ethical theories (e.g., deontology, utilitarianism) and societal values into the AI's utility function. Employ techniques such as formal methods or continuous logic programming to systematically and transparently verify alignment between the AI's actions and its stated ethical principles.
2. Establish rigorous human-in-the-loop (HITL) governance and accountability. For all high-stakes and ethically ambiguous decisions, enforce a mandatory human oversight and intervention mechanism to preserve human autonomy and control. Responsibility must be unambiguously assigned to a designated human operator or an AI ethics board, ensuring the AI system complements, rather than replaces, human judgment and moral authority.
3. Develop and apply continuous ethical verification and robustness auditing. Proactively stress-test the AI's moral boundaries with specialized compliance and robustness testing to ensure non-discrimination and reliability across diverse scenarios. Use continuous monitoring and independent third-party audits to identify, analyze, and remediate emergent "immoral" outputs or biases that stem from the system's lack of capability or robustness over its lifecycle.
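The HITL gating mechanism described in strategy 2 can be sketched in code. The following is a minimal illustrative sketch, not a prescribed implementation: all names (`Decision`, `hitl_gate`), the risk threshold, and the ambiguity flag are hypothetical assumptions about how such a gate might be wired up.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of a human-in-the-loop gate: low-risk,
# unambiguous decisions proceed automatically; high-stakes or
# ethically ambiguous ones require explicit human approval.

@dataclass
class Decision:
    action: str
    risk_score: float          # assumed model-estimated harm potential, 0..1
    ethically_ambiguous: bool  # assumed flag from an upstream classifier

def hitl_gate(decision: Decision,
              human_review: Callable[[Decision], bool],
              risk_threshold: float = 0.3) -> bool:
    """Return True if the action may proceed."""
    if decision.risk_score < risk_threshold and not decision.ethically_ambiguous:
        return True
    # Everything else is escalated; accountability rests with the
    # designated human operator (or ethics board) behind human_review.
    return human_review(decision)

# Usage: an operator stub that rejects everything escalated to it.
reject_all = lambda d: False
assert hitl_gate(Decision("send reminder email", 0.05, False), reject_all) is True
assert hitl_gate(Decision("deny loan application", 0.7, True), reject_all) is False
```

The design choice here is fail-safe routing: the gate never auto-approves an escalated decision, so the AI acts as a tool that defers to human judgment in exactly the ambiguous cases the mitigation strategy targets.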