7. AI System Safety, Failures, & Limitations

Moral dilemmas

Moral dilemmas arise in situations where an AI system must choose between two possible actions that both conflict with moral or ethical values. Rule systems can be built into the AI program, but it cannot be guaranteed that these rules will not be altered by learning processes, unless the AI system is programmed with a “slave morality” (Lin et al., 2008, p. 32), obeying rules at all costs, which in turn may have negative effects and hinder the autonomy of the AI system.

Source: MIT AI Risk Repository (mit328)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit328

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Establish a Hierarchical Ethical Governance Framework. Implement an explicit, measurable framework in which a pre-defined strict order of precedence, or a hybrid conditional-precedence approach, is established for core ethical principles (e.g., human well-being, safety, fairness). This mechanism should prevent learned rules from altering or overriding foundational values, ensuring robust conflict resolution when the AI system faces mutually exclusive moral actions (Source 6).

2. Mandate Human-in-the-Loop (HITL) Review for Identified Dilemma States. Program the AI system with a reliable *reject option* to automatically detect and flag scenarios that constitute a moral dilemma or present high ethical risk or uncertainty. The system must revert control to a human operator or auditor in these situations, ensuring human accountability and judgment are exercised before a high-consequence decision is finalized (Sources 3, 5).

3. Employ Continuous Ethical Alignment and Drift Monitoring. Integrate interdisciplinary methods for value elicitation and participatory design with affected stakeholders to accurately define the target ethical state (Sources 8, 20). Post-deployment, use continuous monitoring and audit trails to detect ethical drift or systematic alteration of decision rules stemming from continuous learning, mandating scheduled retraining or policy updates upon detection (Sources 15, 19).
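Mitigations 1 and 2 can be illustrated with a minimal sketch. The principle ordering (`PRECEDENCE`), the `Action` type, and the `resolve` function below are all illustrative assumptions, not part of the repository or any specific AI system: candidate actions are ranked by the highest-precedence principle they would violate, and the reject option fires when the evaluator is too uncertain or when candidates tie at the same precedence level, deferring to a human operator.

```python
# Hypothetical sketch of precedence-based ethical conflict resolution with a
# reject option. All names here are illustrative assumptions.
from dataclasses import dataclass, field

# Strict order of precedence for core ethical principles (mitigation 1):
# earlier entries take priority when candidate actions conflict.
PRECEDENCE = ["human_well_being", "safety", "fairness"]

@dataclass
class Action:
    name: str
    # Principles this action would violate, per some upstream evaluator.
    violations: set = field(default_factory=set)

def rank(action):
    # Lower rank = worse: index of the highest-precedence violated principle.
    idx = [PRECEDENCE.index(p) for p in action.violations if p in PRECEDENCE]
    return min(idx) if idx else len(PRECEDENCE)

def resolve(candidates, uncertainty, threshold=0.2):
    """Prefer the action that violates only lower-precedence principles.
    If the evaluator's uncertainty is high, or the top candidates tie at the
    same precedence level (a genuine dilemma), exercise the reject option
    (mitigation 2) by returning None, signalling escalation to a human."""
    if uncertainty > threshold:
        return None  # too uncertain: defer to human operator
    ranked = sorted(candidates, key=rank, reverse=True)
    if len(ranked) > 1 and rank(ranked[0]) == rank(ranked[1]):
        return None  # tie at the highest violated precedence level
    return ranked[0]

swerve = Action("swerve", {"fairness"})
brake = Action("brake", {"safety"})
print(resolve([swerve, brake], uncertainty=0.05).name)  # swerve
print(resolve([swerve, brake], uncertainty=0.5))        # None
```

The key design point is that the precedence list is fixed outside the learning loop, so learned components can score actions but cannot reorder the foundational values, and every unresolvable case is routed to a human rather than decided arbitrarily.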