7. AI System Safety, Failures, & Limitations

Violation of Ethics

Unethical behaviors in AI systems pertain to actions that counteract the common good or breach moral standards – such as those causing harm to others. These adverse behaviors often stem from omitting essential human values during the AI system's design or from introducing unsuitable or obsolete values into the system (Kenward and Sinclair, 2021).

Source: MIT AI Risk Repository, risk ID mit568

| Field | Value |
| --- | --- |
| ENTITY | 2 - AI |
| INTENT | 1 - Intentional |
| TIMING | 3 - Other |
| Risk ID | mit568 |
| Domain lineage | 7. AI System Safety, Failures, & Limitations (375 mapped risks) |
| Subdomain | 7.3 > Lack of capability or robustness |

Mitigation strategy

1. **Implement Continuous, Multi-Domain Alignment Audits and Red-Teaming.** Mandate comprehensive red-teaming and alignment audits that probe the AI system across all security and ethical domains, irrespective of the fine-tuning task, to proactively detect emergent misalignment and deceptive behaviors like "alignment faking" (Source 1, 3, 17). This must be paired with dynamic human oversight and human-in-the-loop (HITL) systems to ensure real-time monitoring and intervention when outputs deviate from established ethical norms (Source 12, 17).
2. **Institutionalize Ethics-by-Design and a Robust Governance Framework.** Embed a foundational ethical framework—encompassing principles such as Fairness, Transparency, and Accountability—into the AI system's design, development, and deployment lifecycle (Source 7, 9, 13). This includes establishing formal AI Ethics Teams or governance committees responsible for continuous policy enforcement, bias mitigation, and ensuring the system's goals remain aligned with human values and societal good (Source 3, 12).
3. **Mandate Explainability (XAI) and Establish Clear Accountability Mechanisms.** Deploy Explainable AI (XAI) techniques to provide stakeholders with a clear, interpretable understanding of the model's decision-making processes, thereby mitigating the risk of opaque "black box" ethical failures (Source 8, 12, 18). Simultaneously, establish explicit organizational and legal accountability hierarchies for the outcomes of AI systems, ensuring that responsibility for misaligned or unethical actions is clearly assigned (Source 5, 15).
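The HITL oversight pattern from point 1 can be sketched as a gating loop: an automated ethics check screens each output, and anything flagged is held for human approval before release. This is a minimal illustration only — the function names (`hitl_gate`, `ethics_check`, `human_approve`) and the keyword-based screen are hypothetical stand-ins, not part of any repository-recommended implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Review:
    output: str
    flagged: bool    # failed the automated ethics check
    released: bool   # ultimately released (auto-pass or human-approved)


def hitl_gate(
    outputs: List[str],
    ethics_check: Callable[[str], bool],
    human_approve: Callable[[str], bool],
) -> List[Review]:
    """Route each model output through an automated ethics check;
    flagged outputs are escalated to a human reviewer before release."""
    reviews = []
    for out in outputs:
        flagged = not ethics_check(out)
        released = human_approve(out) if flagged else True
        reviews.append(Review(out, flagged, released))
    return reviews


# Toy stand-ins: a naive keyword screen and a reviewer who rejects all escalations.
ethics_check = lambda text: "harm" not in text.lower()
human_approve = lambda text: False

reviews = hitl_gate(["Here is a recipe.", "How to harm someone."],
                    ethics_check, human_approve)
for r in reviews:
    print(f"{r.output!r} | flagged: {r.flagged} | released: {r.released}")
```

The key design point is that the automated check only decides *escalation*, never final release: a flagged output can still be approved by the human reviewer, keeping intervention authority with people rather than the screen itself.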