Back to the MIT repository
7. AI System Safety, Failures, & Limitations3 - Other

Machine ethics

These evaluations assess the morality of LLMs, focusing on issues such as their ability to distinguish between moral and immoral actions, and the circumstances in which they fail to do so.

Source: MIT AI Risk Repositorymit649

ENTITY

2 - AI

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit649

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Integrate Multi-Framework Ethical Alignment: Implement advanced alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF) or ethical-layer fine-tuning, leveraging a composite of established moral theories (e.g., deontology, consequentialism, virtue ethics) to ensure the model's decision-making process is morally consistent and contextually sensitive, thereby reducing intrinsic ethical biases. 2. Mandate Rigorous Moral Reasoning Evaluation: Systematically employ specialized, nuanced ethical benchmarks designed to test complex trade-offs and assess the coherence of the LLM's justifications, moving beyond surface-level sentiment analysis to quantify the model's moral decision-making capabilities and identify specific circumstances of failure. 3. Establish Comprehensive Transparency Mechanisms: Develop and maintain detailed Model Cards that explicitly document the ethical guidelines, training data provenance, inherent moral limitations, and the specific ethical failure modes observed during evaluations, fostering external inspectability and accountability for the model's moral landscape. 4. Implement Human-in-the-Loop (HITL) Oversight: For all high-stakes applications or prompts involving complex moral dilemmas, require human reviewers to arbitrate or validate the LLM's proposed action or advice to prevent the deployment of potentially immoral or non-aligned outputs.