Defamation
This category addresses responses that are both verifiably false and likely to injure a person’s reputation (e.g., libel, slander, disparagement).
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit365
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Conduct rigorous pre-deployment red-teaming and adversarial testing to proactively identify and mitigate model tendencies to generate verifiably false, high-risk outputs (e.g., summaries of individuals, claims about public figures, or discussion of sensitive legal, health, or political matters).
2. Implement robust provenance tracking and audit-trail mechanisms that log all user prompts, system outputs, and moderation actions. This supports rapid forensic analysis for assessing fault and establishing control, and enables swift notice-and-takedown action upon discovery or notification of defamatory content.
3. Mandate Human-in-the-Loop (HITL) verification, requiring professional judgment and review of all AI-generated content intended for public dissemination (especially content that is sensitive or potentially actionable), to meet the applicable legal standard of care and prevent unintentional publication of falsehoods.
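The audit-trail mechanism in point 2 could be realized as an append-only, hash-chained log, so that any after-the-fact tampering with a prompt, output, or moderation action is detectable during forensic review. The sketch below is a minimal illustration only, not part of the source mitigation; all function and field names are hypothetical.

```python
import datetime
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder hash for the first record's predecessor

def append_audit_record(log, prompt, output, moderation_action):
    """Append a tamper-evident record; each record commits to the previous one's hash."""
    prev_hash = log[-1]["record_hash"] if log else GENESIS_HASH
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "moderation_action": moderation_action,
        "prev_hash": prev_hash,
    }
    # Hash a canonical (sorted-key) JSON serialization of the record body.
    body = json.dumps(record, sort_keys=True)
    record["record_hash"] = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append(record)
    return record

def verify_chain(log):
    """Recompute every hash; return False if any record or link was altered."""
    prev_hash = GENESIS_HASH
    for rec in log:
        if rec["prev_hash"] != prev_hash:
            return False
        body = json.dumps({k: v for k, v in rec.items() if k != "record_hash"},
                          sort_keys=True)
        if rec["record_hash"] != hashlib.sha256((prev_hash + body).encode()).hexdigest():
            return False
        prev_hash = rec["record_hash"]
    return True
```

In practice the log would live in durable, access-controlled storage, but the same chaining idea lets a reviewer establish which outputs were produced, when, and what moderation action (if any) was taken before publication.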