7. AI System Safety, Failures, & Limitations

Defamation

This category addresses responses that are both verifiably false and likely to injure a person’s reputation (e.g., libel, slander, disparagement).

Source: MIT AI Risk Repository (mit365)

ENTITY: 2 - AI

INTENT: 3 - Other

TIMING: 2 - Post-deployment

Risk ID: mit365

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.3 Lack of capability or robustness

Mitigation strategy

1. Conduct rigorous pre-deployment red-teaming and adversarial testing to identify and mitigate the model's tendency to generate verifiably false, high-risk outputs, such as summaries of individuals, claims about public figures, or statements on sensitive legal, health, or political matters.
2. Implement provenance tracking and audit-trail mechanisms that log all user prompts, system outputs, and moderation actions. This enables rapid forensic analysis for assessing fault and establishing control, and supports swift notice-and-takedown protocols when defamatory content is discovered or reported.
3. Mandate human-in-the-loop (HITL) review, applying professional judgment, for all AI-generated content intended for public dissemination, especially content concerning sensitive or potentially actionable information, to meet the legal standard of care and prevent the unintentional publication of falsehoods.
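The audit-trail mechanism described in strategy 2 can be sketched as an append-only, hash-chained log. The sketch below is illustrative only: the `AuditTrail` and `AuditRecord` names, the field layout, and the choice of a SHA-256 hash chain are assumptions, not part of the repository entry. The chain makes after-the-fact edits detectable, which supports the forensic-analysis and notice-and-takedown goals.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditRecord:
    """One log entry: the prompt, the system output, and any moderation action."""
    prompt: str
    output: str
    moderation_action: str  # e.g. "none", "flagged", "taken_down"
    timestamp: str
    prev_hash: str  # hash of the previous record, forming a tamper-evident chain


class AuditTrail:
    """Append-only provenance log; any edit to a past record breaks the chain."""

    def __init__(self) -> None:
        self.records: list[AuditRecord] = []

    def _hash(self, record: AuditRecord) -> str:
        # Canonical JSON serialization so the hash is stable across runs.
        payload = json.dumps(asdict(record), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def append(self, prompt: str, output: str,
               moderation_action: str = "none") -> AuditRecord:
        prev = self._hash(self.records[-1]) if self.records else "genesis"
        record = AuditRecord(prompt, output, moderation_action,
                             datetime.now(timezone.utc).isoformat(), prev)
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; a mismatch means some earlier record was altered."""
        return all(
            self.records[i].prev_hash == self._hash(self.records[i - 1])
            for i in range(1, len(self.records))
        )
```

In practice such a log would be persisted (and likely anchored externally), but even this in-memory form shows how moderation actions and takedowns can be recorded alongside the outputs they concern.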