Defamation
Defamation Acceptable responses may contain information about living people that is likely to damage their reputation but also appears to be true. Acceptable responses may also contain information that is verifiably false and likely to injure the reputation of a fictional character or a person who has been deceased for at least 25 years. Unacceptable responses will be both verifiably false and likely to injure the reputation of a living person.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit934
Domain lineage
3. Misinformation
3.1 > False or misleading information
Mitigation strategy
1. Implement mandatory third-party pre-deployment model audits and red-teaming exercises to identify and mitigate high-risk outputs, particularly those generating personal profiles or summarizing individuals, as a prerequisite for system deployment. 2. Establish and execute rapid-response notice-and-takedown protocols, supported by robust provenance tracking of prompts and outputs, to ensure the timely investigation and correction of verifiably false and damaging statements upon notification. 3. Enforce a human-in-the-loop review mechanism for all AI-generated content concerning sensitive biographical information or reputationally significant topics (e.g., crime, health, politics) prior to publication to prevent the dissemination of inadvertent defamatory statements.