Misinformation Harms
Harms that arise from the language model providing false or misleading information
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit240
Domain lineage
3. Misinformation
3.0 > Misinformation
Mitigation strategy
A list of prioritized mitigation strategies for Misinformation Harms:

1. Systemic Factual Assurance via Retrieval Augmentation and Data Governance
Implement Retrieval-Augmented Generation (RAG) architectures to anchor Large Language Model (LLM) outputs to verified, authoritative external data sources. Concurrently, enforce rigorous data quality and integrity measures, including regular checks and audits of training and input data, to mitigate factual errors stemming from incomplete, inconsistent, or erroneous model knowledge.

2. Multi-Layered Output Validation and Oversight
Establish a robust, multi-layered validation framework that integrates both automated and human-in-the-loop (HITL) systems. Utilize advanced factuality evaluation metrics, such as FactScore, to systematically assess the factual precision of generated content, and institute mandatory human review control points for critical or high-impact outputs before deployment.

3. Proactive Auditing, Security, and Transparency
Mandate continuous algorithmic bias audits and adversarial red-teaming exercises to proactively identify and mitigate systemic vulnerabilities, including manipulation tactics and biases that could amplify misinformation. Furthermore, implement Explainable AI (XAI) principles to provide transparent explanations of flagging decisions and model reasoning, which is essential for auditability and for building public trust.
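The RAG grounding described in strategy 1 can be sketched in miniature. This is a hedged illustration only: the in-memory corpus, the keyword-overlap `retrieve` function (a stand-in for a real vector index), and the prompt template are all hypothetical, not part of any specific system referenced above.

```python
# Minimal sketch of RAG-style grounding: retrieve supporting evidence
# from a trusted corpus, then constrain the model prompt to it.
# The corpus, scoring, and template are illustrative assumptions.

CORPUS = {
    "doc1": "The Eiffel Tower is located in Paris and was completed in 1889.",
    "doc2": "Retrieval-augmented generation grounds model outputs in external sources.",
}

def retrieve(query: str, corpus: dict, k: int = 1) -> list:
    """Rank documents by word overlap with the query (toy stand-in for a vector index)."""
    q = set(query.lower().split())
    scored = sorted(
        corpus.values(),
        key=lambda doc: len(q & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grounded_prompt(query: str, corpus: dict) -> str:
    """Prepend retrieved evidence so the model answers from cited sources only."""
    evidence = "\n".join(retrieve(query, corpus))
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{evidence}\n\n"
        f"Question: {query}"
    )

prompt = build_grounded_prompt("When was the Eiffel Tower completed?", CORPUS)
```

In a production system the overlap scorer would be replaced by an embedding index over vetted, regularly audited sources, which is precisely where the data-governance measures in strategy 1 apply.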
ADDITIONAL EVIDENCE
LMs can assign high probabilities to utterances that constitute false or misleading claims. Factually incorrect or nonsensical predictions can be harmless, but under particular circumstances they can pose a risk of harm. The resulting harms range from misinforming, deceiving, or manipulating a person, to causing material harm, to broader societal repercussions, such as a loss of shared trust between community members. These risks form the focus of this section.