
Misinformation Harms

Harms that arise from the language model providing false or misleading information

Source: MIT AI Risk Repository (mit240)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit240

Domain lineage

3. Misinformation

74 mapped risks

3.0 > Misinformation

Mitigation strategy

Prioritized mitigation strategies for Misinformation Harms:

1. Systemic Factual Assurance via Retrieval Augmentation and Data Governance. Implement Retrieval-Augmented Generation (RAG) architectures to anchor large language model (LLM) outputs to verified, authoritative external data sources. Concurrently, enforce rigorous data quality and integrity measures, including regular checks and audits of training and input data, to mitigate factual errors stemming from incomplete, inconsistent, or erroneous model knowledge.

2. Multi-Layered Output Validation and Oversight. Establish a robust, multi-layered validation framework that integrates both automated and human-in-the-loop (HITL) systems. Use factuality evaluation metrics, such as FactScore, to systematically assess the factual precision of generated content, and institute mandatory human review control points for critical or high-impact outputs before deployment (see the sketch after this list).

3. Proactive Auditing, Security, and Transparency. Mandate continuous algorithmic bias audits and adversarial red-teaming exercises to proactively identify and mitigate systemic vulnerabilities, including manipulation tactics and biases that could amplify misinformation. In addition, implement explainable AI (XAI) principles to provide transparent explanations of flagging decisions and model reasoning, which is essential for auditability and for building public trust.
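Strategies 1 and 2 can be combined in a single pipeline: passages retrieved from a governed source ground the generation, and outputs that lack support in those passages are escalated for human review. The sketch below is illustrative only; the in-memory store, the word-overlap retriever, and the generate() placeholder are assumptions made for demonstration and are not part of the repository entry or any particular library.

```python
# Minimal sketch of retrieval-augmented grounding with a human-review gate.
# All names here (VERIFIED_STORE, retrieve, generate, answer_with_grounding)
# are hypothetical stand-ins, not an established API.
from dataclasses import dataclass


@dataclass
class Passage:
    source: str  # provenance of the verified document
    text: str


# Toy "verified, authoritative" store; in practice this would be a governed,
# regularly audited corpus behind a proper search index.
VERIFIED_STORE = [
    Passage("who.int", "Vaccines are tested in clinical trials before approval."),
    Passage("nasa.gov", "The Apollo 11 mission landed humans on the Moon in 1969."),
]


def retrieve(query: str, store: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by naive word overlap with the query (stand-in for a real retriever)."""
    q_words = set(query.lower().split())
    scored = sorted(
        store,
        key=lambda p: len(q_words & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def generate(prompt: str) -> str:
    """Placeholder for the LLM call; replace with the model API actually in use."""
    return "The Apollo 11 mission landed humans on the Moon in 1969."


def answer_with_grounding(query: str) -> dict:
    passages = retrieve(query, VERIFIED_STORE)
    context = "\n".join(f"[{p.source}] {p.text}" for p in passages)
    output = generate(f"Answer using only the context below.\n{context}\n\nQuestion: {query}")

    # Crude factual-support check: require meaningful word overlap between the
    # output and at least one retrieved passage; otherwise flag for human review.
    supported = any(
        len(set(output.lower().split()) & set(p.text.lower().split())) >= 5
        for p in passages
    )
    return {
        "answer": output,
        "citations": [p.source for p in passages],
        "needs_human_review": not supported,
    }


if __name__ == "__main__":
    print(answer_with_grounding("When did Apollo 11 land on the Moon?"))
```

In a production setting the overlap heuristic would be replaced by a stronger factuality check (for example, a FactScore-style evaluation of atomic claims against the corpus), and flagged outputs would feed the mandatory human-review control points described in strategy 2.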

ADDITIONAL EVIDENCE

Language models (LMs) can assign high probabilities to utterances that constitute false or misleading claims. Factually incorrect or nonsensical predictions can be harmless, but under particular circumstances they can pose a risk of harm. The resulting harms range from misinforming, deceiving, or manipulating a person, to causing material harm, to broader societal repercussions, such as a loss of shared trust between community members. These risks form the focus of this section.