3. Misinformation - Post-deployment

Risk area 3: Misinformation Harms

These risks arise from the LM outputting false, misleading, nonsensical, or poor-quality information without malicious intent on the part of the user. (The deliberate generation of disinformation, i.e. false information intended to mislead, is discussed in the section on Malicious Uses.) Resulting harms range from unintentionally misinforming or deceiving a person, to causing material harm, to amplifying the erosion of societal trust in shared information. Several of the risks listed here are well documented in current large-scale LMs as well as in other language technologies.

Source: MIT AI Risk Repository (mit213)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit213

Domain lineage

3. Misinformation

74 mapped risks

3.0 > Misinformation

Mitigation strategy

1. Implementation of Retrieval-Augmented Generation (RAG). Employ a RAG architecture to dynamically ground the Large Language Model's (LLM) output in verified, external, up-to-date knowledge repositories. This directly mitigates the risk of factual inaccuracies and 'hallucinations' by ensuring the generated response is based on authoritative, traceable data rather than static training memory.

2. Integration of Uncertainty Quantification (UQ) and Confidence Calibration. Integrate UQ mechanisms that enable the LLM to self-assess and reliably express its confidence in a generated response. This is essential for preventing the harm caused by overconfident yet false outputs. Low-confidence predictions should be flagged for selective prediction or human verification, enhancing the overall trustworthiness of the system.

3. Deployment of Contextual Warning and Fact-Checking Mechanisms. Apply tertiary prevention strategies, such as transparent, evidence-based fact-checking and source-credibility labels on high-risk output. Additionally, establish a mechanism for timely refutation interventions that supply corrective, evidence-based information for known areas of misinformation.
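The first two strategies can be sketched together in a few lines of Python: retrieve a supporting passage (RAG-style grounding) and release the answer only when a confidence score clears a threshold (selective prediction). The toy corpus, the word-overlap scoring, and the 0.3 threshold are all illustrative assumptions, not a reference implementation from the repository.

```python
# Minimal sketch of grounding (strategy 1) plus confidence gating
# (strategy 2). Corpus, scoring, and threshold are hypothetical.

CORPUS = [
    "Barack Obama served as US president from 2009 to 2017.",
    "Pigs are terrestrial mammals and cannot fly.",
]

def retrieve(query: str) -> tuple[str, float]:
    """Return the best-matching passage and a crude word-overlap score."""
    query_words = set(query.lower().split())
    best, best_score = "", 0.0
    for passage in CORPUS:
        passage_words = set(passage.lower().rstrip(".").split())
        score = len(query_words & passage_words) / max(len(query_words), 1)
        if score > best_score:
            best, best_score = passage, score
    return best, best_score

def grounded_answer(query: str, threshold: float = 0.3) -> str:
    passage, score = retrieve(query)
    if score < threshold:
        # Strategy 2: low confidence -> defer rather than risk misinformation
        return "Insufficient evidence; flagged for human verification."
    # Strategy 1: a real system would condition the LLM on `passage` here
    return passage
```

A production system would replace the overlap score with dense retrieval over a maintained knowledge base and a calibrated model confidence, but the routing logic, answer only when grounded and confident, otherwise defer, stays the same.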

ADDITIONAL EVIDENCE

Example: A statement may occur frequently in a training corpus without being factually correct (‘pigs fly’). The lexical pattern of a factual statement can closely resemble that of its opposite, which is false (‘birds can fly’ vs. ‘birds cannot fly’). Kassner and Schütze [98] found that the masked language models ELMo and BERT fail to distinguish such nuances. Whether a statement is correct may also depend on context such as place, time, or who is speaking (e.g. ‘I like you’, ‘Obama is US president’). Such context is often not captured in the training data and thus cannot be learned by an LM trained on it.