3. Misinformation

2 - Post-deployment

Hallucinations

Significant concerns are raised about LLMs inadvertently generating false or misleading information, as well as erroneous code. Papers not only critically analyze various types of reasoning errors in LLMs but also examine risks associated with specific types of misinformation, such as medical hallucinations. Given the propensity of LLMs to produce flawed outputs accompanied by overconfident rationales and fabricated references, many sources stress the necessity of manually validating and fact-checking the outputs of these models.

Source: MIT AI Risk Repository (mit73)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit73

Domain lineage

3. Misinformation

74 mapped risks

3.1 > False or misleading information

Mitigation strategy

1. **Implement Retrieval-Augmented Generation (RAG) for Factual Grounding** Integrate RAG systems to dynamically retrieve relevant, verified information from a trusted external knowledge base and incorporate it into the model's context. This practice significantly reduces the model's propensity for factual fabrication by ensuring responses are grounded in current, reliable data, with some studies demonstrating substantial reductions in hallucination rates.

2. **Establish Human-in-the-Loop (HITL) Validation and Oversight** Mandate a rigorous manual fact-checking and validation process, especially for LLM outputs designated for high-stakes environments (e.g., finance, healthcare). This human oversight mechanism is the ultimate safeguard against confidently presented but incorrect information and is critical for ensuring compliance and accountability.

3. **Employ Structured and Advanced Prompt Engineering Techniques** Utilize techniques such as Chain-of-Thought (CoT) prompting, which compels the LLM to generate an explicit, step-by-step reasoning path before delivering the final answer. Additionally, apply constrained prompting methods (e.g., the ICE method) to limit the model's generative scope and instruct it to explicitly state "I don't know" when faced with insufficient context, thereby improving output reliability.
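
The three strategies above can be combined in a single pipeline. The following is a minimal sketch under stated assumptions, not a reference implementation: the knowledge base, the stopword list, the keyword-overlap retriever, and the function names `retrieve`, `build_prompt`, and `needs_human_review` are all hypothetical stand-ins. A production system would likely use a vetted document store, an embedding-based retriever, and a real review queue. No model is called here; the sketch only shows how the grounded prompt and the review decision might be prepared.

```python
import re

# Hypothetical trusted knowledge base (assumption: a real system would use a vetted store).
KNOWLEDGE_BASE = [
    "Aspirin is a nonsteroidal anti-inflammatory drug.",
    "The Eiffel Tower is located in Paris, France.",
    "Python was first released in 1991.",
]

# Illustrative stopword list so common words do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "was", "in", "of", "where", "what", "when"}

# Domains treated as high-stakes, following the examples in the mitigation text.
HIGH_STAKES_DOMAINS = {"finance", "healthcare"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def retrieve(query, documents, top_k=2):
    """Rank documents by keyword overlap with the query (toy stand-in for a retriever)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, passages):
    """Build a grounded prompt that asks for stepwise reasoning and permits abstention."""
    context = "\n".join(f"- {p}" for p in passages) or "(no relevant passages found)"
    return (
        "Answer using only the context below. "
        "Reason step by step before giving the final answer. "
        'If the context is insufficient, reply exactly "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\n"
    )


def needs_human_review(domain, passages):
    """Route to a human reviewer for high-stakes domains or when nothing was retrieved."""
    return domain.lower() in HIGH_STAKES_DOMAINS or not passages


if __name__ == "__main__":
    question = "Where is the Eiffel Tower?"
    passages = retrieve(question, KNOWLEDGE_BASE)
    print(build_prompt(question, passages))
    print("Human review required:", needs_human_review("healthcare", passages))
```

In this sketch the review gate also fires when retrieval returns nothing, on the assumption that an ungrounded answer is more likely to be fabricated. That threshold is a design choice and would need tuning for a real deployment.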