7. AI System Safety, Failures, & Limitations

Untruthful Output

AI systems such as LLMs can produce inaccurate output, whether unintentionally or deliberately. Such untruthful output may diverge from established resources or lack verifiability, and is commonly referred to as hallucination (Bang et al., 2023; Zhao et al., 2023). More concerning is the phenomenon wherein LLMs may selectively provide erroneous responses to users who exhibit lower levels of education (Perez et al., 2023).

Source: MIT AI Risk Repository (mit565)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit565

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. **Implement Retrieval-Augmented Generation (RAG).** Utilize RAG frameworks to dynamically augment the LLM's input context with verified, real-time information retrieved from trusted external data repositories. This strategy is foundational for grounding outputs in factual evidence, thereby confining the model's generation space and substantially mitigating the risk of factual fabrication (hallucination).

2. **Employ Model Fine-Tuning and Alignment.** Conduct targeted model fine-tuning using curated, high-quality data and apply alignment techniques, such as Reinforcement Learning from Human Feedback (RLHF) or parameter-efficient tuning (PET). This approach intrinsically trains the model to prioritize factual accuracy and avoid generating unverifiable claims, improving the model's internal coherence and truthfulness.

3. **Institute Mandatory Human Oversight and Validation.** Establish a robust human-in-the-loop protocol requiring subject-matter experts to cross-verify all LLM outputs designated as high-stakes or critical before deployment or action. This final control mechanism is essential for mitigating the compounded risk of user overreliance on potentially inaccurate content, ensuring that critical decisions are based on fact-checked data.
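The first mitigation, RAG grounding, can be sketched in miniature. The corpus, the keyword-overlap retriever, and the prompt template below are all illustrative assumptions (not part of the repository or any specific RAG framework); a production system would use embedding-based retrieval over a vetted knowledge base.

```python
import re

# Minimal RAG-grounding sketch. Corpus, scoring, and prompt wording are
# hypothetical stand-ins for a real retriever over trusted repositories.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(re.findall(r"\w+", query.lower()))
    def score(doc: str) -> int:
        return len(q_terms & set(re.findall(r"\w+", doc.lower())))
    return sorted(corpus, key=score, reverse=True)[:k]

def build_grounded_prompt(query: str, corpus: list[str]) -> str:
    """Prepend retrieved evidence so the model answers from verified text."""
    evidence = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return (
        "Answer using ONLY the evidence below; reply 'unknown' if it is insufficient.\n"
        f"Evidence:\n{evidence}\n\nQuestion: {query}"
    )

corpus = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain above sea level.",
    "Python was created by Guido van Rossum.",
]
prompt = build_grounded_prompt("Who created Python?", corpus)
print(prompt)
```

Confining the model to retrieved evidence, and instructing it to answer "unknown" otherwise, is what narrows the generation space and reduces fabrication; the prompt built above would then be passed to the LLM in place of the bare question.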