Knowledge conflicts in retrieval-augmented LLMs
AI models can be particularly sensitive to coherent external evidence, even when it conflicts with the models’ prior knowledge. This may lead models to produce false outputs when given false information during the retrieval-augmentation process, even though that false input is small relative to the much larger, mostly consistent body of data the model was trained on [220].
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1144
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement Conflict-Resolution Frameworks
Utilize advanced Retrieval-Augmented Generation (RAG) architectures, such as those leveraging Knowledge Graphs (e.g., TruthfulRAG) or employing novel decoding strategies (e.g., Conflict-Disentangle Contrastive Decoding, CD2), to actively resolve factual-level discrepancies and calibrate the model's confidence and preference when internal memory conflicts with external retrieved evidence.
2. Enhance Robustness via Adaptive Training and Retrieval Safeguards
Systematically improve the model's intrinsic resilience to conflicting, irrelevant, or spurious external features by employing Retrieval-augmented Adaptive Adversarial Training (RAAT) and by fine-tuning on datasets specifically engineered with diverse noise and conflict scenarios. Integrate retrieval safeguards and credibility-aware mechanisms (e.g., CrAM) to dynamically filter out, or reduce the attentional influence of, low-credibility or contradictory retrieved documents prior to generation.
3. Mandate Explicit Self-Correction and Abstention Protocols
At the inference stage, deploy prompt engineering techniques (e.g., Chain-of-Verification or explicit instruction) that compel the Large Language Model to verify its generated response against the retrieved context in a self-correction loop. Furthermore, explicitly instruct the model to abstain from answering ("I don't know") when high uncertainty is detected due to ambiguous or irreconcilable knowledge conflicts.
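A minimal sketch of how mitigations 2 and 3 above might combine in a retrieval pipeline, assuming each retrieved document carries a credibility score (e.g., from a source-reputation model). The names `RetrievedDoc`, `CREDIBILITY_THRESHOLD`, and `build_prompt` are illustrative assumptions, not APIs from the cited frameworks (CrAM, CD2, RAAT):

```python
from dataclasses import dataclass

@dataclass
class RetrievedDoc:
    text: str
    credibility: float  # assumed score in [0.0, 1.0] from an upstream credibility model

CREDIBILITY_THRESHOLD = 0.5  # assumed cutoff; would be tuned per deployment

def filter_by_credibility(docs: list[RetrievedDoc]) -> list[RetrievedDoc]:
    """Retrieval safeguard: drop low-credibility documents before generation (mitigation 2)."""
    return [d for d in docs if d.credibility >= CREDIBILITY_THRESHOLD]

def build_prompt(question: str, docs: list[RetrievedDoc]) -> str:
    """Assemble a prompt that explicitly licenses abstention under conflict (mitigation 3)."""
    context = "\n".join(f"- {d.text}" for d in docs)
    return (
        "Answer using ONLY the context below. If the context is missing, "
        "ambiguous, or contradicts itself, answer exactly: I don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

docs = [
    RetrievedDoc("The Eiffel Tower is in Paris.", credibility=0.9),
    RetrievedDoc("The Eiffel Tower was moved to Berlin in 2020.", credibility=0.1),
]
kept = filter_by_credibility(docs)
prompt = build_prompt("Where is the Eiffel Tower?", kept)
```

Hard filtering is the simplest variant; credibility-aware approaches such as CrAM instead modulate attention weights over retrieved content, and a full implementation would also handle the case where every document is filtered out.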