Models distracted by irrelevant context
Models can easily be distracted by irrelevant information supplied in their input (e.g., the context window of an LLM), leading to a significant drop in performance once irrelevant information is introduced. This occurs across different prompting techniques, including chain-of-thought prompting [184].
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1143
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement contextual filtration and Retrieval-Augmented Generation (RAG). The primary strategy is to prevent irrelevant context from entering the active prompt window through rigorous pre-processing and dynamic retrieval. This includes structured note-taking, context compression, and the use of RAG to retrieve and insert only the most semantically relevant information, minimizing contextual noise and the associated processing burden on the Large Language Model (LLM) (Sources 10, 11). 2. Use robust multi-path prompting techniques. Improve inference-stage reliability by generating multiple independent reasoning paths. Techniques such as Self-Consistency Chain-of-Thought (CoT), which selects the answer with the highest consensus, and Uncertainty-Routed CoT, which explicitly assesses confidence at each step, mitigate the influence of irrelevant context by verifying the final output across diverse logical trajectories (Sources 13, 18, 20). 3. Conduct adversarial fine-tuning and mechanistic intervention. For foundation models or high-stakes applications, improve intrinsic robustness by fine-tuning on adversarially generated datasets that systematically include irrelevant context. Additionally, investigate and apply mechanistic interpretations, such as identifying and attenuating "entrainment heads," to address the circuit-level cause of the distraction phenomenon (Sources 9, 12).
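The contextual-filtration idea in strategy 1 can be sketched minimally: score each candidate passage against the query and keep only those above a relevance threshold. This toy uses bag-of-words cosine similarity for self-containment; a real RAG pipeline would use embedding similarity, and the `filter_context` function, passage texts, and threshold value are illustrative assumptions, not part of the source.

```python
import math
import re
from collections import Counter

def _bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts
    (a crude stand-in for embedding similarity)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def filter_context(query: str, passages: list[str],
                   threshold: float = 0.2) -> list[str]:
    """Keep only passages relevant to the query, so irrelevant
    context never reaches the model's prompt window."""
    return [p for p in passages if _bow_cosine(query, p) >= threshold]

passages = [
    "The Eiffel Tower is 330 metres tall.",
    "Bananas are rich in potassium.",
    "The Eiffel Tower was completed in 1889.",
]
# The off-topic banana passage scores near zero and is dropped.
print(filter_context("How tall is the Eiffel Tower?", passages))
```

The threshold trades recall against noise: too low re-admits distractors, too high discards relevant evidence, so it would normally be tuned on held-out retrieval data.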
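Strategy 2's self-consistency voting can be sketched as: sample several independent reasoning paths and return the majority-vote answer, so one path derailed by irrelevant context is outvoted. The `generate_reasoning_path` callable and the toy model below are hypothetical stand-ins for an actual LLM sampling call.

```python
from collections import Counter
from typing import Callable

def self_consistency_answer(
    generate_reasoning_path: Callable[[str], str],
    prompt: str,
    num_paths: int = 5,
) -> str:
    """Sample num_paths independent reasoning paths and return the
    most common final answer (majority vote across paths)."""
    answers = [generate_reasoning_path(prompt) for _ in range(num_paths)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in: four paths ignore the distractor and agree on "42";
# one path is distracted and returns "17". Majority vote recovers "42".
_samples = iter(["42", "42", "17", "42", "42"])

def toy_model(prompt: str) -> str:
    return next(_samples)

print(self_consistency_answer(toy_model, "Q: ... (with irrelevant context)"))
```

Uncertainty-Routed CoT extends this by also estimating confidence per path and falling back to a single greedy answer when consensus is weak.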