7. AI System Safety, Failures, & Limitations

Models distracted by irrelevant context

Models can easily be distracted by irrelevant information supplied alongside a task (such as extra "context" in an LLM prompt), leading to a significant drop in performance once that information is introduced. The effect persists across different prompting techniques, including chain-of-thought prompting [184].

Source: MIT AI Risk Repository (mit1143)

ENTITY: 2 - AI

INTENT: 2 - Unintentional

TIMING: 2 - Post-deployment

Risk ID: mit1143

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.3 Lack of capability or robustness

Mitigation strategy

1. Implement contextual filtration and Retrieval-Augmented Generation (RAG). The primary strategy is to prevent irrelevant context from entering the active prompt window through rigorous pre-processing and dynamic retrieval. This includes structured note-taking, context compression, and the use of RAG to query and insert only the most semantically relevant information, thereby minimizing contextual noise and the associated processing burden on the large language model (LLM) (Sources 10, 11).

2. Use robust multi-path prompting techniques. Improve reliability at inference time by generating multiple independent reasoning paths. Techniques such as Self-Consistency Chain-of-Thought (CoT), which selects the answer with the highest consensus, or Uncertainty-Routed CoT, which explicitly assesses confidence at each step, mitigate the influence of irrelevant context by verifying the final output across diverse logical trajectories (Sources 13, 18, 20).

3. Conduct adversarial fine-tuning and mechanistic intervention. For foundation models or high-stakes applications, improve intrinsic robustness by fine-tuning on adversarially generated datasets that systematically include irrelevant context. Additionally, investigate mechanistic interpretations, such as identifying and attenuating "entrainment heads," to directly address the circuit-level cause of the distraction phenomenon (Sources 9, 12).
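The context-filtration idea in strategy 1 can be sketched with a toy relevance filter. A minimal sketch, assuming a production RAG pipeline would replace the lexical Jaccard score below with embedding similarity; the passage list and `top_k` value are illustrative:

```python
def filter_context(query, passages, top_k=2):
    """Keep only the passages most lexically similar to the query,
    a stand-in for embedding-based retrieval in a real RAG pipeline."""
    q = set(query.lower().split())

    def score(passage):
        words = set(passage.lower().split())
        # Jaccard overlap between query and passage word sets
        return len(q & words) / (len(q | words) or 1)

    ranked = sorted(passages, key=score, reverse=True)
    return ranked[:top_k]

passages = [
    "The train departs Boston at 9am and travels at 60 mph.",
    "Bananas are rich in potassium.",  # irrelevant distractor
    "The distance from Boston to New York is 215 miles.",
]
kept = filter_context("When does the train from Boston arrive in New York?",
                      passages)
# the banana distractor is ranked last and dropped before prompting
```

Only the retained passages are inserted into the prompt, keeping the distractor out of the context window entirely.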
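Strategy 2's self-consistency voting reduces to sampling several independent reasoning paths and taking the majority answer. A minimal sketch, assuming a `sample_answer` callable that wraps an LLM call (stubbed here with canned responses):

```python
import itertools
from collections import Counter

def self_consistency(sample_answer, prompt, n_paths=5):
    """Majority vote over independently sampled reasoning paths."""
    answers = [sample_answer(prompt) for _ in range(n_paths)]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_paths  # answer plus a crude consensus score

# Stub model: most sampled paths agree despite one distracted outlier.
samples = itertools.cycle(["42", "42", "17", "42", "42"])
answer, confidence = self_consistency(lambda p: next(samples),
                                      "2 * 21 = ?", n_paths=5)
# answer == "42", confidence == 0.8
```

The consensus score can feed an uncertainty-routed variant: when the vote fraction is low, the system falls back to a slower, more careful decoding path.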
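The adversarial data construction in strategy 3 can be approximated by injecting irrelevant sentences into otherwise clean training examples. A minimal sketch; the example record, distractor list, and `inject_distractor` helper are hypothetical:

```python
import random

def inject_distractor(example, distractors, rng):
    """Augment a QA training example with an irrelevant sentence so the
    fine-tuned model learns to ignore contextual noise."""
    noise = rng.choice(distractors)
    sentences = example["context"].split(". ")
    pos = rng.randrange(len(sentences) + 1)
    sentences.insert(pos, noise)
    # answer is unchanged: the model must produce it despite the noise
    return {**example, "context": ". ".join(sentences)}

rng = random.Random(0)
ex = {"context": "Alice has 3 apples. Bob gives her 2 more",
      "question": "How many apples does Alice have?",
      "answer": "5"}
distractors = ["The sky was overcast that day"]
aug = inject_distractor(ex, distractors, rng)
```

Fine-tuning on pairs of clean and augmented examples with identical targets penalizes any shift in the model's answer caused by the injected noise.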