Disinformation
These evaluations assess an LLM's ability to generate misinformation that can be propagated to deceive, mislead, or otherwise influence the behaviour of a target (Liang et al., 2022).
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit666
Domain lineage
4. Malicious Actors & Misuse
4.1 > Disinformation, surveillance, and influence at scale
Mitigation strategy
1. Proactively strengthen the integrity of the LLM's knowledge foundation by integrating Retrieval-Augmented Generation (RAG) over verified external databases and applying rigorous knowledge-editing methods to correct factual inconsistencies within the model parameters, thereby reducing factually ungrounded generation (hallucination); a minimal retrieval-grounding sketch follows this list.
2. Apply factual-alignment and safety-alignment techniques during fine-tuning (e.g., Direct Preference Optimization or adversarial training) and at inference time to embed self-corrective mechanisms, ensuring that the model's reasoning and decoding prioritize factual accuracy over mere fluency (see the DPO loss sketch below).
3. Establish multi-tiered production guardrails with graduated response policies, including real-time output validation and clear content labeling, complemented by mandatory human oversight and cross-verification for high-impact or sensitive LLM applications, to prevent the dissemination of both unintentional misinformation and deliberate disinformation (a graduated-response sketch appears below).
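As an illustration of strategy 1, the sketch below grounds a model's answer in passages retrieved from a verified corpus before generation. This is a minimal sketch, not a production retriever: the corpus contents, function names, and the keyword-overlap scorer are all hypothetical stand-ins for a vetted database and vector index.

# Minimal retrieval-grounding sketch for strategy 1 (hypothetical names throughout).
# A real system would query a vector index over a vetted database; here a simple
# keyword-overlap score stands in for the retriever.

VERIFIED_CORPUS = [  # assumed: passages vetted by human fact-checkers
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea level.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank passages by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(corpus, key=lambda p: -len(q_terms & set(p.lower().split())))
    return scored[:k]

def grounded_prompt(question: str) -> str:
    """Prepend verified passages so the model answers from evidence, not memory."""
    evidence = "\n".join(retrieve(question, VERIFIED_CORPUS))
    return (
        "Answer using ONLY the verified sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{evidence}\n\nQuestion: {question}\nAnswer:"
    )

print(grounded_prompt("Where is the Eiffel Tower?"))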
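Strategy 2 names Direct Preference Optimization; the sketch below shows the core DPO objective (Rafailov et al., 2023), assuming per-sequence log-probabilities have already been computed under the trainable policy and a frozen reference model. Variable names and the toy batch values are illustrative only.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: push the policy to prefer the factually accurate
    ("chosen") completion over the inaccurate ("rejected") one, relative
    to a frozen reference model."""
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.5, -12.5]),
                torch.tensor([-10.5, -12.2]), torch.tensor([-11.0, -12.4]))
print(float(loss))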
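For strategy 3, the sketch below illustrates a graduated response policy: each output is validated in real time and then passed through, delivered with a content label, or withheld and escalated to a human reviewer, with a stricter escalation threshold for high-impact applications. The risk scorer and the threshold values are hypothetical placeholders for a production fact-checking and cross-verification pipeline.

from enum import Enum

class Action(Enum):
    PASS = "deliver as-is"
    LABEL = "deliver with a content label"
    BLOCK_AND_REVIEW = "withhold pending human review"

def misinformation_risk(text: str) -> float:
    """Hypothetical scorer in [0, 1]; a real deployment would call a
    fact-checking model or cross-verify against trusted sources."""
    suspect_markers = ("miracle cure", "rigged", "they don't want you to know")
    return min(1.0, sum(m in text.lower() for m in suspect_markers) / 2)

def graduated_response(text: str, high_impact: bool = False) -> Action:
    """Tiered policy: stricter escalation threshold for sensitive applications."""
    risk = misinformation_risk(text)
    review_threshold = 0.4 if high_impact else 0.7
    if risk >= review_threshold:
        return Action.BLOCK_AND_REVIEW
    if risk > 0.0:
        return Action.LABEL
    return Action.PASS

print(graduated_response("This miracle cure works!", high_impact=True))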