
Disinformation

These evaluations assess an LLM's ability to generate misinformation that can be propagated to deceive, mislead, or otherwise influence the behaviour of a target (Liang et al., 2022).

Source: MIT AI Risk Repository

ENTITY: 1 - Human
INTENT: 1 - Intentional
TIMING: 3 - Other
Risk ID: mit666
Domain lineage: 4. Malicious Actors & Misuse > 4.1 Disinformation, surveillance, and influence at scale (223 mapped risks)

Mitigation strategy

1. Strengthen the factual grounding of the LLM's knowledge base by integrating Retrieval-Augmented Generation (RAG) over verified external databases and by applying rigorous knowledge-editing methods to correct factual inconsistencies stored in the model parameters, reducing the rate of ungrounded (hallucinated) output. A minimal RAG sketch follows this list.
2. Apply factual-alignment and safety-alignment techniques during fine-tuning (e.g., Direct Preference Optimization or adversarial training) and at inference time, so that the model's reasoning and decoding prioritize factual accuracy over mere fluency; see the DPO loss sketch below.
3. Establish multi-tiered production guardrails with graduated response policies: real-time output validation, clear content labeling, and mandatory human oversight with cross-verification for high-impact or sensitive LLM applications, to prevent the dissemination of both unintentional misinformation and deliberate disinformation. A guardrail pipeline sketch closes this entry.
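
As an illustration of item 1, here is a minimal RAG sketch: it retrieves the most relevant entries from a small verified fact store via TF-IDF cosine similarity and prepends them to the prompt so generation is anchored to vetted text. The fact store, the example query, and the downstream `llm_generate` call are hypothetical placeholders, not part of the repository entry.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical verified fact store; in practice this would be a curated,
# versioned external database.
VERIFIED_FACTS = [
    "The Eiffel Tower is located in Paris, France.",
    "Water boils at 100 degrees Celsius at sea-level pressure.",
    "The Great Wall of China is not visible to the naked eye from the Moon.",
]

vectorizer = TfidfVectorizer().fit(VERIFIED_FACTS)
fact_matrix = vectorizer.transform(VERIFIED_FACTS)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k facts most similar to the query (TF-IDF cosine)."""
    scores = cosine_similarity(vectorizer.transform([query]), fact_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [VERIFIED_FACTS[i] for i in top if scores[i] > 0]

def grounded_prompt(query: str) -> str:
    """Prepend retrieved evidence so the answer is grounded in verified text."""
    evidence = "\n".join(f"- {fact}" for fact in retrieve(query))
    return (
        "Answer using ONLY the verified facts below. "
        "If they are insufficient, say so.\n"
        f"Verified facts:\n{evidence}\n\nQuestion: {query}\nAnswer:"
    )

print(grounded_prompt("Can you see the Great Wall of China from the Moon?"))
# The prompt would then be passed to the model, e.g. llm_generate(prompt).
```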
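For item 2, this is a sketch of the Direct Preference Optimization loss in plain PyTorch, assuming you already have summed token log-probabilities of preferred ("chosen") and dispreferred ("rejected") completions under both the policy and a frozen reference model. It implements the published DPO objective (Rafailov et al., 2023) in general form; it is not code from the repository.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta: float = 0.1):
    """Direct Preference Optimization loss.

    Each argument is a tensor of summed log-probabilities of a completion,
    shape (batch,). `beta` controls how far the policy may drift from the
    frozen reference model.
    """
    # Log-ratio of policy to reference for preferred and dispreferred outputs.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between preferred and dispreferred completions.
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy example: random log-probs stand in for real model outputs; in training
# the policy tensors would carry gradients back into the model.
torch.manual_seed(0)
b = 4
loss = dpo_loss(torch.randn(b), torch.randn(b), torch.randn(b), torch.randn(b))
print(f"DPO loss: {loss.item():.4f}")
```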
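For item 3, here is a minimal sketch of a graduated guardrail pipeline: each tier can pass an output through, attach a content label, or escalate it to mandatory human review, with a stricter policy for high-impact deployments. The keyword-based checks, the label names, and the escalation thresholds are illustrative assumptions, not a specification from the repository; a production system would call real fact-checking and safety-classifier services at each tier.

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    PASS = "pass"                  # release output unchanged
    LABEL = "label"                # release with a content label attached
    HUMAN_REVIEW = "human_review"  # hold for mandatory human sign-off

@dataclass
class Verdict:
    action: Action
    reasons: list[str] = field(default_factory=list)

# Hypothetical tier checks standing in for real validation services.
def unverifiable_claims(text: str) -> bool:
    return any(kw in text.lower() for kw in ("definitely", "proven fact", "100% certain"))

def sensitive_topic(text: str) -> bool:
    return any(kw in text.lower() for kw in ("election", "vaccine", "wire transfer"))

def moderate(text: str, high_impact_app: bool = False) -> Verdict:
    """Apply tiers in order of severity; the strictest triggered tier wins."""
    verdict = Verdict(Action.PASS)
    if unverifiable_claims(text):
        verdict = Verdict(Action.LABEL, ["contains strong unverified claims"])
    if sensitive_topic(text):
        verdict = Verdict(Action.HUMAN_REVIEW, verdict.reasons + ["sensitive topic"])
    if high_impact_app and verdict.action is not Action.PASS:
        # Graduated policy: anything flagged in a high-impact deployment
        # requires human cross-verification before release.
        verdict.action = Action.HUMAN_REVIEW
    return verdict

print(moderate("It is a proven fact that the election was decided early."))
```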