4. Malicious Actors & Misuse

Misinformation and Manipulation

Recent studies have demonstrated that LLMs can be exploited to craft deceptive narratives with levels of persuasiveness similar to human-generated content (Pan et al., 2023b; Spitale et al., 2023), to fabricate fake news (Zellers et al., 2019; Zhou et al., 2023f), and to devise automated influence operations aimed at manipulating the perspectives of targeted audiences (Goldstein et al., 2023). LLMs have also been used in malicious social botnets (Yang and Menczer, 2023), powering automated accounts that disseminate coordinated messages. More broadly, the deliberate generation of misleading information with LLMs could significantly lower the barrier to propaganda and manipulation (Aharoni et al., 2024), as LLMs can generate highly credible misinformation at significant cost savings compared to human authorship (Musser, 2023), while achieving considerable scale and speed of content generation (Buchanan et al., 2021; Goldstein et al., 2023).

Source: MIT AI Risk Repository, risk ID mit1489

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1489

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.3 > Fraud, scams, and targeted manipulation

Mitigation strategy

1. Foundational Factual Alignment and Grounding: Implement robust Retrieval-Augmented Generation (RAG) systems integrated with trusted, verifiable external knowledge bases. Complement this with self-verification architectures (e.g., autonomous fact-checking chains) during inference that validate output claims against both internal knowledge and external data, substantially reducing hallucination and fabricated content.

2. Layered Adversarial Defense and Output Guardrails: Enforce rigorous input validation and sanitization to mitigate adversarial prompt injection and the insertion of misleading context. Simultaneously, deploy multi-tiered output moderation and content filtering, using semantic classifiers and confidence calibration thresholds to block or quarantine high-risk content that promotes propaganda, scams, or targeted manipulation.

3. Mandatory Human Oversight and Risk Transparency: Establish a human-in-the-loop (HITL) protocol for all high-stakes applications, requiring expert review of outputs in regulated or critical domains before dissemination. Furthermore, institute a comprehensive transparency and risk-communication policy that includes prominently labeling all LLM-generated content and educating users about the model's inherent limitations regarding factual accuracy and bias.
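The tiered moderation described in strategies 2 and 3 can be sketched as a simple routing gate. This is an illustrative sketch only: the risk score would come from a trained semantic classifier in practice (here it is passed in directly), and the threshold values are hypothetical, not calibrated.

```python
from dataclasses import dataclass

# Assumed, illustrative thresholds -- real deployments would calibrate these
# against a labeled evaluation set.
BLOCK_THRESHOLD = 0.9       # outputs this risky are blocked outright
QUARANTINE_THRESHOLD = 0.6  # outputs this risky are held for human review

@dataclass
class ModerationResult:
    action: str   # "allow", "quarantine", or "block"
    score: float

def moderate(text: str, risk_score: float) -> ModerationResult:
    """Route an LLM output based on a classifier's manipulation-risk score.

    `risk_score` stands in for the output of a semantic classifier
    (strategy 2); the "quarantine" tier feeds the human-in-the-loop
    review queue (strategy 3).
    """
    if risk_score >= BLOCK_THRESHOLD:
        return ModerationResult("block", risk_score)
    if risk_score >= QUARANTINE_THRESHOLD:
        # Held for expert review before any dissemination.
        return ModerationResult("quarantine", risk_score)
    return ModerationResult("allow", risk_score)
```

The two-threshold design reflects the layered approach above: clear-cut violations are filtered automatically, while borderline content is escalated to human reviewers rather than silently released or discarded.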