Propaganda
LLMs can be leveraged by malicious users to proactively generate propaganda against a target and facilitate its spread.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit493
Domain lineage
4. Malicious Actors & Misuse
4.1 > Disinformation, surveillance, and influence at scale
Mitigation strategy
1. Implement robust input validation and output filtering mechanisms (e.g., toxicity detectors, content filters) across the LLM lifecycle to proactively block or sanitize user prompts requesting propaganda generation and to prevent the dissemination of high-risk, harmful, or biased content generated by the model.
2. Establish a comprehensive AI governance framework that defines acceptable use policies, assigns clear accountability for misuse, and mandates regular audits of the AI system's compliance and ethical performance to prevent and address the leveraging of LLMs for malicious purposes.
3. Deploy specialized real-time threat detection and monitoring systems to identify anomalous usage patterns (e.g., sudden spikes in request volume, repeated adversarial queries) that may indicate automated abuse or large-scale generation of propaganda, enabling rapid incident response and mitigation (a minimal sketch combining items 1 and 3 appears after this list).
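As a rough illustration of mitigations 1 and 3, the sketch below gates a single LLM call behind an input/output content check and a per-client request-rate check. The toxicity scorer, threshold values, and rate limit are illustrative assumptions rather than values prescribed by this entry; in practice the keyword stand-in would be replaced by a dedicated moderation classifier or API.

```python
import time
from collections import defaultdict, deque

# Illustrative thresholds -- assumptions, not values prescribed by this entry.
TOXICITY_THRESHOLD = 0.8        # block text scored above this
MAX_REQUESTS_PER_MINUTE = 30    # flag clients exceeding this rate

# Sliding window of request timestamps per client, for anomaly detection.
_request_log = defaultdict(deque)


def toxicity_score(text: str) -> float:
    """Trivial stand-in for a real toxicity/content classifier; in practice
    this would call a moderation model or API and return a score in [0, 1]."""
    flagged = {"propaganda", "terrorism"}  # illustrative keyword list only
    hits = sum(word in text.lower() for word in flagged)
    return min(1.0, hits / 2)


def is_rate_anomalous(client_id: str, now=None) -> bool:
    """Record this request and report whether the client's request rate over
    the last 60 seconds exceeds the configured limit (mitigation 3)."""
    now = time.time() if now is None else now
    window = _request_log[client_id]
    window.append(now)
    while window and now - window[0] > 60:
        window.popleft()
    return len(window) > MAX_REQUESTS_PER_MINUTE


def moderate_response(client_id: str, prompt: str, generate) -> str:
    """Gate one LLM call: screen the client's usage pattern, the prompt,
    and the generated output before returning text (mitigations 1 and 3)."""
    if is_rate_anomalous(client_id):
        # Anomalous volume may indicate automated, large-scale generation.
        raise PermissionError("request rate exceeds limit; flagged for review")
    if toxicity_score(prompt) > TOXICITY_THRESHOLD:
        # Input validation: refuse high-risk prompts outright.
        raise ValueError("prompt rejected by input filter")
    output = generate(prompt)
    if toxicity_score(output) > TOXICITY_THRESHOLD:
        # Output filtering: block harmful generations before dissemination.
        raise ValueError("generated text rejected by output filter")
    return output
```

Requests blocked by either check could also be logged to support the audits and accountability called for in mitigation 2.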
ADDITIONAL EVIDENCE
• Generating propaganda against targeted people (e.g., celebrities): Figure 18.
• Advocating for terrorism: Figure 19.
• Creating extreme and harmful political propaganda.