Propagating misconceptions / false beliefs
Generating or spreading false, low-quality, misleading, or inaccurate information that leads people to form false or inaccurate perceptions and beliefs
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit263
Domain lineage
3. Misinformation
3.1 > False or misleading information
Mitigation strategy
1. Develop and enforce rigorous content-filtering mechanisms, including fine-tuning models against established safety policies and integrating real-time fact-checking APIs, to proactively restrict the generation of known false or misleading statements.
2. Implement robust provenance-tracing technologies, such as digital watermarking or metadata embedding, to clearly label synthetic content and enable rapid identification of the source model responsible for propagating misinformation.
3. Mandate continuous adversarial testing (red-teaming) throughout the development lifecycle to identify and mitigate novel jailbreaks, prompt-injection vulnerabilities, and other vectors for the intentional or unintentional generation and spread of false beliefs.
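The first mitigation above can be sketched in code. The following is a minimal, hypothetical illustration of a pattern-based output filter: the claim patterns, function name, and return shape are all assumptions for illustration, and a production system would instead query a maintained fact-checking service rather than a static blocklist.

```python
import re

# Hypothetical blocklist of known false-claim patterns (illustrative only;
# a real deployment would consult a maintained fact-checking database or API).
KNOWN_FALSE_CLAIM_PATTERNS = [
    r"vaccines cause autism",
    r"the moon landing was (?:faked|staged)",
]


def filter_output(text: str) -> tuple[bool, list[str]]:
    """Screen generated text against the blocklist.

    Returns (allowed, matched_patterns): allowed is False when any
    blocklisted claim pattern matches the text, case-insensitively.
    """
    matches = [
        pattern
        for pattern in KNOWN_FALSE_CLAIM_PATTERNS
        if re.search(pattern, text, re.IGNORECASE)
    ]
    return (len(matches) == 0, matches)
```

Pattern matching alone cannot judge truth; in practice such a filter would be one layer alongside model fine-tuning and real-time fact-checking, as the strategy above describes.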
ADDITIONAL EVIDENCE
Example: A synthetic video of a nuclear explosion prompting mass panic (Alba, 2023)