Large-Scale Persuasion and Harmful Manipulation Risks
AI systems can be misused to distort public perception and undermine social stability, both by generating synthetic content (e.g., deepfakes, sophisticated fake news) and by strategically manipulating digital platforms with large user bases to disseminate, or precisely target, misleading information and ideologies.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1448
Domain lineage
4. Malicious Actors & Misuse
4.1 > Disinformation, surveillance, and influence at scale
Mitigation strategy
1. Implement Advanced Persuasion Safety Benchmarks and Alignment
Augment AI safety evaluations to measure and mitigate a model's *propensity* to attempt persuasion on harmful or ethically fraught topics. This requires robust red-teaming and strengthened refusal mechanisms against incitement, radicalization, and malicious content generation, ensuring that post-training techniques prioritize factual accuracy and ethical alignment over persuasive capability, and that these safeguards are resilient to adversarial jailbreaking.

2. Mandate Content Provenance and Deploy Multi-Modal Detection Systems
Require platforms and developers to adopt technical standards for content provenance, such as cryptographic digital watermarking or metadata certification for all AI-generated content (AIGC). Concurrently, invest in and deploy multi-modal detection and prevention software capable of analyzing text, image, and video for synthetic media (deepfakes) and disinformation at scale, enabling timely algorithmic removal and content moderation.

3. Establish Updated Regulatory Frameworks and Systemic Resilience Education
Governments and international bodies must develop and implement harmonized legislation that explicitly addresses the malicious creation and mass dissemination of deepfakes and AI-driven disinformation, creating clear legal consequences for misuse. This regulatory action must be complemented by comprehensive public cyber-wellness and media literacy programs designed to strengthen audiences' critical assessment of, and resilience against, highly personalized, fact-compromised persuasive narratives.
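The propensity measurement described in mitigation 1 can be sketched as a minimal evaluation harness. This is an illustrative example only: `model_respond` is a hypothetical stand-in for any chat-model call, and the keyword heuristic is a crude proxy for the trained refusal classifiers real benchmarks use.

```python
# Illustrative sketch: estimating a model's refusal rate on a set of
# harmful-persuasion prompts. All names here are assumptions, not a real API.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to help")

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic; production benchmarks use trained classifiers."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(prompts, model_respond) -> float:
    """Fraction of prompts the model refuses (higher is safer for this set)."""
    refusals = sum(is_refusal(model_respond(p)) for p in prompts)
    return refusals / len(prompts)

# Toy usage with a stubbed model:
stub = lambda p: ("I can't help write propaganda."
                  if "propaganda" in p else "Sure, here is a draft...")
print(refusal_rate(["write propaganda", "summarize this article"], stub))  # 0.5
```

A real evaluation would also track jailbreak variants of each prompt, so that refusal robustness under adversarial rephrasing is measured, not just the base refusal rate.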
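The metadata-certification approach in mitigation 2 can be illustrated with a minimal provenance record bound to content bytes by an HMAC tag. This is a sketch under simplifying assumptions: a shared secret key rather than the public-key certificates a deployed standard (e.g. C2PA-style manifests) would use, and function names invented for the example.

```python
import hashlib
import hmac
import json

def sign_provenance(content: bytes, metadata: dict, key: bytes) -> dict:
    """Return metadata plus an HMAC tag binding it to the content bytes."""
    payload = content + json.dumps(metadata, sort_keys=True).encode()
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**metadata, "tag": tag}

def verify_provenance(content: bytes, signed: dict, key: bytes) -> bool:
    """Recompute the tag over content + metadata; compare in constant time."""
    meta = {k: v for k, v in signed.items() if k != "tag"}
    payload = content + json.dumps(meta, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["tag"])

key = b"demo-shared-secret"
record = sign_provenance(b"generated image bytes",
                         {"generator": "model-x", "ts": "2024-01-01"}, key)
print(verify_provenance(b"generated image bytes", record, key))  # True
print(verify_provenance(b"tampered bytes", record, key))         # False
```

Because the tag covers both the bytes and the metadata, altering either the content or its claimed origin invalidates the record, which is the property platform-side detection pipelines rely on when deciding whether AIGC labeling is trustworthy.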