Misuse of AI model by user-performed persuasion
AI models can be influenced to accept misinformation through persuasive conversations, even when their initial responses are factually correct. Multi-turn persuasion can be more effective than single-turn persuasion attempts in altering the model’s stance [223].
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1147
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement advanced, temporal context-aware defense mechanisms, such as Temporal Context Awareness (TCA) frameworks, to actively monitor and mitigate multi-turn manipulation attacks by analyzing semantic drift, cross-turn intention consistency, and evolving conversational patterns, given the inadequacy of single-turn defenses. 2. Conduct rigorous, multi-turn adversarial red teaming and continuous model evaluation against persuasive adversarial prompts (PAPs) throughout the AI lifecycle to proactively identify and close specific vulnerabilities to sophisticated user-performed persuasion tactics. 3. Develop and integrate user-facing interventions, such as inoculation (pre-emptively educating users on AI persuasive techniques) and transparency mandates, to enhance user resilience and cognitive security against AI-generated misinformation and deception.