Back to the MIT repository
2. Privacy & Security2 - Post-deployment

Misuse of AI model by user-performed persuasion

AI models can be influenced to accept misinformation through persuasive conversations, even when their initial responses are factually correct. Multi-turn persuasion can be more effective than single-turn persuasion attempts in altering the model’s stance [223].

Source: MIT AI Risk Repositorymit1147

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1147

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement advanced, temporal context-aware defense mechanisms, such as Temporal Context Awareness (TCA) frameworks, to actively monitor and mitigate multi-turn manipulation attacks by analyzing semantic drift, cross-turn intention consistency, and evolving conversational patterns, given the inadequacy of single-turn defenses. 2. Conduct rigorous, multi-turn adversarial red teaming and continuous model evaluation against persuasive adversarial prompts (PAPs) throughout the AI lifecycle to proactively identify and close specific vulnerabilities to sophisticated user-performed persuasion tactics. 3. Develop and integrate user-facing interventions, such as inoculation (pre-emptively educating users on AI persuasive techniques) and transparency mandates, to enhance user resilience and cognitive security against AI-generated misinformation and deception.