AI Influence
Ways in which advanced AI assistants could shape user beliefs and behaviour through means that depart from rational persuasion.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit391
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Establish rigorous AI alignment protocols and continual recalibration mechanisms so that the system's objectives and behaviours remain aligned with human values and ethical standards, preempting autonomous drift toward manipulative or non-rational influence.
2. Mandate robust transparency techniques, such as explainable AI (XAI) and chain-of-thought prompting, so that users and auditors can critically evaluate the reasoning behind the AI's outputs; pair these with educational strategies that promote critical human engagement with AI recommendations.
3. Integrate mandatory human-in-the-loop oversight for high-impact decisions, and deploy continuous monitoring, including adversarial testing and deception risk assessments, to detect and mitigate the emergence or use of manipulative capabilities and the amplification of human cognitive biases.
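The transparency technique named in item 2 can be illustrated with a minimal sketch: wrapping a user query so the assistant is instructed to expose its reasoning before its recommendation, giving users and auditors something to critically evaluate. The function name and prompt wording below are illustrative assumptions, not part of any specific library or deployed system.

```python
# Illustrative sketch of a chain-of-thought transparency wrapper.
# All names and prompt text are hypothetical examples, not a standard API.

def build_transparent_prompt(user_query: str) -> str:
    """Wrap a user query so the assistant must show its reasoning
    (under 'Reasoning:') before its final answer (under
    'Recommendation:'), so the basis of the output can be audited."""
    return (
        "Answer the question below. First, reason step by step and "
        "label that section 'Reasoning:'. Then state your final answer "
        "under 'Recommendation:', noting which reasoning steps support it.\n\n"
        f"Question: {user_query}"
    )

# Example usage: the wrapped prompt carries both required section labels.
prompt = build_transparent_prompt("Should I refinance my mortgage?")
print("Reasoning:" in prompt and "Recommendation:" in prompt)  # True
```

A wrapper like this does not guarantee faithful reasoning, but it gives auditors a structured artefact to inspect, which is the point of the mitigation.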