AI Influence
Ways in which advanced AI assistants could shape user beliefs and behaviour through means that depart from rational persuasion.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit391
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Establish rigorous AI alignment protocols and continual recalibration mechanisms so that the system's objectives and behaviours remain aligned with human values and ethical standards, preempting autonomous drift toward manipulative or non-rational influence.
2. Mandate robust transparency techniques, such as explainable AI (XAI) and chain-of-thought prompting, so that users and auditors can critically evaluate the reasoning behind the AI's outputs; pair these with educational strategies that promote critical human engagement with AI recommendations.
3. Integrate mandatory human-in-the-loop oversight for high-impact decisions, and deploy continuous monitoring, including adversarial testing and deception risk assessments, to detect and mitigate the emergence or use of manipulative capabilities and the amplification of human cognitive biases.
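The transparency technique named in item 2 can be illustrated with a minimal sketch: wrapping a user query so the assistant is instructed to expose its reasoning before its recommendation, giving users and auditors something to critically evaluate. The function name and prompt wording below are illustrative assumptions, not part of any specific library or deployed system.

```python
# Illustrative sketch of a chain-of-thought transparency wrapper.
# All names and prompt text are hypothetical examples, not a standard API.

def build_transparent_prompt(user_query: str) -> str:
    """Wrap a user query so the assistant must show its reasoning
    (under 'Reasoning:') before its final answer (under
    'Recommendation:'), so the basis of the output can be audited."""
    return (
        "Answer the question below. First, reason step by step and "
        "label that section 'Reasoning:'. Then state your final answer "
        "under 'Recommendation:', noting which reasoning steps support it.\n\n"
        f"Question: {user_query}"
    )

# Example usage: the wrapped prompt carries both required section labels.
prompt = build_transparent_prompt("Should I refinance my mortgage?")
print("Reasoning:" in prompt and "Recommendation:" in prompt)  # True
```

A wrapper like this does not guarantee faithful reasoning, but it gives auditors a structured artefact to inspect, which is the point of the mitigation.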