4. Malicious Actors & Misuse

Manipulation

The predictability of behavioural protocols in AI systems, particularly in certain applications, can act as an incentive to manipulate these systems.

Source: MIT AI Risk Repository (mit631)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit631

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.1 > Disinformation, surveillance, and influence at scale

Mitigation strategy

1. Prioritize proactive adversarial testing, such as dedicated AI red-teaming exercises, to systematically expose and remediate latent vulnerabilities arising from predictable algorithmic behaviour. Complement this with adversarial training methodologies to enhance model robustness against evasive and manipulative inputs.

2. Institute stringent data governance policies, including the adoption of 'datarails' to proscribe the inclusion of behavioural economics and psychological research on human cognitive biases in training data. This limits the AI's intrinsic capacity to exploit human vulnerabilities for manipulative ends and helps prevent training-data poisoning.

3. Establish a comprehensive, continuous monitoring and logging infrastructure to track real-time model inputs, outputs, and performance metrics, enabling rapid detection of anomalous activity consistent with manipulation or malicious misuse. Furthermore, integrate Explainable AI (XAI) techniques to maintain auditable transparency regarding the system's decision-making processes.
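The monitoring-and-logging item above can be sketched in a minimal form. The snippet below is an illustrative assumption, not part of the repository entry: it logs each model input/output pair and flags anomalous activity with a rolling z-score on a simple per-request metric (here, response length). All class and field names are hypothetical; a production system would track richer metrics and route flags to an alerting pipeline.

```python
import statistics
from collections import deque


class ModelMonitor:
    """Hypothetical sketch of continuous model I/O monitoring.

    Logs every request/response pair and flags responses whose
    metric deviates from the rolling window by more than
    `threshold` standard deviations.
    """

    def __init__(self, window: int = 100, threshold: float = 3.0):
        self.threshold = threshold
        self.log = []                          # full audit trail
        self._metrics = deque(maxlen=window)   # rolling metric window

    def record(self, prompt: str, response: str) -> bool:
        """Log one interaction; return True if it looks anomalous."""
        metric = len(response)  # placeholder metric: output length
        anomalous = False
        if len(self._metrics) >= 10:  # need a baseline first
            mean = statistics.fmean(self._metrics)
            stdev = statistics.pstdev(self._metrics) or 1.0
            anomalous = abs(metric - mean) / stdev > self.threshold
        self._metrics.append(metric)
        self.log.append({
            "prompt": prompt,
            "response": response,
            "metric": metric,
            "anomalous": anomalous,
        })
        return anomalous
```

A sudden burst of unusually long (or short) outputs, one signature of manipulation or misuse, would then surface as flagged log entries that an operator or XAI tooling can audit.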