Manipulation
The predictability of behavioural patterns in AI systems, particularly in certain applications, can act as an incentive to manipulate these systems.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit631
Domain lineage
4. Malicious Actors & Misuse
4.1 > Disinformation, surveillance, and influence at scale
Mitigation strategy
1. Prioritize the implementation of proactive adversarial testing, such as dedicated AI red teaming exercises, to systematically expose and remediate latent vulnerabilities arising from predictable algorithmic behavior. This must be complemented by the use of adversarial training methodologies to enhance model robustness against evasive and manipulative inputs.
2. Institute stringent data governance policies, including the adoption of 'datarails' to proscribe the inclusion of behavioral economics and psychological research pertaining to human cognitive biases in training data. This limits the AI's intrinsic capacity to exploit human vulnerabilities for manipulative objectives and helps prevent training data poisoning.
3. Establish a comprehensive, continuous monitoring and logging infrastructure to track real-time model inputs, outputs, and performance metrics, thereby enabling the rapid detection of anomalous activity consistent with manipulation or malicious misuse. Furthermore, integrate Explainable AI (XAI) techniques to maintain auditable transparency regarding the system's decision-making processes.
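As a minimal illustration of the monitoring-and-logging infrastructure described in point 3, the sketch below logs each model interaction and flags inputs whose anomaly score deviates sharply from a rolling baseline. The `InteractionMonitor` class, its parameters, and the use of a single scalar anomaly score are all simplifying assumptions for illustration, not a prescribed implementation; a production system would use a real detector and persistent, tamper-evident logs.

```python
import json
import time
from collections import deque
from statistics import mean, stdev

class InteractionMonitor:
    """Hypothetical sketch: audit-log model interactions and flag
    anomalous ones via a rolling z-score on an anomaly score."""

    def __init__(self, window=50, z_threshold=3.0):
        self.scores = deque(maxlen=window)  # rolling baseline of recent scores
        self.z_threshold = z_threshold
        self.log = []                       # in-memory audit log (JSON lines)

    def record(self, prompt, response, score):
        """Log one interaction; return True if it looks anomalous."""
        anomalous = False
        if len(self.scores) >= 10:          # require a baseline before flagging
            mu, sigma = mean(self.scores), stdev(self.scores)
            if sigma > 0 and abs(score - mu) / sigma > self.z_threshold:
                anomalous = True
        self.scores.append(score)
        self.log.append(json.dumps({
            "ts": time.time(),
            "prompt": prompt,
            "response": response,
            "score": score,
            "anomalous": anomalous,
        }))
        return anomalous

monitor = InteractionMonitor()
# Simulated baseline traffic with stable scores.
for i in range(20):
    monitor.record(f"query {i}", "ok", 1.0 + 0.01 * (i % 3))
# A sharply deviating score is flagged for review.
flagged = monitor.record("suspicious query", "ok", 10.0)
```

The rolling window keeps detection adaptive to gradual drift in normal traffic, while the JSON-line log preserves the auditable record that XAI and incident-response processes would consume.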