Poisoning
Data poisoning involves deliberately corrupting a model’s training dataset to introduce vulnerabilities, derail its learning process, or cause it to make incorrect predictions (Carlini et al., 2023). For example, Nightshade is a data poisoning tool that lets artists add invisible changes to the pixels of their art before uploading it online, breaking any model that trains on it.9 Such attacks exploit the fact that most GenAI models are trained on publicly available data, such as images and videos scraped from the web, which malicious actors can easily compromise.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit1268
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Continuous Data Validation and Monitoring. Institute robust, real-time monitoring systems and continuous data validation techniques, including statistical anomaly detection and clustering algorithms, to identify and flag suspicious or outlier data points that may indicate the introduction of poisoned samples into the training dataset before model corruption occurs.
2. Enforce Strict Data Provenance and Access Controls. Strengthen the security posture of data pipelines by enforcing least-privilege access policies (RBAC and MFA) and implementing transparent data provenance tracking. This provides a traceable, unalterable record of all data modifications and origins, which is critical for incident investigation and root cause analysis.
3. Maintain Rapid Rollback Capabilities. Develop and test an immediate incident response protocol for swift model and dataset reversion. This requires maintaining regularly versioned backups of verified, clean training datasets and model checkpoints, enabling rapid rollback to a known-good state and minimizing disruption after a detected attack.
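As an illustrative sketch of the statistical anomaly detection mentioned in step 1: a simple per-feature z-score filter can flag training samples that deviate sharply from the rest of a batch. The function name, feature vectors, and threshold below are hypothetical choices for demonstration, not part of any specific pipeline; production systems would typically operate on learned embeddings and use more robust detectors.

```python
import statistics

def flag_outliers(samples, threshold=3.0):
    """Flag indices of samples with any feature more than `threshold`
    standard deviations from the batch mean (hypothetical helper).

    `samples` is a list of equal-length numeric feature vectors.
    """
    n_features = len(samples[0])
    flagged = set()
    for j in range(n_features):
        col = [s[j] for s in samples]        # one feature across the batch
        mu = statistics.fmean(col)
        sigma = statistics.pstdev(col)
        if sigma == 0:
            continue                          # constant feature: nothing to flag
        for i, v in enumerate(col):
            if abs(v - mu) / sigma > threshold:
                flagged.add(i)                # sample i is an outlier on feature j
    return sorted(flagged)

# A tight cluster of clean samples plus one extreme (poisoned-looking) point.
clean = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [1.05, 1.95]]
poisoned = clean + [[50.0, -40.0]]
print(flag_outliers(poisoned, threshold=2.0))  # flags the injected sample
```

In a real pipeline this check would run continuously on incoming data (step 1), with flagged samples quarantined and traced back through provenance records (step 2) before any retraining.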