Data poisoning
A type of adversarial attack in which an external adversary or malicious insider injects corrupted, false, mislabeled, or otherwise misleading samples into training or fine-tuning datasets.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit1285
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Establish and enforce rigorous data provenance and validation protocols throughout the data pipeline lifecycle. Use anomaly detection algorithms and statistical methods to filter outliers and suspicious data points before ingestion, and maintain a verifiable, auditable record of all data sources, transformations, and access requests so the origin of any compromise can be traced quickly.
2. Implement and strictly enforce the Principle of Least Privilege (PoLP) for access to training datasets and model configurations. Define access rights narrowly and recertify them regularly so that only authorized entities hold the minimum permissions needed to modify or ingest data, reducing the risk from insider threats and compromised accounts.
3. Employ a defense-in-depth strategy that combines continuous model behavior monitoring with targeted adversarial training. Monitoring tracks key performance indicators against 'golden datasets' that serve as ground truth, surfacing performance degradation or anomalous outputs; adversarial training proactively hardens the model by exposing it to simulated poisoned inputs during development.
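The statistical outlier filtering described in mitigation 1 can be sketched with a simple z-score screen applied before ingestion. This is a minimal illustration, not a production defense: the threshold, the synthetic data, and the function name `filter_outliers` are all illustrative assumptions, and real pipelines would typically use more robust detectors (e.g. isolation forests or robust covariance estimates).

```python
import numpy as np

def filter_outliers(samples: np.ndarray, z_threshold: float = 3.0) -> np.ndarray:
    """Keep only rows whose features all lie within z_threshold
    standard deviations of each column's mean (hypothetical helper)."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0) + 1e-12  # avoid division by zero on constant columns
    z_scores = np.abs((samples - mean) / std)
    mask = (z_scores < z_threshold).all(axis=1)
    return samples[mask]

# Toy example: 100 clean samples plus 3 obviously poisoned points.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(100, 4))
poisoned = np.array([[50.0, 0, 0, 0],
                     [0, -40.0, 0, 0],
                     [0, 0, 60.0, 0]])
data = np.vstack([clean, poisoned])
filtered = filter_outliers(data)
```

A screen like this only catches poisoned points that are statistically extreme; subtle, targeted poisoning also needs the provenance records and monitoring described in the other mitigations.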
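The golden-dataset monitoring in mitigation 3 amounts to comparing current accuracy on a held-out, trusted evaluation set against a recorded baseline. A minimal sketch follows; the function names, the tolerance value, and the toy "model" callables are illustrative assumptions.

```python
from typing import Callable, Sequence

def accuracy(model_fn: Callable, inputs: Sequence, labels: Sequence) -> float:
    """Fraction of golden-set examples the model predicts correctly."""
    preds = [model_fn(x) for x in inputs]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def check_model_drift(model_fn: Callable, golden_inputs: Sequence,
                      golden_labels: Sequence, baseline_accuracy: float,
                      tolerance: float = 0.02) -> bool:
    """Return True if accuracy on the golden dataset has fallen more
    than `tolerance` below the recorded baseline (hypothetical helper)."""
    current = accuracy(model_fn, golden_inputs, golden_labels)
    return current < baseline_accuracy - tolerance

# Toy golden dataset: classify integers by parity.
golden_inputs = list(range(10))
golden_labels = [x % 2 for x in golden_inputs]

healthy = lambda x: x % 2   # behaves as expected
degraded = lambda x: 0      # stand-in for a model after poisoned fine-tuning

drift_healthy = check_model_drift(healthy, golden_inputs, golden_labels,
                                  baseline_accuracy=1.0)
drift_degraded = check_model_drift(degraded, golden_inputs, golden_labels,
                                   baseline_accuracy=1.0)
```

In practice the golden set must itself be protected by the provenance and access controls above, since an attacker who can alter it can mask the very degradation this check is meant to surface.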