Poisoning Attacks
Poisoning attacks fool the model by manipulating its training data; they are most commonly mounted against classification models.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit509
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Training Data Sanitization and Validation
Implement advanced data validation and outlier detection techniques (e.g., statistical methods, clustering algorithms) to identify and remove anomalous or suspicious data points prior to model incorporation, thereby preventing corrupted data from entering the training set.
2. Establish Secure Data Provenance and Access Controls
Enforce the principle of least privilege (POLP) and robust access controls for all data sources to limit unauthorized access and manipulation. Maintain detailed, tamper-proof records (data provenance/lineage) of all data transformations to deter attacks and facilitate forensic investigation.
3. Implement Continuous Monitoring and Robust Training
Utilize real-time monitoring and auditing to detect anomalies in input/output data or signs of performance degradation that signal a potential attack. Proactively employ adversarial training—introducing adversarial examples—to enhance the model's intrinsic resilience against manipulative inputs.
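Strategy 1 (training data sanitization via statistical outlier detection) can be sketched as follows. This is a minimal, univariate illustration: the function name, z-score threshold, and sample data are assumptions for demonstration, and a production pipeline would use multivariate methods (e.g., clustering or isolation forests) over feature vectors rather than a single z-score on scalars.

```python
import statistics

def filter_outliers(values, z_threshold=3.0):
    """Split values into (clean, suspect) by z-score.

    Illustrative sketch of pre-training data sanitization: points whose
    z-score exceeds the threshold are quarantined for review instead of
    entering the training set. Names and threshold are hypothetical.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    clean, suspect = [], []
    for v in values:
        z = abs(v - mean) / stdev if stdev else 0.0
        (suspect if z > z_threshold else clean).append(v)
    return clean, suspect

# Example: six benign points near 1.0 plus one injected (poisoned) point.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 50.0]
clean, suspect = filter_outliers(data, z_threshold=2.0)
# The injected point 50.0 lands in `suspect`; the rest pass through.
```

In practice the quarantined points would be logged against the data-provenance records from Strategy 2, so a human or automated reviewer can trace where the anomalous samples entered the pipeline.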
ADDITIONAL EVIDENCE
The trained (poisoned) model learns misbehaviors at training time, leading to misclassification at inference time. In addition, attackers can use optimization techniques to craft samples that maximize the model's error.