Poisoning Attacks
Poisoning attacks fool the model by manipulating its training data; they are most commonly mounted against classification models.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit509
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Training Data Sanitization and Validation
Implement advanced data validation and outlier detection techniques (e.g., statistical methods, clustering algorithms) to identify and remove anomalous or suspicious data points prior to model incorporation, thereby preventing corrupted data from entering the training set.
2. Establish Secure Data Provenance and Access Controls
Enforce the principle of least privilege (POLP) and robust access controls for all data sources to limit unauthorized access and manipulation. Maintain detailed, tamper-proof records (data provenance/lineage) of all data transformations to deter attacks and facilitate forensic investigation.
3. Implement Continuous Monitoring and Robust Training
Utilize real-time monitoring and auditing to detect anomalies in input/output data or signs of performance degradation that signal a potential attack. Proactively employ adversarial training—introducing adversarial examples—to enhance the model's intrinsic resilience against manipulative inputs.
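Strategy 1 (training data sanitization via statistical outlier detection) can be sketched as follows. This is a minimal, univariate illustration: the function name, z-score threshold, and sample data are assumptions for demonstration, and a production pipeline would use multivariate methods (e.g., clustering or isolation forests) over feature vectors rather than a single z-score on scalars.

```python
import statistics

def filter_outliers(values, z_threshold=3.0):
    """Split values into (clean, suspect) by z-score.

    Illustrative sketch of pre-training data sanitization: points whose
    z-score exceeds the threshold are quarantined for review instead of
    entering the training set. Names and threshold are hypothetical.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    clean, suspect = [], []
    for v in values:
        z = abs(v - mean) / stdev if stdev else 0.0
        (suspect if z > z_threshold else clean).append(v)
    return clean, suspect

# Example: six benign points near 1.0 plus one injected (poisoned) point.
data = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 50.0]
clean, suspect = filter_outliers(data, z_threshold=2.0)
# The injected point 50.0 lands in `suspect`; the rest pass through.
```

In practice the quarantined points would be logged against the data-provenance records from Strategy 2, so a human or automated reviewer can trace where the anomalous samples entered the pipeline.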
ADDITIONAL EVIDENCE
The trained (poisoned) model learns misbehaviors at training time, leading to misclassification at inference time. In addition, attackers can use optimization techniques to craft samples that maximize the model's error.