2. Privacy & Security · 2 - Post-deployment

Limitations in adversarial robustness

AI models and systems are vulnerable to manipulation through adversarial inputs.

Source: MIT AI Risk Repository (mit1076)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1076

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement Robust Adversarial Training. Employ advanced adversarial training (AT) methodologies, such as Projected Gradient Descent (PGD)-based training, to augment the model's training data with crafted adversarial examples. Minimizing the resulting adversarial loss substantially enhances the model's inherent ℓ∞/ℓ2-robustness against a spectrum of evasion attacks.

2. Deploy Pre-Inference Input Purification. Establish a robust input validation and purification pipeline at the inference stage to mitigate perturbations in real time. Techniques such as generative denoising diffusion processes (e.g., LoRID) or rigorous schema checks can eliminate subtle adversarial artifacts and ensure input integrity prior to classification.

3. Institutionalize Continuous Adversarial Testing and Monitoring. Establish a comprehensive adversarial risk management framework that mandates regular red-teaming and vulnerability assessments using state-of-the-art white-box and black-box attacks. Complement this with real-time anomaly detection that monitors prediction stability and flags suspicious input patterns indicative of an active manipulation attempt.
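
To make the PGD mechanism in item 1 concrete, here is a minimal sketch of crafting an ℓ∞-bounded adversarial example against a toy logistic-regression model. The model, its weights, and all parameter values are illustrative assumptions, not part of the repository entry; PGD-based adversarial training would generate such examples each batch and train on them.

```python
import numpy as np

def pgd_linf(x, y, w, b, eps, alpha, steps):
    """Projected Gradient Descent (ascent on the loss) crafting an
    adversarial input within an l_inf ball of radius eps around x,
    for a logistic-regression model p(y=1|x) = sigmoid(w.x + b)."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))   # model confidence
        grad = (p - y) * w                # d(cross-entropy loss)/dx
        x_adv = x_adv + alpha * np.sign(grad)        # gradient-sign step
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project into eps-ball
    return x_adv

# Illustrative toy model and input (hypothetical values).
w = np.array([1.0, -1.0])
b = 0.0
x = np.array([1.0, 0.0])   # clean input, true label y = 1
x_adv = pgd_linf(x, y=1.0, w=w, b=b, eps=0.5, alpha=0.1, steps=10)
```

The crafted `x_adv` stays within the ε-ball of the clean input but lowers the model's confidence in the true class; training on such examples (minimizing loss on them) is what yields the ℓ∞-robustness described above.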