Risks from models and algorithms (Risks of adversarial attack)
Attackers can craft adversarial examples (inputs with small, deliberately chosen perturbations) that mislead or manipulate an AI model into producing incorrect outputs, potentially leading to operational failures.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit686
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Adversarial Training Regimes. Fortify the model's inherent resilience by systematically incorporating carefully crafted adversarial examples into the training dataset and loss function. This teaches the model to produce correct predictions even when presented with subtly perturbed inputs, raising the computational cost and complexity of a successful evasion attack.
2. Establish Continuous Adversarial Testing and Red Teaming. Institute a mandatory, periodic program of rigorous adversarial testing and red-teaming exercises throughout the model's operational lifecycle. This proactive security measure empirically assesses the efficacy of deployed defenses and surfaces zero-day vulnerabilities or model-transferability risks that emerge from novel attack mechanisms.
3. Deploy Robust Input Validation and Feature Extraction Pipelines. Integrate input sanitization and validation checks at the inference stage to detect and reject malicious or anomalous data before it reaches the core model. In addition, use techniques such as robust feature extraction to isolate meaningful input signals and minimize the exploitable influence of low-magnitude adversarial perturbations on the final prediction.
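The adversarial-training idea in item 1 can be sketched on a toy model. The example below is a minimal illustration, not any particular library's API: it trains a logistic-regression classifier and, on each update, also trains on an FGSM (Fast Gradient Sign Method) perturbation of the input, regenerated against the current weights. The names `fgsm`, `train`, and `predict`, and all hyperparameters, are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that x belongs to class 1 under a logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """FGSM attack sketch: perturb x by eps in the direction that increases
    the loss. For logistic regression the input gradient of the cross-entropy
    loss is (p - y) * w, so only its sign per feature is needed."""
    p = predict(w, b, x)
    return tuple(xi + eps * (1.0 if (p - y) * wi > 0 else -1.0)
                 for xi, wi in zip(x, w))

def train(data, eps=0.1, lr=0.5, epochs=200, adversarial=True):
    """Gradient descent on each clean example plus, when `adversarial` is
    set, its FGSM counterpart -- the adversarial-training augmentation."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            batch = [(x, y)]
            if adversarial:
                # Regenerate the adversarial example against the *current*
                # weights, as in standard adversarial training.
                batch.append((fgsm(w, b, x, y, eps), y))
            for xb, yb in batch:
                g = predict(w, b, xb) - yb  # dLoss / dLogit
                w = [wi - lr * g * xi for wi, xi in zip(w, xb)]
                b -= lr * g
    return w, b
```

In practice the same loop structure applies to deep networks, with the input gradient obtained by backpropagation and stronger multi-step attacks (e.g. PGD) substituted for the single FGSM step.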
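Item 3's inference-stage validation can likewise be sketched with a simple statistical screen. The snippet below is a hedged illustration of one possible check, flagging inputs whose features fall far outside the range seen in trusted training data; production pipelines would typically combine this with richer detectors. `fit_validator`, `is_anomalous`, and the 3-sigma threshold are assumptions for the sketch.

```python
import statistics

def fit_validator(train_inputs):
    """Record per-feature mean and population stddev from trusted
    training inputs (a list of equal-length feature tuples)."""
    columns = list(zip(*train_inputs))
    return [(statistics.mean(c), statistics.pstdev(c)) for c in columns]

def is_anomalous(stats, x, k=3.0):
    """Flag an input if any feature lies more than k standard deviations
    from the training mean -- a coarse screen to reject out-of-range
    inputs before they reach the core model."""
    return any(abs(xi - m) > k * (s if s > 0 else 1e-9)
               for xi, (m, s) in zip(x, stats))
```

A gross out-of-range input is rejected before inference, while in-distribution inputs pass through; note that low-magnitude adversarial perturbations deliberately evade such simple screens, which is why this check complements, rather than replaces, adversarial training.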