Adversarial attack
Recent work has shown that deep learning models with high predictive accuracy frequently misbehave on adversarial examples [57,58]. In particular, a small perturbation to an input image, imperceptible to humans, can fool a well-trained deep learning model into making a completely different prediction [23].
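The perturbation idea can be illustrated with a minimal sketch of the Fast Gradient Sign Method (FGSM), one common way such adversarial examples are generated. Everything here is hypothetical: a hand-built linear classifier stands in for a deep model, and the weights, input, and perturbation budget `eps` are made-up values chosen so the effect is visible.

```python
import numpy as np

# Hypothetical linear classifier standing in for a deep model.
w = np.array([1.0, -2.0, 0.5])   # weights (assumed)
b = 0.1                          # bias (assumed)

def predict(x):
    """Class 1 if w.x + b > 0, else class 0."""
    return int(w @ x + b > 0)

def fgsm(x, y, eps):
    """FGSM for a logistic model: the gradient of the logistic loss
    w.r.t. x points along -(2y-1)*w, so its sign is -(2y-1)*sign(w).
    Step eps in that direction to maximally increase the loss."""
    grad_sign = -(2 * y - 1) * np.sign(w)
    return x + eps * grad_sign

x = np.array([0.3, -0.2, 0.4])   # clean input (assumed)
y = predict(x)                   # use the model's own label as the target
x_adv = fgsm(x, y, eps=0.5)      # small L-infinity perturbation

print(predict(x), predict(x_adv))  # the perturbed input flips the prediction
```

With these made-up numbers the clean input is classified as 1 and the perturbed input as 0, even though each coordinate moved by at most 0.5.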
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit336
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Adversarial Training (AT) to proactively enhance model robustness: augment the training dataset with generated adversarial examples so that the model minimizes its loss function under worst-case perturbations.
2. Deploy robust input transformation and sanitization techniques at the inference layer, such as image compression and reconstruction or robust feature extraction, to eliminate subtle, human-imperceptible perturbations from the input data before it reaches the core model.
3. Utilize ensemble methods to combine predictions from multiple diverse models, or employ defensive distillation during training to smooth the model's decision surface; both increase the computational effort an adversary needs to locate effective adversarial inputs.
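Strategy 1 (adversarial training) can be sketched as an ordinary training loop in which each gradient step is taken on FGSM-perturbed inputs rather than clean ones. This is a toy illustration, not a production defense: the model is a logistic regression, the two-cluster dataset is synthetic, and the budget `eps` and learning rate `lr` are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (assumed): Gaussian clusters at (2,2) and (-2,-2).
X = np.vstack([rng.normal(size=(200, 2)) + 2.0,
               rng.normal(size=(200, 2)) - 2.0])
y = np.array([1] * 200 + [0] * 200)

w = np.zeros(2)
b = 0.0
eps, lr = 0.3, 0.1   # perturbation budget and learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, eps):
    """Per-example FGSM: d(logistic loss)/dx = (sigmoid(w.x+b) - y) * w,
    so each input is pushed eps along the sign of its own loss gradient."""
    g = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(g)

for _ in range(200):
    # Train on worst-case inputs at the current parameters, not clean ones.
    X_adv = fgsm_batch(X, y, w, b, eps)
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Evaluate on fresh adversarial examples crafted against the final model.
acc_adv = np.mean((sigmoid(fgsm_batch(X, y, w, b, eps) @ w + b) > 0.5) == y)
print(f"adversarial accuracy: {acc_adv:.2f}")
```

The same loop without the `fgsm_batch` call is standard training; swapping in the perturbed batch is the entire change adversarial training makes, at the cost of one extra gradient computation per step.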