Adversarial attack
Recent work has shown that deep learning models with high predictive accuracy frequently misbehave on adversarial examples [57,58]. In particular, a small perturbation to an input image, imperceptible to humans, can fool a well-trained deep learning model into making a completely different prediction [23].
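The perturbation idea can be illustrated with a minimal sketch of the Fast Gradient Sign Method (FGSM), one common way such adversarial examples are generated. Everything here is hypothetical: a hand-built linear classifier stands in for a deep model, and the weights, input, and perturbation budget `eps` are made-up values chosen so the effect is visible.

```python
import numpy as np

# Hypothetical linear classifier standing in for a deep model.
w = np.array([1.0, -2.0, 0.5])   # weights (assumed)
b = 0.1                          # bias (assumed)

def predict(x):
    """Class 1 if w.x + b > 0, else class 0."""
    return int(w @ x + b > 0)

def fgsm(x, y, eps):
    """FGSM for a logistic model: the gradient of the logistic loss
    w.r.t. x points along -(2y-1)*w, so its sign is -(2y-1)*sign(w).
    Step eps in that direction to maximally increase the loss."""
    grad_sign = -(2 * y - 1) * np.sign(w)
    return x + eps * grad_sign

x = np.array([0.3, -0.2, 0.4])   # clean input (assumed)
y = predict(x)                   # use the model's own label as the target
x_adv = fgsm(x, y, eps=0.5)      # small L-infinity perturbation

print(predict(x), predict(x_adv))  # the perturbed input flips the prediction
```

With these made-up numbers the clean input is classified as 1 and the perturbed input as 0, even though each coordinate moved by at most 0.5.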
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit336
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Adversarial Training (AT) to proactively enhance model robustness: augment the training dataset with generated adversarial examples so that the model minimizes its loss function under worst-case perturbations.
2. Deploy robust input transformation and sanitization techniques at the inference layer, such as image compression and reconstruction or robust feature extraction, to eliminate subtle, human-imperceptible perturbations from the input data before it reaches the core model.
3. Utilize ensemble methods to combine predictions from multiple diverse models, or employ defensive distillation during training to smooth the model's decision surface; both increase the computational effort an adversary needs to locate effective adversarial inputs.
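Strategy 1 (adversarial training) can be sketched as an ordinary training loop in which each gradient step is taken on FGSM-perturbed inputs rather than clean ones. This is a toy illustration, not a production defense: the model is a logistic regression, the two-cluster dataset is synthetic, and the budget `eps` and learning rate `lr` are arbitrary assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (assumed): Gaussian clusters at (2,2) and (-2,-2).
X = np.vstack([rng.normal(size=(200, 2)) + 2.0,
               rng.normal(size=(200, 2)) - 2.0])
y = np.array([1] * 200 + [0] * 200)

w = np.zeros(2)
b = 0.0
eps, lr = 0.3, 0.1   # perturbation budget and learning rate (assumed)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_batch(X, y, w, b, eps):
    """Per-example FGSM: d(logistic loss)/dx = (sigmoid(w.x+b) - y) * w,
    so each input is pushed eps along the sign of its own loss gradient."""
    g = (sigmoid(X @ w + b) - y)[:, None] * w[None, :]
    return X + eps * np.sign(g)

for _ in range(200):
    # Train on worst-case inputs at the current parameters, not clean ones.
    X_adv = fgsm_batch(X, y, w, b, eps)
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

# Evaluate on fresh adversarial examples crafted against the final model.
acc_adv = np.mean((sigmoid(fgsm_batch(X, y, w, b, eps) @ w + b) > 0.5) == y)
print(f"adversarial accuracy: {acc_adv:.2f}")
```

The same loop without the `fgsm_batch` call is standard training; swapping in the perturbed batch is the entire change adversarial training makes, at the cost of one extra gradient computation per step.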