
Evasion attack

Evasion attacks attempt to make a model output incorrect results by slightly perturbing the input data that is sent to the trained model.
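To make the mechanism concrete, here is a minimal PyTorch sketch of the classic one-step evasion attack (FGSM, Goodfellow et al.). It is illustrative only and not part of the repository entry; `model` is assumed to be any classifier returning logits, and inputs are assumed to lie in [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """One-step evasion attack (FGSM): shift every input feature by
    +/- epsilon in the direction that increases the model's loss.
    `model` and `epsilon` are illustrative assumptions."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # The perturbation is small enough to be imperceptible to a human,
    # yet can flip the predicted class.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0).detach()
```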

Source: MIT AI Risk Repository (mit1288)

ENTITY: 1 - Human

INTENT: 1 - Intentional

TIMING: 2 - Post-deployment

Risk ID: mit1288

Domain lineage: 2. Privacy & Security (186 mapped risks) > 2.2 AI system security vulnerabilities and attacks

Mitigation strategy

1. Proactively implement adversarial training protocols, such as Projected Gradient Descent (PGD)-based methods, to strengthen the model's intrinsic robustness by exposing it to adversarial examples during training, thereby smoothing its decision boundaries (see the PGD sketch after this list).
2. Employ input transformation and preprocessing techniques, such as feature squeezing or denoising, to neutralize the subtle, low-magnitude perturbations characteristic of evasion attacks before they reach the inference engine (see the squeezing sketch below).
3. Integrate an adversarial-example detection and rejection layer that monitors incoming data, using methods such as uncertainty estimation (e.g., predictive entropy) or feature-space anomaly detection to flag and discard inputs with anomalous characteristics (see the entropy sketch below).
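A minimal PyTorch sketch of PGD-based adversarial training, as referenced in item 1. This is an illustrative reading of the strategy, not code from the repository; the `model`, `optimizer`, and all hyperparameters (`epsilon`, `alpha`, `steps`) are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Craft L-infinity PGD adversarial examples (Madry et al. style)."""
    # Random start inside the epsilon-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-epsilon, epsilon)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # ascend the loss
            x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
            x_adv = x_adv.clamp(0, 1)                              # project back
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y):
    """One robust-training step: fit the model on its own worst-case inputs."""
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Training on the attack's approximate worst case within the epsilon-ball is what smooths the decision boundary around each training point.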
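Item 2's feature squeezing can be sketched as bit-depth reduction plus a prediction-consistency check (after Xu et al.). Again a hedged illustration: the detection `threshold` is an assumption that would need calibration on clean data.

```python
import torch

def squeeze_bit_depth(x, bits=4):
    """Reduce color depth so low-magnitude perturbations are rounded away."""
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

@torch.no_grad()
def squeezing_detector(model, x, threshold=1.0):
    """Flag inputs whose prediction shifts sharply once the input is squeezed:
    clean inputs are largely invariant, evasion examples often are not."""
    p_raw = torch.softmax(model(x), dim=1)
    p_squeezed = torch.softmax(model(squeeze_bit_depth(x)), dim=1)
    l1_gap = (p_raw - p_squeezed).abs().sum(dim=1)
    return l1_gap > threshold  # True = likely adversarial
```

The squeezed input can also be fed directly to the inference engine as a preprocessing defense, at some cost in clean accuracy.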
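For item 3, a minimal predictive-entropy rejection layer. The `max_entropy` cutoff is a placeholder that would be tuned to the deployment's false-positive budget.

```python
import torch

@torch.no_grad()
def entropy_reject(model, x, max_entropy=1.0):
    """Compute the predictive entropy of the softmax output and discard
    inputs the model is unusually uncertain about."""
    probs = torch.softmax(model(x), dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    accept = entropy <= max_entropy  # calibrate the threshold on clean data
    return accept, entropy
```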