
Security

This is the risk of loss or harm from intentional subversion or forced failure.

Source: MIT AI Risk Repository (mit200)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit200

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Employ adversarial training techniques to enhance model resilience against input perturbations (evasion attacks), and rigorously implement data provenance tracking and validation across the training pipeline to prevent data poisoning.

2. Institute strong access controls, authentication protocols, and API rate-limiting mechanisms on deployed models to mitigate the risk of model theft via unauthorized querying and to slow data-extraction attempts.

3. Establish a comprehensive AI asset inventory and governance framework, mandating continuous security measures such as ethical hacking, red teaming, and anomaly detection to proactively identify and remediate emergent security vulnerabilities.
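As an illustrative sketch of the rate-limiting idea in item 2 (not part of the repository entry), a token-bucket limiter is one common way to throttle per-client queries against a model-serving API, slowing the high-volume querying that model-extraction attacks rely on. The class name and parameters below are hypothetical:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter for a model-serving API endpoint.

    Each client holds up to `capacity` tokens, refilled at `rate`
    tokens per second; one token is spent per query. Bursts beyond
    the bucket size are rejected, slowing data-extraction attempts.
    """

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate          # refill rate in tokens/second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        # Refill proportionally to elapsed time, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# A burst of 10 queries against a bucket of 5: only the first 5 pass.
bucket = TokenBucket(capacity=5, rate=1.0)
results = [bucket.allow() for _ in range(10)]
```

In practice the same idea is usually enforced per API key at the gateway layer, combined with the authentication and access controls the mitigation strategy describes.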

ADDITIONAL EVIDENCE

Goodfellow et al. discovered the ability to induce mispredictions in neural computer vision models by perturbing the input with small amounts of adversarially generated noise [80]. This is known as an evasion attack, since it allows the attacker to evade classification by the system. Some attacks emulate natural phenomena such as raindrops, phonological variation, or code-mixing [11, 58, 180, 182, 200]. ML systems tend to be highly vulnerable if their models have not been explicitly trained to be robust to the attack.

Another attack vector involves manipulating the training data so that the ML system can be manipulated with specific inputs during inference (e.g., to bypass a biometric identification system) [34]. This is known as data poisoning. The application, the degree of control over training data, and the model's robustness to such attacks are potential risk factors.

Finally, there is the risk of model theft. Researchers have demonstrated the ability to “steal” an ML model through ML-as-a-service APIs by making use of the returned metadata (e.g., confidence scores) [102, 110, 138, 184]. Extracted models can be deployed independently of the service, or used to craft adversarial examples that fool the original models. The application setting and design choices significantly affect the amount of metadata exposed externally. For example, while an autonomous vehicle does not return the confidence scores of its perception system’s predictions, model thieves may still be able to physically access the system and directly extract the model’s architecture definition and weights.
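The evasion attack described above can be sketched with the Fast Gradient Sign Method (FGSM) from Goodfellow et al.: perturb the input in the direction of the sign of the loss gradient with respect to the input. The toy logistic-regression model below, with hand-picked weights and inputs, is invented purely for illustration and is not code from any cited work:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability that input x belongs to class 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def fgsm(w, b, x, y, eps):
    """FGSM: step each input feature by eps in the direction that
    increases the loss for the true label y. For logistic regression,
    d(loss)/dx_i = (p - y) * w_i."""
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Toy model and an input it correctly assigns to class 1.
w, b = [2.0, -1.5], 0.1
x, y = [1.0, 0.2], 1

# A small adversarial perturbation flips the prediction to class 0.
x_adv = fgsm(w, b, x, y, eps=0.6)
```

For deep networks the gradient comes from backpropagation rather than a closed form, but the perturbation rule is the same, which is why adversarial training (item 1 of the mitigation strategy) augments training data with exactly such perturbed examples.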