Adversarial AI: Data and Model Exfiltration Attacks
Other forms of abuse include privacy attacks that allow adversaries to exfiltrate, or correctly infer, the private training data set or other valuable assets. For example, a membership inference attack can allow an attacker to determine which specific private medical records were used to train a medical AI diagnosis assistant. A related risk targets the intellectual property of the AI system: model extraction and distillation attacks exploit the tension between API access and confidentiality in ML models. Without proper mitigations, these vulnerabilities allow attackers to abuse a public-facing model API to exfiltrate valuable assets, including sensitive training data and the model's architecture and learned parameters.
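The membership inference risk described above can be illustrated with a minimal confidence-thresholding sketch: because models tend to be more confident on inputs they were trained on, an attacker who can read per-query confidence scores from an API may guess membership by thresholding them. The function name, threshold value, and confidence arrays below are hypothetical, for illustration only.

```python
import numpy as np

def confidence_attack(confidences, threshold=0.9):
    """Guess 'member' when the target model's top-class confidence exceeds
    the threshold (models are typically more confident on training inputs).
    `confidences` is an array of top-class probabilities from the target API."""
    return confidences >= threshold

# Hypothetical confidence scores returned by a target model API.
member_conf = np.array([0.99, 0.97, 0.95, 0.88])     # inputs in the training set
nonmember_conf = np.array([0.62, 0.71, 0.55, 0.93])  # unseen inputs

guesses_members = confidence_attack(member_conf)        # mostly True
guesses_nonmembers = confidence_attack(nonmember_conf)  # mostly False
```

Real attacks use shadow models and calibrated per-class thresholds, but even this crude version motivates the output-obfuscation mitigations listed below.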
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit384
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Enforce rigorous access controls and query limits on all public-facing model APIs: apply zero-trust principles, per-user rate limiting, and multi-factor authentication to restrict the volume and speed of adversarial data harvesting used in model extraction and distributed query attacks.
2. Implement output obfuscation strategies, including prediction perturbation and differential privacy mechanisms, to limit the information leaked by each model response. This reduces the utility of model outputs for parameter-stealing and functionality-stealing extraction attacks as well as for membership inference attacks.
3. Proactively harden the AI system using dedicated adversarial training and privacy-preserving data techniques (e.g., self-distillation or generative data preprocessing) to reduce the model's susceptibility to transfer attacks and to minimize the detectable statistical difference between member and non-member inputs, thereby mitigating membership inference risk.
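The prediction-perturbation mitigation in item 2 can be sketched as adding calibrated Laplace noise to the probability vector before returning it to the caller. The function name and epsilon value are hypothetical; smaller epsilon means more noise and less information leaked per query, at some cost in output fidelity.

```python
import numpy as np

def perturb_predictions(probs, epsilon=1.0, rng=None):
    """Add Laplace noise of scale 1/epsilon to a probability vector,
    then clip and renormalize so it remains a valid distribution.
    Noisy outputs are less useful to extraction and membership attacks."""
    rng = rng or np.random.default_rng()
    noisy = probs + rng.laplace(scale=1.0 / epsilon, size=probs.shape)
    noisy = np.clip(noisy, 1e-9, None)  # keep probabilities positive
    return noisy / noisy.sum()          # renormalize to sum to 1

probs = np.array([0.92, 0.05, 0.03])    # hypothetical model output
noisy = perturb_predictions(probs, epsilon=2.0, rng=np.random.default_rng(0))
```

In practice the noise scale would be tuned against an accuracy budget, and the renormalization step here is a simplification of formal differential-privacy release mechanisms.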