
Data exfiltration

Data Exfiltration goes beyond revealing private information: it involves illicitly obtaining the training data used to build a model, data that may be sensitive or proprietary. Model Extraction is the analogous attack directed at the model rather than the training data — it involves obtaining the architecture, parameters, or hyperparameters of a proprietary model (Carlini et al., 2024).

Source: MIT AI Risk Repository

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1271

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Enforce stringent Role-Based Access Control (RBAC) and the Principle of Least Privilege across the entire AI pipeline, specifically restricting access to proprietary training data and model weights. Pair this with robust Data Loss Prevention (DLP) solutions to monitor and block unauthorized transfers of sensitive data and intellectual property.

2. Deploy Model Watermarking and Prediction Perturbation techniques to defend against Model Extraction attacks. Watermarking allows after-the-fact verification of model ownership, while prediction perturbation (e.g., Deceptive Perturbation) introduces noise into API outputs to degrade the fidelity and utility of any illegally extracted surrogate model.

3. Implement continuous, real-time query and behavior monitoring using Anomaly Detection and Out-of-Distribution (OOD) methods. The objective is to automatically flag and rate-limit users exhibiting systematic, high-volume query patterns that deviate from benign usage — a hallmark of both Model Extraction and certain data exfiltration attempts.
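The prediction-perturbation idea in point 2 can be sketched in a few lines. This is a minimal illustration, not any specific published defense: it smooths the returned probability vector toward uniform, which degrades the confidence signal an extraction attacker trains a surrogate on while preserving the top-1 label for benign users. The function name and the `epsilon` parameter are illustrative assumptions.

```python
def perturb_prediction(probs, epsilon=0.1):
    """Smooth a probability vector toward uniform to blunt model
    extraction, while keeping the argmax (top-1 label) unchanged.

    Illustrative sketch only: real deployments typically use
    calibrated or adversarially chosen noise rather than uniform
    smoothing.
    """
    k = len(probs)
    # Affine, order-preserving transform: the ranking of classes
    # (and hence the argmax) is unchanged, but confidences shrink.
    noisy = [(1 - epsilon) * p + epsilon / k for p in probs]
    # Renormalize to guard against floating-point drift.
    total = sum(noisy)
    return [p / total for p in noisy]
```

Because the transform is monotone in each class probability, benign consumers of the API still receive the correct label; only the fine-grained confidence information useful to an attacker is degraded.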