Data exfiltration
Data exfiltration goes beyond revealing private information: it involves illicitly obtaining the training data used to build a model, data that may be sensitive or proprietary. Model extraction is the analogous attack directed at the model itself rather than its training data: it involves obtaining the architecture, parameters, or hyperparameters of a proprietary model (Carlini et al., 2024).
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1271
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Enforce stringent Role-Based Access Control (RBAC) and the Principle of Least Privilege across the entire AI pipeline, specifically restricting access to proprietary training data and model weights. Pair this with robust Data Loss Prevention (DLP) solutions that monitor and block unauthorized transfers of sensitive data and intellectual property.
2. Deploy model watermarking and prediction-perturbation techniques to defend against model extraction attacks. Watermarking allows after-the-fact verification of model ownership, while prediction perturbation (e.g., deceptive perturbation) introduces noise into API outputs to degrade the fidelity and utility of any illicitly extracted surrogate model.
3. Implement continuous, real-time query and behavior monitoring using anomaly-detection and out-of-distribution (OOD) methods. The objective is to automatically flag and rate-limit users exhibiting systematic, high-volume query patterns that deviate from benign usage, a hallmark of both model extraction and certain data exfiltration attempts.
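The prediction-perturbation defense in item 2 can be sketched as follows. This is a minimal illustration, not a production implementation: it assumes the service returns a class-probability vector, and the function and parameter names (`perturb_predictions`, `epsilon`) are hypothetical. The idea is to add bounded noise to the returned probabilities while preserving the top-1 class, so benign users keep their predicted label but an attacker training a surrogate model on the API outputs receives a degraded signal.

```python
import random

def perturb_predictions(probs, epsilon=0.05, seed=None):
    """Add bounded noise to a probability vector while preserving
    the top-1 class, degrading the signal available to an extractor.

    probs:   list of class probabilities summing to 1
    epsilon: maximum absolute noise per class (hypothetical tuning knob)
    """
    rng = random.Random(seed)
    top = max(range(len(probs)), key=probs.__getitem__)

    # Perturb each probability, flooring at a tiny positive value.
    noisy = [max(p + rng.uniform(-epsilon, epsilon), 1e-12) for p in probs]

    # Re-normalize so the output is still a valid distribution.
    total = sum(noisy)
    noisy = [p / total for p in noisy]

    # If noise flipped the leading class, swap values back so the
    # top-1 prediction seen by benign users is unchanged.
    new_top = max(range(len(noisy)), key=noisy.__getitem__)
    if new_top != top:
        noisy[top], noisy[new_top] = noisy[new_top], noisy[top]
    return noisy
```

In practice the noise magnitude trades off extraction resistance against the usefulness of calibrated confidence scores for legitimate clients, so `epsilon` would be tuned per deployment.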