Model extraction
Data Exfiltration goes beyond revealing private information: it involves illicitly obtaining the sensitive or proprietary training data used to build a model. Model Extraction is the same attack directed at the model itself rather than at the training data: the adversary obtains the architecture, parameters, or hyperparameters of a proprietary model (Carlini et al., 2024).
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1266
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement robust API access controls and anomaly detection, including stringent rate limiting and continuous monitoring of query patterns, to promptly identify and interrupt extraction attempts.
2. Employ model-hardening techniques, such as watermarking the model's parameters or outputs and adding controlled noise or differential privacy to prediction results, to degrade the utility and fidelity of any derived substitute model.
3. Use secure execution environments, such as Trusted Execution Environments (TEEs) or secure enclaves, to isolate the model's core architecture and parameters during inference, providing a structural defense against unauthorized extraction and direct parameter access.
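The first two mitigations above can be sketched in code: a per-client sliding-window rate limiter (strategy 1) combined with noise added to prediction scores before they leave the API (strategy 2), which lowers the fidelity of any substitute model trained on the responses. This is a minimal illustrative sketch; the class and method names (`QueryGuard`, `allow`, `perturb`) and all parameter values are assumptions, not part of the source.

```python
import time
from collections import defaultdict, deque

import numpy as np


class QueryGuard:
    """Hypothetical guard layer: rate limiting plus output perturbation.

    All names and defaults here are illustrative assumptions."""

    def __init__(self, max_queries=100, window_s=60.0, noise_scale=0.05, seed=0):
        self.max_queries = max_queries          # query budget per window
        self.window_s = window_s                # sliding-window length in seconds
        self.noise_scale = noise_scale          # std-dev of Gaussian output noise
        self.history = defaultdict(deque)       # client_id -> recent timestamps
        self.rng = np.random.default_rng(seed)

    def allow(self, client_id, now=None):
        """Sliding-window rate limit: reject clients that exceed the budget."""
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        while q and now - q[0] > self.window_s:  # drop timestamps outside window
            q.popleft()
        if len(q) >= self.max_queries:
            return False                         # budget exhausted: block query
        q.append(now)
        return True

    def perturb(self, scores):
        """Add Gaussian noise to prediction scores, then renormalise.

        The noisy outputs remain useful for legitimate top-1 use but are a
        poorer training signal for an attacker fitting a substitute model."""
        noisy = np.asarray(scores, dtype=float)
        noisy = noisy + self.rng.normal(0.0, self.noise_scale, size=noisy.shape)
        noisy = np.clip(noisy, 1e-9, None)       # keep scores strictly positive
        return noisy / noisy.sum()               # return a probability vector
```

In practice the blocked-query signal from `allow` would also feed the anomaly-detection monitoring described in strategy 1 (e.g. flagging clients whose queries sweep the input space systematically), and the noise scale trades off legitimate utility against extraction resistance.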