
Model extraction

Data Exfiltration goes beyond revealing private information: it involves illicitly obtaining the training data used to build a model, data that may be sensitive or proprietary. Model Extraction is the analogous attack directed at the model itself rather than its training data — it involves obtaining the architecture, parameters, or hyperparameters of a proprietary model (Carlini et al., 2024).
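The distinction can be made concrete with a toy query-based extraction. Below is a minimal sketch (the victim model, its secret parameters, and the query budget are all illustrative, not from the repository entry): the attacker sees only a prediction API, yet by querying it on chosen inputs and fitting a substitute to the responses, the secret parameters are recovered.

```python
import numpy as np

# Hypothetical "victim" model: a linear model whose weights are secret.
# The attacker can only call victim_predict(), never read the parameters.
rng = np.random.default_rng(0)
SECRET_W = np.array([2.0, -1.0, 0.5])
SECRET_B = 0.3

def victim_predict(X):
    """The only interface the attacker sees: inputs in, predictions out."""
    return X @ SECRET_W + SECRET_B

# Extraction: query the API on attacker-chosen inputs, then fit a
# substitute model to the resulting (input, output) pairs.
X_query = rng.normal(size=(200, 3))
y_query = victim_predict(X_query)

# A least-squares fit on the query responses recovers the secret
# weights and bias of this noiseless linear victim almost exactly.
A = np.hstack([X_query, np.ones((200, 1))])
theta, *_ = np.linalg.lstsq(A, y_query, rcond=None)
w_stolen, b_stolen = theta[:3], theta[3]
```

Real models are not linear, but the pattern is the same: the richer and less protected the query interface, the more faithfully a substitute can be trained.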

Source: MIT AI Risk Repository (mit1266)

ENTITY: 1 - Human

INTENT: 1 - Intentional

TIMING: 2 - Post-deployment

Risk ID: mit1266

Domain lineage: 2. Privacy & Security (186 mapped risks) > 2.2 AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement robust API access controls and anomaly detection, including stringent rate limiting and continuous monitoring of query patterns, to promptly identify and interrupt extraction attempts.

2. Employ model-hardening techniques such as watermarking the model's parameters or outputs, and introducing controlled noise or differential privacy into prediction results, thereby degrading the utility and fidelity of any derived substitute model.

3. Use secure execution environments, such as Trusted Execution Environments (TEEs) or secure enclaves, to isolate the model's core architecture and parameters during inference, providing a structural defense against unauthorized extraction and direct parameter access.
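The rate-limiting part of the first mitigation can be sketched as a token-bucket gate in front of the prediction API. This is a minimal illustration, not an implementation from the source; the class name, capacity, and refill rate are assumptions chosen for clarity.

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: each prediction request spends one
    token; tokens refill at a fixed rate up to a capacity. Sustained
    high-volume querying (an extraction signature) is throttled."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        # Refill tokens according to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice this gate would sit alongside query-pattern monitoring (e.g. flagging clients whose inputs look like a systematic sweep of the input space), since a patient attacker can stay under any fixed rate limit.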