Privacy - Model Extraction Attack (MEA)
An adversary replicates the parameters or functionality of the LLM by systematically querying its black-box interface.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1511
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Differential Privacy and Output Perturbation
Deploy model defenses that introduce controlled noise or perturbation into prediction outputs, such as differential privacy mechanisms or randomized input/output defenses. This degrades the fidelity of the transfer set obtained via the black-box query interface, substantially reducing the accuracy of any derived substitute model.
2. Stringent Query and Access Controls
Enforce an integrated system of access control, including Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA), alongside adaptive, low-threshold API rate limits and per-account quotas that strictly constrain the volume and velocity of queries available to a single entity, reducing the feasibility of data-intensive extraction attacks.
3. Continuous Behavioral Monitoring
Establish real-time monitoring of model usage to detect and flag high-volume, systematic, or anomalous query patterns that exhibit the characteristic fingerprint of model extraction attempts, enabling rapid identification and preemptive mitigation of adversarial activity.
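Mitigation 1 can be illustrated with a minimal sketch. The function names (`perturb_logits`, `noisy_prediction`) and the Gaussian noise scale `sigma` are illustrative assumptions, not part of any specific defense library; the point is that noise is injected before scores leave the serving boundary, so each query yields degraded supervision for a substitute model.

```python
import math
import random

def perturb_logits(logits, sigma=0.5, seed=None):
    # Hypothetical defense hook: add Gaussian noise to raw logits
    # before they are exposed through the query interface.
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in logits]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def noisy_prediction(logits, sigma=0.5, seed=None):
    # Return perturbed probabilities: the top label is usually
    # preserved, but exact confidence values are obscured, which
    # lowers the fidelity of an extraction transfer set.
    return softmax(perturb_logits(logits, sigma, seed))
```

Larger `sigma` values trade more utility for more protection; calibrated differential-privacy mechanisms choose this scale from a formal privacy budget rather than by hand.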
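Mitigations 2 and 3 can be combined in one sketch: a sliding-window rate limiter that also accumulates "strikes" when a client repeatedly hits the ceiling, the kind of sustained high-volume pattern characteristic of extraction attempts. The class name `QueryGuard`, the thresholds, and the strike heuristic are all assumptions for illustration; production systems would pair this with RBAC, MFA, and richer anomaly detection.

```python
import time
from collections import defaultdict, deque

class QueryGuard:
    """Illustrative per-client query limiter with a simple
    behavioral flag for suspected extraction activity."""

    def __init__(self, max_per_window=100, window_s=60.0, flag_threshold=3):
        self.max_per_window = max_per_window
        self.window_s = window_s
        self.flag_threshold = flag_threshold
        self.history = defaultdict(deque)  # client_id -> request timestamps
        self.strikes = defaultdict(int)    # client_id -> rejected-burst count

    def allow(self, client_id, now=None):
        # Admit the query only if the client is under its window quota.
        now = time.monotonic() if now is None else now
        q = self.history[client_id]
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.max_per_window:
            self.strikes[client_id] += 1
            return False
        q.append(now)
        return True

    def is_suspicious(self, client_id):
        # Repeatedly exhausting the quota is treated as the fingerprint
        # of a systematic, data-intensive querying campaign.
        return self.strikes[client_id] >= self.flag_threshold
```

A flagged client would then be routed to preemptive mitigation: stricter limits, step-up authentication, or account review.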