
Privacy - Model Extraction Attack (MEA)

Replicating the parameters of the LLM.

Source: MIT AI Risk Repository (mit1511)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1511

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Differential Privacy and Output Perturbation
Deploy defenses that add controlled noise or perturbation to prediction outputs, such as differential privacy mechanisms or randomized input/output defenses. Degrading the fidelity of the transfer set an attacker can collect through the black-box query interface substantially reduces the accuracy of any derived substitute model.

2. Stringent Query and Access Controls
Enforce layered access control, including Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA), together with adaptive, low-threshold API rate limits and per-account quotas. Strictly constraining the volume and velocity of queries available to any single entity undermines the feasibility of data-intensive extraction attacks.

3. Continuous Behavioral Monitoring
Establish real-time monitoring of model usage to detect and flag high-volume, systematic, or anomalous query patterns that carry the characteristic fingerprint of a model extraction attempt, enabling rapid identification and preemptive mitigation of adversarial activity.
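The first strategy, output perturbation, can be sketched as follows. This is a minimal illustration, not a calibrated differential-privacy mechanism: Laplace noise is added to each class probability and the vector is renormalized, and the `scale` parameter is an assumed noise knob that a real deployment would tie to a formal privacy budget.

```python
import math
import random

def perturb_probs(probs, scale=0.05, rng=None):
    """Return a noised copy of a probability vector.

    Sketch of output perturbation: Laplace noise is added to each
    class probability, values are clipped positive, and the result is
    renormalized. `scale` is an illustrative knob (larger = noisier),
    not a formal differential-privacy guarantee.
    """
    rng = rng or random.Random()
    noisy = []
    for p in probs:
        # Sample Laplace(0, scale) via inverse-CDF of a uniform draw.
        u = rng.uniform(-0.5 + 1e-9, 0.5 - 1e-9)
        noise = -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)
        noisy.append(max(p + noise, 1e-9))  # keep strictly positive
    total = sum(noisy)
    return [x / total for x in noisy]
```

Because the noised outputs still sum to one, the defense is transparent to legitimate clients while systematically corrupting the labels an extraction adversary harvests.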
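The query-limiting part of the second strategy can be illustrated with a sliding-window rate limiter keyed by API credential. The class name, limits, and window length below are illustrative assumptions, not values prescribed by the repository.

```python
import time
from collections import defaultdict, deque

class QueryRateLimiter:
    """Sliding-window rate limiter per API key.

    Caps the query volume any single entity can issue within a time
    window, raising the cost of data-intensive extraction attacks.
    The default limits are illustrative placeholders.
    """

    def __init__(self, max_queries=100, window_s=60.0):
        self.max_queries = max_queries
        self.window_s = window_s
        self.history = defaultdict(deque)  # api_key -> query timestamps

    def allow(self, api_key, now=None):
        """Return True if this query is within the key's budget."""
        now = time.monotonic() if now is None else now
        q = self.history[api_key]
        while q and now - q[0] > self.window_s:  # evict expired entries
            q.popleft()
        if len(q) >= self.max_queries:
            return False
        q.append(now)
        return True
```

In production this check would sit behind the RBAC/MFA layer, so that rate limits apply per authenticated identity rather than per IP address.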
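The third strategy, behavioral monitoring, can be sketched as an offline scan of a query log. The fingerprint used here (very high volume combined with very high prompt diversity, since extraction sweeps rarely repeat queries) and both thresholds are assumptions for illustration, not repository-mandated detection criteria.

```python
from collections import Counter

def flag_extraction_suspects(query_log, volume_threshold=1000,
                             diversity_threshold=0.9):
    """Flag API keys whose usage resembles systematic extraction.

    query_log: iterable of (api_key, prompt) pairs.
    A key is flagged when its total volume is high AND its prompts are
    mostly distinct (few repeats), a pattern typical of transfer-set
    harvesting. Thresholds are illustrative assumptions.
    """
    totals = Counter()
    distinct = {}
    for key, prompt in query_log:
        totals[key] += 1
        distinct.setdefault(key, set()).add(prompt)
    suspects = []
    for key, total in totals.items():
        diversity = len(distinct[key]) / total  # share of unique prompts
        if total >= volume_threshold and diversity >= diversity_threshold:
            suspects.append(key)
    return suspects
```

A real-time deployment would compute the same statistics over a rolling window and feed flagged keys into the rate-limiting layer for preemptive throttling.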