2. Privacy & Security · 2 - Post-deployment

Privacy - Data Extraction Attack (DEA)

Extracting text records that exist in the model's training dataset.

Source: MIT AI Risk Repository, mit1508

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1508

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement **Differential Privacy (DP)** during the model training process to introduce controlled noise, thereby statistically bounding the influence of individual data points and mitigating the risk of specific record extraction.

2. Employ comprehensive **Data Sanitization and Anonymization** techniques, such as **k-anonymity** or **data masking**, prior to training to ensure that any remaining sensitive or Personally Identifiable Information (PII) in the training corpus is de-identified and non-unique.

3. Enforce strict **Query-Based Access Controls and Rate Limiting** on the deployed model's API to restrict the volume of consecutive queries, which adversaries rely on to execute iterative or composite data extraction attacks.
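To make the first mitigation concrete, the sketch below applies the Laplace mechanism, a basic building block of differential privacy, to a counting query over a corpus. This is a minimal illustration, not production DP model training (which would typically use a mechanism such as DP-SGD via a dedicated library); the record data, the predicate, and the `epsilon` value are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon: float) -> float:
    """Epsilon-DP count: a counting query has sensitivity 1, so adding
    Laplace noise with scale 1/epsilon bounds any single record's
    influence on the released statistic."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

# Hypothetical usage: release how many records contain an SSN field
# without revealing whether any particular individual's record is present.
records = ["name: alice", "ssn: 123", "name: bob", "ssn: 456"]
noisy = dp_count(records, lambda r: "ssn" in r, epsilon=1.0)
```

Smaller `epsilon` injects more noise and yields a stronger privacy guarantee at the cost of accuracy; in real training pipelines the same trade-off governs how tightly individual records are shielded from extraction.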