Privacy - Data Extraction Attack (DEA)
Extracting text records that exist in the model's training dataset.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1508
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement **Differential Privacy (DP)** during model training to introduce controlled noise, statistically bounding the influence of any individual data point and mitigating the risk of specific record extraction.
2. Apply comprehensive **Data Sanitization and Anonymization** techniques, such as **k-anonymity** or **data masking**, before training so that any sensitive or Personally Identifiable Information (PII) remaining in the training corpus is de-identified and non-unique.
3. Enforce strict **Query-Based Access Controls and Rate Limiting** on the deployed model's API to restrict the volume of consecutive queries, since high query volume is a prerequisite for iterative or composite extraction attacks.
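As an illustration of the third mitigation, the sketch below shows a minimal token-bucket rate limiter that could sit in front of a model-serving API. The class name, capacity, and refill rate are illustrative assumptions, not part of any specific serving framework; production deployments would typically use gateway-level rate limiting instead.

```python
import time


class TokenBucketRateLimiter:
    """Illustrative token-bucket limiter (hypothetical helper, not from
    any specific framework): caps the number of model queries a client
    can issue per window, raising the cost of the high-volume querying
    that iterative or composite extraction attacks depend on."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity        # maximum burst size in tokens
        self.refill_rate = refill_rate  # tokens restored per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if a query may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(
            self.capacity,
            self.tokens + (now - self.last_refill) * self.refill_rate,
        )
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Example: a client bursting 6 requests against a 5-token bucket has
# its 6th request rejected until tokens refill.
limiter = TokenBucketRateLimiter(capacity=5, refill_rate=1.0)
results = [limiter.allow() for _ in range(6)]
```

A per-client instance of such a limiter throttles exactly the access pattern extraction attacks require; the token-bucket shape is chosen here because it permits small legitimate bursts while bounding sustained query volume.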