2. Privacy & Security · 2 - Post-deployment

Privacy - Membership Inference Attack (MIA)

Inferring whether a given text record was used to train an LLM.
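The core signal most MIAs exploit is that a model's loss on records it was trained on tends to be lower than on unseen records. A minimal loss-threshold sketch, where the function name and the median-based threshold are illustrative assumptions rather than anything specified in this repository entry:

```python
import numpy as np

def loss_threshold_mia(candidate_losses, reference_losses):
    """Flag a candidate record as a training member when its loss
    falls below the median loss of known non-member references.
    Purely illustrative; real attacks calibrate thresholds more
    carefully (e.g., per-example, via shadow models)."""
    threshold = np.median(reference_losses)
    return np.asarray(candidate_losses) < threshold
```

A record with unusually low loss relative to the non-member reference distribution is then predicted to have been in the training set.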

Source: MIT AI Risk Repository (mit1507)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1507

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Employ Differentially Private Stochastic Gradient Descent (DP-SGD) during LLM fine-tuning, using gradient clipping and noise injection to provide rigorous mathematical guarantees against membership leakage. Prefer user-level differential privacy to ensure uniform protection across variable user contributions.
2. Implement architectural and training modifications, such as ensemble methods (e.g., Split-AI) or adaptive mixup techniques (e.g., AdaMixup), to narrow the model's generalization gap and enforce similar output behavior on training members and non-members, disrupting the core signal exploited by MIAs.
3. Use privacy-preserving data preprocessing, such as generating differentially private synthetic data or applying generative diffusion models (e.g., D3P), to transform sensitive inputs and reduce exploitable fine-grained statistical characteristics before training.
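The DP-SGD step in mitigation 1 can be sketched as follows. All function names, defaults, and the flat-gradient representation are illustrative assumptions; production code would use a DP library (e.g., Opacus or TensorFlow Privacy) and track the privacy budget with an accountant:

```python
import numpy as np

def dp_sgd_update(params, per_sample_grads, clip_norm=1.0,
                  noise_multiplier=1.1, lr=0.1, rng=None):
    """One DP-SGD step: clip each example's gradient to a fixed L2
    norm, average, add Gaussian noise calibrated to the clipping
    bound, then apply the update. Sketch only; no privacy accounting."""
    rng = rng or np.random.default_rng(0)
    batch = len(per_sample_grads)
    clipped = []
    for g in per_sample_grads:
        norm = np.linalg.norm(g)
        # Scale down any per-example gradient whose norm exceeds clip_norm,
        # bounding each record's influence on the update.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    avg = np.mean(clipped, axis=0)
    # Noise standard deviation is proportional to the clipping bound,
    # masking any single record's contribution.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / batch,
                       size=avg.shape)
    return params - lr * (avg + noise)
```

Because each record's gradient contribution is bounded and then masked by noise, the resulting model's outputs carry much weaker membership signal, which is precisely what the attack in this entry relies on.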