Privacy - Membership Inference Attack (MIA)
Inferring whether a given text record was used to train an LLM.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1507
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Employ Differentially Private Stochastic Gradient Descent (DP-SGD) during LLM fine-tuning, specifically utilizing gradient clipping and noise injection to provide rigorous, mathematical guarantees against membership leakage. Preference should be given to user-level Differential Privacy to ensure uniform protection across variable user contributions.
2. Implement advanced architectural and training modifications, such as ensemble methods (e.g., Split-AI) or adaptive mixup techniques (e.g., AdaMixup), to minimize the model's generalization gap and enforce similar output behavior between training members and non-members, thereby disrupting the core signal exploited by MIAs.
3. Utilize privacy-preserving data preprocessing methods, including the generation of Differentially Private synthetic data or the application of generative diffusion models (e.g., D3P), to transform sensitive inputs and reduce exploitable fine-grained statistical characteristics prior to model training.
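The core mechanics of mitigation 1 can be illustrated with a minimal NumPy sketch of a single DP-SGD step: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are averaged, and Gaussian noise calibrated to the clipping bound is added before the parameter update. The function name, the toy gradients, and the parameter values (`clip_norm`, `noise_multiplier`, `lr`) are illustrative assumptions, not values from this entry; a production system would use a DP library that also tracks the privacy budget.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng):
    # Clip each per-example gradient to L2 norm <= clip_norm.
    clipped = [
        g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        for g in per_example_grads
    ]
    # Average the clipped gradients, then add Gaussian noise whose scale
    # is tied to the clipping bound (the per-example sensitivity).
    mean_grad = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(per_example_grads)
    noisy_grad = mean_grad + rng.normal(0.0, sigma, size=mean_grad.shape)
    # Standard SGD update on the privatized gradient.
    return params - lr * noisy_grad

# Toy usage: 8 synthetic per-example gradients for a 4-parameter model.
rng = np.random.default_rng(0)
params = np.zeros(4)
grads = [rng.normal(size=4) * 10 for _ in range(8)]  # deliberately large norms
new_params = dp_sgd_step(params, grads, clip_norm=1.0,
                         noise_multiplier=1.1, lr=0.1, rng=rng)
```

Because every example's influence on the update is bounded by `clip_norm` and then masked by noise, the presence or absence of any single training record changes the update distribution only slightly, which is exactly the membership signal an MIA tries to exploit.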