Membership inference attack
A membership inference attack (MIA) aims to determine whether a given data sample was part of a model's training set. Given query access to a trained model, an attacker submits inputs and observes the resulting outputs (for example, confidence scores), exploiting the fact that models often behave differently on training members than on unseen data.
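As a concrete illustration, the simplest form of this attack thresholds the model's confidence on its predicted class: members of the training set tend to receive sharper, more confident outputs. The sketch below simulates this with synthetic logits; the model, the logit distributions, and the threshold value are all illustrative assumptions, not details from any specific attack paper.

```python
import numpy as np

def max_confidence(logits):
    """Softmax confidence of the predicted class, computed stably."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return p.max(axis=-1)

def infer_membership(logits, threshold=0.9):
    """Guess 'member' whenever the model is unusually confident."""
    return max_confidence(logits) >= threshold

rng = np.random.default_rng(0)
# Simulated logits: training members get more peaked outputs (class 0
# boosted by +6) than non-members (+2) -- an assumed, illustrative gap.
member_logits = rng.normal(0.0, 1.0, (100, 10))
member_logits[:, 0] += 6.0
nonmember_logits = rng.normal(0.0, 1.0, (100, 10))
nonmember_logits[:, 0] += 2.0

print("member hit rate:        ", infer_membership(member_logits).mean())
print("non-member flagged rate:", infer_membership(nonmember_logits).mean())
```

The attack succeeds to the extent that the flagged rate on members exceeds the flagged rate on non-members; real attacks refine this idea with shadow models and per-sample calibration.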
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1292
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Differential Privacy (DP) to provide a formal, mathematically rigorous defense. DP guarantees that the influence of any single training record on the model's output distribution is bounded, providing provable protection against MIAs, although it may incur a trade-off with model utility.
2. Employ structural ensemble architectures with adaptive inference, such as the Split-AI component of the SELENA framework. This involves training multiple sub-models on random subsets of the training data and using an adaptive test-time strategy that aggregates outputs only from sub-models that did not use the queried sample in their training set, structurally enforcing similar model behavior on member and non-member inputs.
3. Apply privacy-preserving data preprocessing or data synthesis techniques. This involves transforming the sensitive training data before model learning, for example with generative models (such as Diffusion-Driven Data Preprocessing, D3P) or rigorous pseudonymization/anonymization, to alter the fine-grained statistical characteristics of the input and reduce exploitable membership signals.