Attribute inference attack
An attribute inference attack repeatedly queries a model to infer sensitive features of individuals whose data was used to train it. These attacks occur when an adversary holds partial prior knowledge about a training record and combines that knowledge with the model's outputs to infer the record's sensitive attributes.
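The attack described above can be sketched as follows. This is a minimal illustration, not a specific published attack: it assumes the adversary knows all of a training record's features except one sensitive attribute, knows the record's true label, and has query access to the model's confidence scores (represented here by the hypothetical `model_confidence` callable). The adversary tries each candidate value and keeps the one the model is most confident about, exploiting the model's tendency to fit its training data.

```python
def infer_sensitive_attribute(model_confidence, known_features,
                              true_label, candidate_values):
    """Guess the missing sensitive attribute of a training record.

    model_confidence: callable(features, label) -> float, a stand-in
        for query access to the target model's confidence in `label`.
    known_features: the record's non-sensitive feature values.
    candidate_values: possible values of the sensitive attribute.
    """
    best_value, best_conf = None, float("-inf")
    for value in candidate_values:
        # Query the model with each candidate filled in; an
        # overfitted model tends to score the true training
        # record higher than the perturbed alternatives.
        conf = model_confidence(known_features + [value], true_label)
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value
```

In practice the adversary repeats this over many records; the attack's success rate above the base rate of the sensitive attribute measures the leakage.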
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1293
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement advanced adversarial defense mechanisms (e.g., score masking or data perturbation) to inject controlled noise into model outputs or public features, minimizing the adversary's attribute inference accuracy while preserving model utility.
2. Deploy attribute unlearning frameworks to eliminate the representation of sensitive attributes from the model's learned parameters or user embeddings, targeting attribute leakage post-training or in-training.
3. Apply foundational privacy-enhancing technologies (PETs), such as differential privacy during training or rigorous removal of sensitive attributes from the training data, to reduce the initial opportunity for attribute leakage.
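The first mitigation (score masking and output perturbation) can be sketched as below. This is a simplified illustration under assumed parameters, not a production defense: `epsilon` is a noise-scale knob (smaller values mean more noise, hence less leakage but less utility), and the Laplace mechanism is one common choice for output perturbation.

```python
import numpy as np

def perturb_scores(scores, epsilon=1.0, rng=None):
    """Output perturbation: add Laplace noise to a model's confidence
    scores, then re-normalize to a probability distribution. Degrades
    the exact confidence values that attribute inference relies on."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy = np.asarray(scores, dtype=float)
    noisy = noisy + rng.laplace(0.0, 1.0 / epsilon, size=noisy.shape)
    noisy = np.clip(noisy, 1e-9, None)  # keep scores positive
    return noisy / noisy.sum()

def top_label_only(scores):
    """Score masking: release only the predicted label and withhold
    the confidence vector entirely."""
    return int(np.argmax(scores))
```

Returning only the top label (`top_label_only`) is the stronger masking option; adding calibrated noise (`perturb_scores`) trades some defense for retained utility in downstream consumers of the scores.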