Attribute inference attack
An attribute inference attack repeatedly queries a model to infer sensitive features of individuals whose data was used to train it. These attacks occur when an adversary holds partial prior knowledge about a training record and combines that knowledge with the model's outputs to infer the record's sensitive attributes.
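The attack described above can be sketched as follows. This is a minimal illustration, not a specific published attack: it assumes the adversary knows all of a training record's features except one sensitive attribute, knows the record's true label, and has query access to the model's confidence scores (represented here by the hypothetical `model_confidence` callable). The adversary tries each candidate value and keeps the one the model is most confident about, exploiting the model's tendency to fit its training data.

```python
def infer_sensitive_attribute(model_confidence, known_features,
                              true_label, candidate_values):
    """Guess the missing sensitive attribute of a training record.

    model_confidence: callable(features, label) -> float, a stand-in
        for query access to the target model's confidence in `label`.
    known_features: the record's non-sensitive feature values.
    candidate_values: possible values of the sensitive attribute.
    """
    best_value, best_conf = None, float("-inf")
    for value in candidate_values:
        # Query the model with each candidate filled in; an
        # overfitted model tends to score the true training
        # record higher than the perturbed alternatives.
        conf = model_confidence(known_features + [value], true_label)
        if conf > best_conf:
            best_value, best_conf = value, conf
    return best_value
```

In practice the adversary repeats this over many records; the attack's success rate above the base rate of the sensitive attribute measures the leakage.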
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1293
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement advanced adversarial defense mechanisms (e.g., score masking or data perturbation) to inject controlled noise into model outputs or public features, minimizing the adversary's attribute inference accuracy while preserving model utility.
2. Deploy attribute unlearning frameworks to eliminate the representation of sensitive attributes from the model's learned parameters or user embeddings, targeting attribute leakage post-training or in-training.
3. Apply foundational privacy-enhancing technologies (PETs), such as differential privacy during training or rigorous removal of sensitive attributes from the training data, to reduce the initial opportunity for attribute leakage.
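The first mitigation (score masking and output perturbation) can be sketched as below. This is a simplified illustration under assumed parameters, not a production defense: `epsilon` is a noise-scale knob (smaller values mean more noise, hence less leakage but less utility), and the Laplace mechanism is one common choice for output perturbation.

```python
import numpy as np

def perturb_scores(scores, epsilon=1.0, rng=None):
    """Output perturbation: add Laplace noise to a model's confidence
    scores, then re-normalize to a probability distribution. Degrades
    the exact confidence values that attribute inference relies on."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy = np.asarray(scores, dtype=float)
    noisy = noisy + rng.laplace(0.0, 1.0 / epsilon, size=noisy.shape)
    noisy = np.clip(noisy, 1e-9, None)  # keep scores positive
    return noisy / noisy.sum()

def top_label_only(scores):
    """Score masking: release only the predicted label and withhold
    the confidence vector entirely."""
    return int(np.argmax(scores))
```

Returning only the top label (`top_label_only`) is the stronger masking option; adding calibrated noise (`perturb_scores`) trades some defense for retained utility in downstream consumers of the scores.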