Privacy - Attribute Inference Attack (AIA)
Deducing private or sensitive attributes from training texts, prompting texts, or external texts.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1510
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
Employ Attribute Unlearning or Adversarial Representation Learning to eliminate sensitive attributes from model embeddings and representations during or after training, thus minimizing intrinsic information leakage. Implement score masking strategies, such as injecting targeted adversarial noise into the prediction score vector or restricting the precision of model confidence scores, to actively confound the adversary's attribute classifier. Apply data perturbation techniques, including adding minimum-magnitude adversarial noise to a user's public data or leveraging user-level differential privacy, to reduce the correlation between public data inputs and the sensitive attribute targets.
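The score-masking strategy above can be sketched in a few lines: inject small random noise into the prediction score vector, re-normalize, and truncate the scores to low precision so the adversary's attribute classifier sees only coarse, perturbed confidences. The function name, noise scale, and rounding precision below are illustrative assumptions, not values from any specific defense.

```python
import numpy as np

def mask_scores(scores, precision=1, noise_scale=0.05, rng=None):
    """Mask a model's prediction score vector against attribute inference.

    Adds Laplace noise, clips and re-normalizes to keep a valid probability
    distribution, then rounds to `precision` decimal places so the released
    confidence scores carry less attribute-correlated signal.
    All parameter values here are illustrative.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    noisy = scores + rng.laplace(0.0, noise_scale, size=scores.shape)
    noisy = np.clip(noisy, 1e-6, None)   # keep scores positive
    noisy /= noisy.sum()                 # re-normalize to sum to 1
    return np.round(noisy, precision)    # restrict confidence precision

# Example: mask a raw softmax output before releasing it to a client.
masked = mask_scores([0.91, 0.06, 0.03], rng=np.random.default_rng(0))
```

Tuning involves a utility trade-off: larger `noise_scale` or coarser `precision` confounds the adversary more strongly, but also degrades the usefulness of the released scores for legitimate downstream consumers.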