Privacy - Attribute Inference Attack (AIA)
Deducing private or sensitive attributes from training texts, prompting texts, or external texts.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1510
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
Employ Attribute Unlearning or Adversarial Representation Learning to eliminate sensitive attributes from model embeddings and representations during or after training, thus minimizing intrinsic information leakage. Implement score masking strategies, such as injecting targeted adversarial noise into the prediction score vector or restricting the precision of model confidence scores, to actively confound the adversary's attribute classifier. Apply data perturbation techniques, including adding minimum-magnitude adversarial noise to a user's public data or leveraging user-level differential privacy, to reduce the correlation between public data inputs and the sensitive attribute targets.
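The score-masking strategy above can be sketched in a few lines: inject small random noise into the prediction score vector, re-normalize, and truncate the scores to low precision so the adversary's attribute classifier sees only coarse, perturbed confidences. The function name, noise scale, and rounding precision below are illustrative assumptions, not values from any specific defense.

```python
import numpy as np

def mask_scores(scores, precision=1, noise_scale=0.05, rng=None):
    """Mask a model's prediction score vector against attribute inference.

    Adds Laplace noise, clips and re-normalizes to keep a valid probability
    distribution, then rounds to `precision` decimal places so the released
    confidence scores carry less attribute-correlated signal.
    All parameter values here are illustrative.
    """
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    noisy = scores + rng.laplace(0.0, noise_scale, size=scores.shape)
    noisy = np.clip(noisy, 1e-6, None)   # keep scores positive
    noisy /= noisy.sum()                 # re-normalize to sum to 1
    return np.round(noisy, precision)    # restrict confidence precision

# Example: mask a raw softmax output before releasing it to a client.
masked = mask_scores([0.91, 0.06, 0.03], rng=np.random.default_rng(0))
```

Tuning involves a utility trade-off: larger `noise_scale` or coarser `precision` confounds the adversary more strongly, but also degrades the usefulness of the released scores for legitimate downstream consumers.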