Compromising privacy or security by correctly inferring sensitive information
Anticipated risk: Privacy violations may occur at inference time even without an individual’s data being present in the training corpus. Insofar as LMs can be used to improve the accuracy of inferences on protected traits such as the sexual orientation, gender, or religiousness of the person providing the input prompt, they may facilitate the creation of detailed profiles of individuals comprising true and sensitive information without the knowledge or consent of the individual.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit212
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement inference-time constrained decoding with logit masking, leveraging regex-aware or advanced pattern detection over a rolling window of generated text to prevent the token-level generation of sensitive or personally identifiable information (PII). This mechanism provides provable prevention guarantees by blocking the output of patterns associated with sensitive data.
2. Enforce a robust privacy-preserving training regime, such as Differential Privacy (DP), to limit the model's capacity to memorize and, crucially, infer individual attributes from input text, thereby minimizing the privacy loss parameter epsilon ($\epsilon$) for stronger guarantees against attribute inference attacks.
3. Deploy a multi-layered post-deployment defense framework incorporating both advanced input sanitization and output filtering. This includes applying token-level redaction on user input to remove contextual clues that could facilitate inference, and utilizing entropy-based or pattern-matching content filtering on the model's final response to suppress the inadvertent disclosure of sensitive, low-entropy sequences.
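Item 1 of the mitigation strategy can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration rather than part of the source entry: the two regular expressions, the `constrained_decode` and `toy_model` helpers, the `[REDACTED]` fallback token, and the greedy decoding loop. A real system would mask logits inside the model's sampling step (setting blocked entries to negative infinity) rather than filter a dictionary of scores.

```python
import re
from typing import Callable, Dict, List

# Hypothetical patterns, chosen only for illustration.
BLOCKED_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # email-like string
]

EOS = "<eos>"


def violates(text: str) -> bool:
    """Return True if any blocked pattern occurs in the text."""
    return any(p.search(text) for p in BLOCKED_PATTERNS)


def constrained_decode(
    next_logits: Callable[[List[str]], Dict[str, float]],
    max_tokens: int,
    window_chars: int = 64,
) -> str:
    """Greedy decoding that masks tokens completing a blocked pattern."""
    out: List[str] = []
    for _ in range(max_tokens):
        logits = next_logits(out)
        # Rolling window: only the tail of the generated text is re-scanned.
        tail = "".join(out)[-window_chars:]
        # Dropping a candidate here stands in for setting its logit to -inf.
        allowed = {
            tok: score
            for tok, score in logits.items()
            if tok == EOS or not violates(tail + tok)
        }
        if not allowed:
            break
        tok = max(allowed, key=allowed.get)
        if tok == EOS:
            break
        out.append(tok)
    return "".join(out)


def toy_model(script: List[str], fallback: str = "[REDACTED]"):
    """Stand-in for a language model: prefers a fixed token script."""
    def next_logits(prefix: List[str]) -> Dict[str, float]:
        i = len(prefix)
        if i >= len(script):
            return {EOS: 0.0}
        return {script[i]: 1.0, fallback: 0.0}
    return next_logits


if __name__ == "__main__":
    model = toy_model(["My ", "SSN ", "is ", "123-45-", "6789", "."])
    print(constrained_decode(model, max_tokens=10))
    # prints: My SSN is 123-45-[REDACTED].
```

The example output shows a limitation of this simple scheme: the token that would complete the pattern is blocked, but the partial prefix already emitted is not retracted. Any guarantee therefore covers only the exact patterns in the blocklist, not paraphrases or fragments of the same information.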
ADDITIONAL EVIDENCE
Example: Language utterances (e.g. Tweets) are already being analysed to predict private information such as political orientation [121, 144], age [131, 135], and health data such as addiction relapses [63].