Inference of private information
LLMs can, in principle, infer private information from model inputs even when the relevant private information is not present in the training corpus (Weidinger et al., 2021). For example, an LLM may correctly infer sensitive characteristics such as race and gender from data contained in input prompts.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit417
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement Differentially Private Inference
Employ Differential Privacy (DP) techniques, such as DP-Fusion, during the model inference stage to introduce controlled noise or bound output probabilities. This provides formal privacy guarantees that obscure whether specific input details contributed to the inference outcome, mitigating attribute and membership inference attacks.
2. Deploy Multi-Party Private Inference Architectures
Use secure multi-party computation protocols, such as Cascade, that distribute the user's prompt across multiple non-colluding parties (e.g., via token-level sharding). No single party then holds enough input data or intermediate model state to reconstruct the sensitive information being inferred.
3. Enforce Context-Specific Input Sanitization and Data Minimization
Apply strict data minimization by masking, tokenizing, or redacting sensitive but non-essential information from the user's input prompt prior to model processing. In addition, use real-time detection tools to identify and reformulate out-of-context sensitive disclosures within the prompt, thereby limiting the features available to the LLM for attribute inference.
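The first strategy can be illustrated with a generic Laplace mechanism applied to a model's output logits before the distribution is released. This is a minimal sketch of a standard DP mechanism, not the DP-Fusion algorithm itself; the logit values and the `epsilon` and `sensitivity` parameters are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Laplace(0, scale) sampled as the difference of two exponential draws.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_noisy_logits(logits, epsilon: float, sensitivity: float = 1.0):
    """Add Laplace noise calibrated to (epsilon, sensitivity) to each logit
    before the output distribution is released to the caller."""
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale) for x in logits]

def softmax(xs):
    # Numerically stable softmax over the (noised) logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical logits for three candidate tokens; the released distribution
# no longer deterministically reveals small input-dependent differences.
noisy_probs = softmax(dp_noisy_logits([2.0, 1.0, 0.5], epsilon=1.0))
```

Smaller `epsilon` values yield larger noise and stronger privacy guarantees at the cost of output utility; choosing `epsilon` and bounding the logits' sensitivity are deployment-specific decisions.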
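The sanitization step in the third strategy can be sketched as a rule-based redaction pass over the prompt before it reaches the model. The regex patterns and placeholder labels below are illustrative assumptions; a production system would pair such rules with a dedicated PII-detection model or service.

```python
import re

# Hypothetical patterns for a few common sensitive-token formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def sanitize_prompt(prompt: str) -> str:
    """Mask sensitive substrings with typed placeholders before the
    prompt is passed to the model, limiting features available for
    attribute inference."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt
```

For example, `sanitize_prompt("Email me at jane.doe@example.com")` returns `"Email me at [EMAIL]"`. The placeholders can later be mapped back to the original values outside the model if the application needs them.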