2. Privacy & Security

Compromising privacy by correctly inferring private information

Privacy violations may occur at inference time even when the individual's private data was never present in the training dataset. Like other statistical models, an LM may make correct inferences about a person based purely on correlational data about other people, without any access to that individual's private information. Such correct inferences may occur when LMs attempt to predict a person's gender, race, sexual orientation, income, or religion from user input.

Source: MIT AI Risk Repository (mit238)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit238

Domain lineage

2. Privacy & Security

186 mapped risks

2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Implement Differential Privacy (DP) mechanisms during model training to limit the statistical characteristics that enable strong, specific inferences. DP ensures that the inclusion or exclusion of any individual's data does not significantly alter the model's resulting output distribution by introducing mathematically quantifiable noise, thereby mitigating the risk of profile construction based on learned correlations.

2. Deploy inference-time defense modules and contextual integrity frameworks to validate and gate the appropriateness of all information flow. These systems should actively classify and restrict the disclosure of sensitive attributes inferred by the model, ensuring the AI agent adheres to established privacy norms before releasing potentially harmful information.

3. Enforce rigorous, advanced data sanitization, including the redaction and pseudonymization of all sensitive personal information within both the training datasets and the user input streams. This preprocessing must be coupled with robust output filtering to actively prevent the model from generating or disclosing inferred sensitive attributes or profiles.
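To make the first mitigation concrete, the core of DP training (as in DP-SGD) is per-example gradient clipping followed by calibrated Gaussian noise, so no single individual's example can dominate an update. The sketch below is a minimal, framework-free illustration of that one step; the function name, parameter values, and plain-list gradient representation are illustrative assumptions, not part of the repository entry.

```python
import math
import random


def clip_and_noise(gradients, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Illustrative DP-SGD step: clip each per-example gradient to
    L2 norm <= clip_norm, average them, and add Gaussian noise whose
    scale is tied to the clipping bound (the sensitivity)."""
    rng = random.Random(seed)

    # Clip each per-example gradient to the sensitivity bound.
    clipped = []
    for g in gradients:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        clipped.append([x * scale for x in g])

    # Average the clipped gradients across the batch.
    n = len(gradients)
    avg = [sum(col) / n for col in zip(*clipped)]

    # Add Gaussian noise calibrated to the clipping bound.
    sigma = noise_multiplier * clip_norm / n
    return [x + rng.gauss(0.0, sigma) for x in avg]
```

With `noise_multiplier=0` the function reduces to plain clipped averaging, which is a convenient way to check the clipping logic; in real use the noise multiplier is chosen via a privacy accountant to meet a target (epsilon, delta) budget.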

ADDITIONAL EVIDENCE

Where such systems are relied upon by institutions that wield power - e.g. by governmental surveillance agencies or employers - they may cause harm to individuals who are correctly classified, by exposing their private information and increasing the risk of unfair discrimination. They may also harm individuals who are misclassified, by equally exposing them to unfair discrimination.