2. Privacy & Security

Personal information in data

The inclusion of personally identifiable information (PII) or sensitive personal information (SPI) in the data used for training or fine-tuning the model might result in unwanted disclosure of that information.

Source: MIT AI Risk Repository (mit1277)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1277

Domain lineage

2. Privacy & Security

186 mapped risks

2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Establish Data Minimization and Privacy-by-Design Principles: Mandate the collection and retention of only the personally identifiable information (PII) and sensitive personal information (SPI) strictly necessary for the model's function. This involves leveraging feature selection, temporal retention policies, and upfront Privacy Impact Assessments (PIAs) to ensure privacy is the default state prior to data ingestion.

2. Employ Advanced De-identification Techniques: Systematically apply data sanitization, masking, and pseudonymization (e.g., using Named Entity Recognition, hashing, or synthetic data replacement) to all training and fine-tuning datasets to transform or remove direct and quasi-identifiers before they are introduced into the model training pipeline.

3. Integrate Privacy-Enhancing Technologies (PETs): Utilize Differential Privacy (DP) mechanisms during the model's learning phase to introduce controlled, quantifiable noise into the dataset or model outputs. This provides formal, mathematical guarantees that the resulting model cannot be used to infer individual data points, thereby mitigating model inversion and data extraction risks.
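As a minimal illustration of strategies 2 and 3, the sketch below shows two building blocks: salted-hash pseudonymization of one direct identifier (email addresses, found here with a simple regex rather than full Named Entity Recognition), and a basic Laplace mechanism that adds noise calibrated to sensitivity/epsilon. All function and variable names are illustrative assumptions, not part of the MIT repository; a real pipeline would cover many more identifier types and use a vetted DP library.

```python
import hashlib
import math
import random
import re

# Naive email pattern for illustration only; production de-identification
# should use NER or a dedicated PII-detection tool.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def pseudonymize_emails(text: str, salt: str = "corpus-salt") -> str:
    """Replace each email address with a salted SHA-256 pseudonym.

    The same input always maps to the same token, preserving record
    linkage without exposing the original identifier.
    """
    def _token(match: re.Match) -> str:
        digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()
        return f"<user_{digest[:12]}>"
    return EMAIL_RE.sub(_token, text)

def laplace_mechanism(value: float, sensitivity: float, epsilon: float) -> float:
    """Add Laplace noise with scale = sensitivity / epsilon (basic DP).

    Smaller epsilon means stronger privacy and larger noise. Sampled via
    the inverse CDF of the Laplace distribution.
    """
    scale = sensitivity / epsilon
    u = random.random() - 0.5
    return value - scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

record = "Contact alice@example.com about the training set."
clean = pseudonymize_emails(record)
noisy_count = laplace_mechanism(value=42.0, sensitivity=1.0, epsilon=0.5)
```

Note the design trade-off: pseudonymization is deterministic and reversible only by brute force against the salt, while the Laplace mechanism gives a formal epsilon-DP guarantee for numeric aggregates but says nothing about free-text identifiers, which is why the two are typically combined.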