Personal information in data
The inclusion of personally identifiable information (PII) and sensitive personal information (SPI) in the data used to train or fine-tune the model can result in unwanted disclosure of that information.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1277
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Establish Data Minimization and Privacy-by-Design Principles: Mandate the collection and retention of only the PII and SPI strictly necessary for the model's function. This involves leveraging feature selection, temporal retention policies, and upfront Privacy Impact Assessments (PIAs) to ensure privacy is the default state prior to data ingestion.
2. Employ Advanced De-identification Techniques: Systematically apply data sanitization, masking, and pseudonymization (e.g., using Named Entity Recognition, hashing, or synthetic data replacement) to all training and fine-tuning datasets to transform or remove direct and quasi-identifiers before they enter the model training pipeline.
3. Integrate Privacy-Enhancing Technologies (PETs): Use Differential Privacy (DP) mechanisms during the model's learning phase to introduce controlled, quantifiable noise into the dataset or model outputs. This provides formal, mathematical guarantees that the resulting model cannot be used to infer individual data points, mitigating model inversion and data extraction risks.
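As a rough illustration of mitigations 2 and 3, the sketch below shows (a) salted-hash pseudonymization of one direct identifier (email addresses, matched with a simplified regex) and (b) a Laplace mechanism that releases a count with epsilon-differential privacy. Both functions, the regex, and the parameter choices are hypothetical examples, not the method prescribed by this entry; a production pipeline would typically use an NER-based PII detector for de-identification and a DP training algorithm such as DP-SGD rather than per-query noise.

```python
import hashlib
import math
import random
import re

# Simplified email pattern for illustration only; real pipelines
# would use an NER-based PII detector, not a single regex.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(text: str, salt: str) -> str:
    """Replace email addresses with salted-hash pseudonyms so records
    remain linkable (same input -> same token) but unreadable."""
    def repl(match: re.Match) -> str:
        digest = hashlib.sha256((salt + match.group(0)).encode()).hexdigest()
        return f"<EMAIL_{digest[:10]}>"
    return EMAIL_RE.sub(repl, text)

def laplace_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Release a counting-query result with Laplace noise of scale
    1/epsilon (sensitivity 1), giving epsilon-differential privacy."""
    u = rng.random() - 0.5
    if u <= -0.5:  # guard against the measure-zero edge case log(0)
        u = -0.5 + 1e-12
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Pseudonymization keeps the dataset joinable across records while removing the readable identifier; the salt should be stored separately from the data so the mapping cannot be trivially rebuilt. The Laplace example works for aggregate releases; protecting the model weights themselves requires noise injected during training.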