2. Privacy & Security | 1 - Pre-deployment

Reidentification

Even after personally identifiable information (PII) and sensitive personal information (SPI) have been removed from data, it may still be possible to identify individuals through correlations with other features available in the data.

Source: MIT AI Risk Repository (mit1278)

ENTITY

3 - Other

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit1278

Domain lineage

2. Privacy & Security

186 mapped risks

2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Implement a Comprehensive Contextual Risk Assessment and Data Minimization Framework: Establish a rigorous framework to proactively assess re-identification risk by evaluating how quasi-identifiers interact within the dataset and how they could be linked with external, publicly available data sources (e.g., census data, voter registries). Prioritize data minimization by adhering to "Strategic Minimalism": collect and retain only the data elements essential for the analytical objective, which intrinsically reduces the dataset's re-identification potential.

2. Employ Advanced Statistical De-Identification Techniques: Apply statistical anonymization methods to disrupt the linkability of quasi-identifiers. This includes using k-anonymity, l-diversity, or t-closeness to ensure that each record is indistinguishable from a defined cohort, thereby preventing direct inference. In addition, apply Generalization (e.g., bucketing age or geographic regions) and Suppression (removing high-risk outliers or identifiers) to reduce the uniqueness of records while striving to preserve data utility.

3. Establish Strict Governance, Access Controls, and Contractual Protections: Layer technical anonymization with robust administrative and legal safeguards. Implement role-based access controls (RBAC) and, where applicable, use a restricted data enclave or "Controlled Tier" to limit the pool of users and monitor data access patterns. For any data sharing, enforce formal Data Use Agreements (DUAs) that explicitly prohibit re-identification attempts and stipulate regular risk reassessments to account for evolving external data landscapes.
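Illustrative sketch (not part of the repository entry): the generalization, suppression, and k-anonymity steps named in strategy 2 could be prototyped roughly as follows. The records, the field names (age, zip, diagnosis), the choice of quasi-identifiers, and the helper functions are all hypothetical and chosen only for illustration. The sketch checks k-anonymity only; l-diversity and t-closeness would need additional checks on the sensitive attribute.

```python
from collections import Counter

# Hypothetical toy records with direct identifiers already removed.
RECORDS = [
    {"age": 34, "zip": "02139", "diagnosis": "asthma"},
    {"age": 36, "zip": "02139", "diagnosis": "diabetes"},
    {"age": 38, "zip": "02141", "diagnosis": "asthma"},
    {"age": 52, "zip": "02446", "diagnosis": "flu"},
    {"age": 57, "zip": "02445", "diagnosis": "asthma"},
    {"age": 81, "zip": "90210", "diagnosis": "cancer"},
]

# Assumption: age and ZIP code are the quasi-identifiers in this dataset.
QUASI_IDENTIFIERS = ("age", "zip")


def generalize(record):
    """Generalization: bucket age into decades, keep only the ZIP prefix."""
    decade = (record["age"] // 10) * 10
    return {
        "age": f"{decade}-{decade + 9}",
        "zip": record["zip"][:3] + "**",
        "diagnosis": record["diagnosis"],
    }


def qi_key(record):
    """Tuple of quasi-identifier values that defines an equivalence class."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def k_anonymity(records):
    """Size of the smallest equivalence class (0 for an empty dataset)."""
    counts = Counter(qi_key(r) for r in records)
    return min(counts.values(), default=0)


def suppress_small_classes(records, k):
    """Suppression: drop records whose equivalence class has fewer than k members."""
    counts = Counter(qi_key(r) for r in records)
    return [r for r in records if counts[qi_key(r)] >= k]


generalized = [generalize(r) for r in RECORDS]
released = suppress_small_classes(generalized, k=2)

print(k_anonymity(RECORDS))      # 1: every raw record is unique
print(k_anonymity(generalized))  # 1: the 81-year-old is still unique
print(k_anonymity(released))     # 2: after suppressing the outlier
print(len(released))             # 5 of 6 records are retained
```

In this toy data, generalization alone leaves one outlier record unique, so suppression is what raises the dataset to k = 2, at the cost of one record. That trade-off between uniqueness and utility is the one strategy 2 refers to.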