Reidentification
Even after personally identifiable information (PII) and sensitive personal information (SPI) have been removed from data, it might still be possible to identify individuals because of correlations with other features available in the data.
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1278
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement a Comprehensive Contextual Risk Assessment and Data Minimization Framework: Establish a rigorous framework to proactively assess re-identification risk by evaluating how quasi-identifiers within the dataset interact, and how they could be linked with external, publicly available data sources (e.g., census data, voter registries). Prioritize data minimization by adhering to "Strategic Minimalism": collect and retain only the data elements essential for the analytical objective, which intrinsically reduces the dataset's re-identification potential.
2. Employ Advanced Statistical De-Identification Techniques: Apply statistical anonymization methods to disrupt the linkability of quasi-identifiers. These include k-anonymity, l-diversity, and t-closeness: k-anonymity makes each record indistinguishable from at least k-1 others on its quasi-identifiers, while l-diversity and t-closeness additionally limit what can be inferred about sensitive attributes within each group. Furthermore, strategically apply Generalization (e.g., bucketing age or geographic regions) and Suppression (removing high-risk outliers or identifiers) to reduce the uniqueness of records while striving to preserve data utility.
3. Establish Strict Governance, Access Controls, and Contractual Protections: Layer technical anonymization with robust administrative and legal safeguards. Implement role-based access controls (RBAC) and, where applicable, use a restricted data enclave or "Controlled Tier" to limit the pool of users and monitor data access patterns. For any data sharing, enforce formal Data Use Agreements (DUAs) that explicitly prohibit re-identification attempts and stipulate regular risk reassessments to account for evolving external data landscapes.
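The generalization, suppression, and k-anonymity steps in the second strategy can be illustrated with a minimal Python sketch. It is not a production de-identification pipeline. The column names (`age`, `zip_code`, `sex`, `diagnosis`), the choice of quasi-identifiers, the 10-year age bands, the 3-digit ZIP prefix, and the sample records are all assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical quasi-identifier columns; a real dataset needs its own analysis.
QUASI_IDENTIFIERS = ("age", "zip_code", "sex")


def qi_key(record):
    """Tuple of quasi-identifier values that defines a record's equivalence class."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def generalize(record):
    """Coarsen quasi-identifiers: 10-year age bands and a 3-digit ZIP prefix."""
    low = (record["age"] // 10) * 10
    return {
        **record,
        "age": f"{low}-{low + 9}",
        "zip_code": record["zip_code"][:3] + "**",
    }


def k_anonymity(records):
    """Size of the smallest equivalence class (0 for an empty dataset)."""
    sizes = Counter(qi_key(r) for r in records)
    return min(sizes.values()) if sizes else 0


def suppress_small_classes(records, k):
    """Drop records whose equivalence class has fewer than k members."""
    sizes = Counter(qi_key(r) for r in records)
    return [r for r in records if sizes[qi_key(r)] >= k]


# Made-up sample data with direct identifiers already removed.
records = [
    {"age": 34, "zip_code": "02138", "sex": "F", "diagnosis": "asthma"},
    {"age": 37, "zip_code": "02139", "sex": "F", "diagnosis": "flu"},
    {"age": 31, "zip_code": "02141", "sex": "F", "diagnosis": "asthma"},
    {"age": 52, "zip_code": "02138", "sex": "M", "diagnosis": "diabetes"},
    {"age": 58, "zip_code": "02139", "sex": "M", "diagnosis": "flu"},
    {"age": 71, "zip_code": "90210", "sex": "M", "diagnosis": "asthma"},
]

raw_k = k_anonymity(records)              # every raw record is unique
generalized = [generalize(r) for r in records]
generalized_k = k_anonymity(generalized)  # one outlier record is still unique
released = suppress_small_classes(generalized, k=2)
released_k = k_anonymity(released)

print(raw_k, generalized_k, released_k, len(released))
```

In this sample, generalization alone leaves one outlier record unique, so suppression is what brings the released set to k = 2, at the cost of dropping that record. A k-anonymity check of this kind says nothing about the diversity of the sensitive attribute within each group, which is the gap l-diversity and t-closeness address.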