Uncertain data provenance
Data provenance refers to tracing the history of data, including its ownership, origin, and transformations. Without standardized, established methods for verifying where data came from, there is no guarantee that the data matches its original source or that it carries the correct usage terms.
ENTITY
1 - Human
INTENT
3 - Other
TIMING
1 - Pre-deployment
Risk ID
mit1273
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Implement and enforce standardized, cryptographically secured data provenance systems (e.g., utilizing blockchain or hashing) to immutably record the entire data lifecycle, including origin, transformations, and associated legal/usage rights.
2. Mandate comprehensive transparency through standardized documentation (e.g., Data Provenance Cards or Dataset Cards) detailing dataset sources, collection methods, composition, and licensing, to facilitate external and internal risk assessment for bias, quality, and copyright issues.
3. Integrate data provenance requirements directly into organizational AI governance and risk management frameworks, ensuring that accountability and auditability are established for data used in model training, in line with emerging global regulations.
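The hashing approach named in the first mitigation can be illustrated with a minimal sketch. It assumes a simple hash-chained log in which each entry records a lifecycle step, a SHA-256 digest of the data at that step, and its usage terms, and links to the hash of the previous entry. All names here (ProvenanceEntry, append_entry, head_hash, verify_chain) are hypothetical and chosen for illustration; this is not a standardized provenance format, and a production system would also need signing, trusted timestamps, and secure storage of the head hash.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List

# Placeholder "previous hash" for the first entry (an assumption of this sketch).
GENESIS_HASH = "0" * 64


@dataclass(frozen=True)
class ProvenanceEntry:
    step: str          # e.g. "origin", "dedupe", "relabel"
    data_sha256: str   # digest of the dataset bytes after this step
    usage_terms: str   # license or usage rights attached to the data
    prev_hash: str     # hash of the previous entry, linking the chain

    def entry_hash(self) -> str:
        # Serialize deterministically so the same entry always hashes the same.
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def head_hash(chain: List[ProvenanceEntry]) -> str:
    """Hash of the latest entry; store it somewhere the log writer cannot alter."""
    return chain[-1].entry_hash() if chain else GENESIS_HASH


def append_entry(chain: List[ProvenanceEntry], step: str, data: bytes,
                 usage_terms: str) -> List[ProvenanceEntry]:
    """Return a new chain with one more entry describing `data` at `step`."""
    entry = ProvenanceEntry(
        step=step,
        data_sha256=hashlib.sha256(data).hexdigest(),
        usage_terms=usage_terms,
        prev_hash=head_hash(chain),
    )
    return chain + [entry]


def verify_chain(chain: List[ProvenanceEntry], expected_head: str) -> bool:
    """Check every link, then compare the result with the trusted head hash."""
    prev = GENESIS_HASH
    for entry in chain:
        if entry.prev_hash != prev:
            return False
        prev = entry.entry_hash()
    return prev == expected_head
```

In use, a pipeline would call append_entry after each transformation and publish head_hash(chain) alongside the dataset documentation. An auditor who holds that head hash can then run verify_chain to detect any later edit to a recorded origin, transformation, or usage term, and can recompute a SHA-256 digest of the data to compare against data_sha256.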