6. Socioeconomic and Environmental | 1 - Pre-deployment

Uncertain data provenance

Data provenance refers to tracing the history of data, including its ownership, origin, and transformations. Without standardized, established methods for verifying where data came from, there is no guarantee that the data matches the original source or carries the correct usage terms.
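As a rough illustration of what "verifying that the data is the same as the original source" can mean in practice, the sketch below compares a local copy against a digest published by the source. It assumes the source publishes a SHA-256 digest at all, which is often not the case. The function names and sample data are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_source(local_copy: bytes, published_digest: str) -> bool:
    """Check a local copy against the digest the source is assumed to publish."""
    return sha256_hex(local_copy) == published_digest


# Hypothetical data: the original file and a silently altered copy.
original = b"id,label\n1,cat\n2,dog\n"
tampered = b"id,label\n1,cat\n2,cat\n"
published = sha256_hex(original)  # stands in for a digest listed by the source

print(matches_source(original, published))
print(matches_source(tampered, published))
```

A digest match only shows that the bytes are unchanged. It says nothing about ownership or usage terms, which have to be recorded and checked separately.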

Source: MIT AI Risk Repository (mit1273)

ENTITY

1 - Human

INTENT

3 - Other

TIMING

1 - Pre-deployment

Risk ID

mit1273

Domain lineage

6. Socioeconomic and Environmental

262 mapped risks

6.5 > Governance failure

Mitigation strategy

1. Implement and enforce standardized, cryptographically secured data provenance systems (e.g., using blockchain or hashing) to immutably record the entire data lifecycle, including origin, transformations, and associated legal/usage rights.
2. Mandate comprehensive transparency through standardized documentation (e.g., Data Provenance Cards or Dataset Cards) detailing dataset sources, collection methods, composition, and licensing to facilitate external and internal risk assessment for bias, quality, and copyright issues.
3. Integrate data provenance requirements directly into organizational AI governance and risk management frameworks, ensuring that accountability and auditability are established for data used in model training, in line with emerging global regulations.
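The hashing approach mentioned in the first mitigation can be sketched as a hash-chained log, where each lifecycle event stores the digest of the previous entry. This is a minimal sketch under stated assumptions, not a prescribed implementation: the field names, the `append_event` and `verify_log` helpers, and the license identifiers are all hypothetical, and the chain makes edits detectable rather than impossible.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(record: dict) -> str:
    """Hash a record deterministically by serializing it with sorted keys."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(log: list, step: str, data: bytes, license_id: str) -> None:
    """Append a lifecycle event linked to the previous entry's hash."""
    prev = log[-1]["entry_hash"] if log else GENESIS
    body = {
        "step": step,                                   # e.g. origin or transformation
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "license": license_id,                          # associated usage rights
        "prev_hash": prev,
    }
    log.append({**body, "entry_hash": _digest(body)})


def verify_log(log: list) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True


# Hypothetical lifecycle: collection followed by one transformation.
log = []
append_event(log, "collected", b"raw rows", "CC-BY-4.0")
append_event(log, "deduplicated", b"clean rows", "CC-BY-4.0")
print(verify_log(log))

log[0]["license"] = "CC0-1.0"  # simulate a retroactive change to usage terms
print(verify_log(log))
```

Someone who can rewrite the whole log can also recompute every hash, so a scheme like this only helps if the latest `entry_hash` is stored somewhere the log's maintainer cannot alter, such as an external registry or ledger.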