6. Socioeconomic and Environmental

Lack of training data transparency

Without accurate documentation of how a model's training data was collected, curated, and used, it becomes harder to satisfactorily explain the model's behavior with respect to that data.

Source: MIT AI Risk Repository, risk ID mit1272

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit1272

Domain lineage

6. Socioeconomic and Environmental

262 mapped risks

6.5 > Governance failure

Mitigation strategy

1. Mandate the adoption of standardized training data documentation frameworks, such as Datasheets for Datasets or Data Statements, to record the complete data lifecycle: collection methodology, curation, pre-processing, composition, and intended uses. Such documentation is essential for making the model's behavior scrutable and enabling downstream explainability.

2. Establish an independent, continuous auditing mechanism to verify the accuracy and completeness of mandated training data disclosures. Integrate data observability tools and MLOps practices to monitor, measure, and remediate data biases in near real time, ensuring quality and reproducibility.

3. Develop clear, accessible, and user-empowering digital consent protocols that give individuals an explicit right to opt in or opt out of having their data used for AI training, addressing ethical concerns around digital consent and rightsholder enforcement.
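The documentation framework in step 1 can be sketched as a minimal, machine-readable record. This is an illustrative assumption, not the official Datasheets for Datasets or Data Statements schema; the field names below simply mirror the lifecycle stages named in the mitigation strategy.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class TrainingDataSheet:
    """Minimal datasheet-style record for a training dataset.

    Field names are illustrative, loosely following the lifecycle
    stages listed in the mitigation strategy: collection, curation,
    pre-processing, composition, and intended uses.
    """
    dataset_name: str
    collection_methodology: str
    curation_process: str
    preprocessing_steps: list = field(default_factory=list)
    composition: str = ""
    intended_uses: list = field(default_factory=list)
    known_limitations: list = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize the record so it can be published alongside the model
        # and checked by an external auditor (step 2).
        return json.dumps(asdict(self), indent=2)


# Hypothetical example record for illustration only.
sheet = TrainingDataSheet(
    dataset_name="web-text-v1",
    collection_methodology="Crawl of publicly available pages, Jan-Jun 2023",
    curation_process="Language filtering, deduplication, manual spot checks",
    preprocessing_steps=["HTML stripping", "PII redaction"],
    composition="~80% English, ~20% other languages",
    intended_uses=["language-model pre-training"],
    known_limitations=["under-represents low-resource languages"],
)
print(sheet.to_json())
```

Publishing such a record in a stable, versioned format gives auditors a fixed artifact to verify disclosures against, rather than relying on ad hoc prose descriptions.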