Lack of training data transparency
Without accurate documentation of how a model's training data was collected, curated, and used, it may be harder to satisfactorily explain the model's behavior with respect to that data.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1272
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Mandate the adoption of standardized training data documentation frameworks, such as Data Sheets or Data Statements, to record the complete data lifecycle, including collection methodology, curation, pre-processing, composition, and intended uses. This is essential for rendering the model's behavior scrutable and enabling downstream explainability.
2. Establish an independent, continuous auditing mechanism to verify the accuracy and completeness of mandated training data disclosures and documentation. Furthermore, integrate data observability tools and MLOps practices to monitor, measure, and automate the remediation of data biases in real time, ensuring quality and reproducibility.
3. Develop and implement clear, accessible, and user-empowering digital consent and actionability protocols, providing individuals with the explicit right to opt in or opt out of having their data used for AI training, thereby addressing ethical concerns regarding digital consent and rightsholder enforcement.
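As an illustration only, the documentation, audit, and consent steps above could be sketched in Python roughly as follows. The field names mirror the lifecycle stages listed in the mitigation strategy; the class `TrainingDataRecord`, the helpers `missing_fields` and `filter_by_consent`, and the `owner` key are hypothetical names chosen for this sketch, not part of any published Data Sheets or Data Statements schema.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainingDataRecord:
    """Hypothetical minimal record of a training dataset's lifecycle."""
    collection_methodology: str = ""
    curation: str = ""
    preprocessing: str = ""
    composition: str = ""
    intended_uses: str = ""


def missing_fields(record: TrainingDataRecord) -> list[str]:
    """Completeness check: return the names of fields left blank."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def filter_by_consent(items: list[dict], consent: dict) -> list[dict]:
    """Keep only items whose owner explicitly opted in.

    Assumption: an opt-in default, so owners with no recorded
    choice are treated the same as owners who opted out.
    """
    return [item for item in items if consent.get(item["owner"]) is True]


record = TrainingDataRecord(
    collection_methodology="web crawl",
    composition="English news text",
)
print(missing_fields(record))  # ['curation', 'preprocessing', 'intended_uses']
```

An auditor could run a check like `missing_fields` over every disclosed dataset and flag incomplete records. It verifies only that each field is filled in, not that the content is accurate, which would still require independent review.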