Improper data curation
Improper collection and preparation of training or tuning data, including data label errors and the use of data containing conflicting information or misinformation.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1284
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement rigorous, multi-stage data validation and cleaning protocols to identify and rectify structural errors, inconsistencies, and duplicates within the dataset prior to model training.
2. Establish and enforce standardized data labeling guidelines and employ quantitative quality assurance mechanisms, such as inter-annotator agreement (IAA) checks, to minimize human-introduced label errors and subjectivity.
3. Develop a comprehensive data governance framework that mandates clear data ownership, standardized metadata creation to track data lineage and provenance, and regular automated audits to ensure data integrity and compliance with predefined quality standards.
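As an illustration of mitigation strategies 1 and 2, the following is a minimal sketch, not a prescribed implementation. It assumes a dataset of (text, label) pairs and exactly two annotators, and it uses Cohen's kappa as one possible IAA measure. The function names find_label_conflicts and cohen_kappa are hypothetical and chosen for this example only.

```python
from collections import Counter, defaultdict


def find_label_conflicts(records):
    """Return {normalized_text: labels} for texts that carry more than one label.

    `records` is an iterable of (text, label) pairs (an assumed data layout).
    """
    labels_by_text = defaultdict(set)
    for text, label in records:
        # Normalize case and whitespace so trivial variants count as duplicates.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    return {t: labels for t, labels in labels_by_text.items() if len(labels) > 1}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    records = [("Great product", "pos"), ("great  product", "neg"), ("Bad", "neg")]
    print(find_label_conflicts(records))

    annotator_a = ["pos", "pos", "neg", "neg"]
    annotator_b = ["pos", "neg", "neg", "neg"]
    print(cohen_kappa(annotator_a, annotator_b))
```

A kappa near 1 suggests the labeling guidelines are being applied consistently, while a value near 0 suggests agreement no better than chance. Any acceptance threshold is a project-level decision and is not specified by this entry.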