7. AI System Safety, Failures, & Limitations | 1 - Pre-deployment

Improper data curation

Improper collection and preparation of training or tuning data, including data label errors and the use of data that contains conflicting information or misinformation.

Source: MIT AI Risk Repository (mit1284)

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit1284

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Implement rigorous, multi-stage data validation and cleaning protocols to identify and rectify structural errors, inconsistencies, and duplicates within the dataset prior to model training.
2. Establish and enforce standardized data labeling guidelines and employ quantitative quality assurance mechanisms, such as inter-annotator agreement (IAA) checks, to minimize human-introduced label errors and subjectivity.
3. Develop a comprehensive data governance framework that mandates clear data ownership, standardized metadata creation to track data lineage and provenance, and regular automated audits to ensure data integrity and compliance with predefined quality standards.
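
The first two mitigations can be illustrated with a minimal sketch. It is not a method prescribed by the repository entry: the function names `find_duplicates_and_conflicts` and `cohens_kappa` are hypothetical, the normalization rule (lowercasing and collapsing whitespace) is an assumption, and Cohen's kappa is used here only as one common inter-annotator agreement measure for two annotators. The code uses the Python standard library only.

```python
from collections import Counter, defaultdict


def find_duplicates_and_conflicts(records):
    """Group (text, label) records by normalized text.

    Returns (duplicates, conflicts):
      duplicates: texts seen more than once, always with the same label
      conflicts:  texts seen with more than one distinct label
    """
    labels_by_text = defaultdict(list)
    for text, label in records:
        # Assumed normalization: lowercase and collapse runs of whitespace.
        key = " ".join(text.lower().split())
        labels_by_text[key].append(label)

    duplicates = sorted(
        k for k, v in labels_by_text.items() if len(v) > 1 and len(set(v)) == 1
    )
    conflicts = sorted(k for k, v in labels_by_text.items() if len(set(v)) > 1)
    return duplicates, conflicts


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    records = [
        ("Good movie", "pos"),
        ("good  movie", "pos"),   # duplicate after normalization
        ("Bad plot", "neg"),
        ("bad plot", "pos"),      # same text, conflicting label
        ("Fine", "pos"),
    ]
    print(find_duplicates_and_conflicts(records))
    print(cohens_kappa(["pos", "pos", "neg", "neg"], ["pos", "neg", "neg", "neg"]))
```

In practice, a check like this would run before training. Conflicting items would go back to annotators for adjudication, and a low kappa would suggest that the labeling guidelines need revision. Any acceptance threshold for kappa is a project decision and is not specified by the source.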