Incorrect data labels
Data labels are essential for any supervised learning algorithm since they define the target of the learning process. If label correctness cannot be guaranteed, the AI system is prevented from learning the ground truth and therefore the intended functionality.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1003
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement a Multi-Layered Data Governance and Quality Assurance Framework
Establish comprehensive and unambiguous annotation ontologies, mandate inter-rater reliability analysis (consensus scoring) during the labeling process, and integrate continuous data validation by domain experts. This preventative measure ensures high-fidelity "ground truth" and minimizes the introduction of label noise at the source.
2. Employ Noise-Tolerant Training Methodologies
Utilize machine learning techniques inherently robust to label noise, such as generalized or noise-tolerant loss functions (e.g., Generalized Cross Entropy or Mean Absolute Error) and regularization strategies like label smoothing. These methods reduce the model's capacity to memorize incorrect labels, thereby enhancing generalization performance.
3. Integrate Automated Label Error Detection and Correction Systems
Deploy model-based techniques, such as Confident Learning or analysis of loss dynamics, to systematically identify training instances with a high likelihood of being mislabeled. These flagged samples must then be routed for targeted human expert review and subsequent iterative correction (label cleaning) to maintain dataset integrity throughout the model lifecycle.
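As an illustration of the noise-tolerant training methodologies in item 2, the following is a minimal NumPy sketch of the Generalized Cross Entropy loss and of label smoothing. The function names, the choice of q = 0.7, and the smoothing factor eps = 0.1 are illustrative assumptions, not prescribed by this mitigation entry:

```python
import numpy as np

def generalized_cross_entropy(probs, labels, q=0.7):
    """Generalized Cross Entropy: L_q = (1 - p_y^q) / q.

    As q -> 0 this approaches standard cross entropy; q = 1 gives
    Mean Absolute Error, which is more tolerant of mislabeled
    examples because individual wrong labels incur bounded loss.
    probs: (n, k) array of predicted class probabilities.
    labels: (n,) array of integer class indices.
    """
    p_y = probs[np.arange(len(labels)), labels]  # probability of the labeled class
    return np.mean((1.0 - p_y ** q) / q)

def smooth_labels(labels, num_classes, eps=0.1):
    """Label smoothing: mix one-hot targets with the uniform distribution,
    so the model is never pushed toward full confidence in a (possibly
    incorrect) hard label."""
    one_hot = np.eye(num_classes)[labels]
    return (1.0 - eps) * one_hot + eps / num_classes
```

In practice these would replace the standard cross-entropy target and loss in the training loop; the bounded per-sample loss limits how strongly a single mislabeled instance can pull the model's parameters.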