Risks from data (Risks of unregulated training data annotation)
Issues with training data annotation, such as incomplete annotation guidelines, insufficiently skilled annotators, and annotation errors, can degrade the accuracy, reliability, and effectiveness of models and algorithms. They can also introduce biases into training, amplify discrimination, weaken generalization, and lead to incorrect outputs.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit689
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Develop and enforce highly detailed, unambiguous annotation guidelines and protocols, including explicit handling of edge cases, visual examples, and a formal version control system, to minimize ambiguity and ensure consistency across all data points.
2. Institute a comprehensive annotator training and calibration program, focusing on domain-specific knowledge, the rationale behind the guidelines, and continuous bias awareness, to ensure a shared understanding and high initial labeling accuracy.
3. Implement a rigorous, multi-level Quality Assurance (QA) framework utilizing systematic checks such as Inter-Annotator Agreement (IAA) metrics and Gold Standard datasets, with a continuous feedback mechanism to drive iterative refinement of guidelines and annotator performance.
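The two QA checks named in the third mitigation (IAA metrics and Gold Standard datasets) can be sketched in Python. This is a minimal illustration under stated assumptions, not a prescribed implementation. It assumes two annotators label the same items with categorical labels and uses Cohen's kappa as one common IAA metric (alternatives such as Fleiss' kappa or Krippendorff's alpha exist for other setups). The function names and the 0.9 gold-standard accuracy cutoff are hypothetical choices for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items, in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both annotators used one identical label for every item;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def gold_accuracy(annotations, gold):
    """Fraction of gold-standard items an annotator labeled correctly.

    annotations, gold: dicts mapping item_id -> label. Only gold items the
    annotator actually labeled are scored; returns None if there are none.
    """
    scored = [item for item in gold if item in annotations]
    if not scored:
        return None
    correct = sum(annotations[item] == gold[item] for item in scored)
    return correct / len(scored)


def flag_annotators(gold_scores, threshold=0.9):
    """Names of annotators whose gold accuracy falls below a (hypothetical) threshold."""
    return sorted(
        name for name, score in gold_scores.items()
        if score is not None and score < threshold
    )


# Usage: two annotators disagree on one of six items.
a = ["cat", "cat", "dog", "dog", "cat", "dog"]
b = ["cat", "dog", "dog", "dog", "cat", "dog"]
print(round(cohens_kappa(a, b), 3))  # 0.667
```

Kappa is used rather than raw percent agreement because it discounts the agreement expected by chance from each annotator's label distribution. Flagged annotators would then feed the feedback loop described in the mitigation, for example targeted retraining or a guideline revision when many annotators miss the same gold items.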