Incomplete or biased training data
Incomplete or biased training data can lead to discriminatory AI outputs.
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1072
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. Prioritize upstream mitigation through rigorous data governance: implement mandatory protocols to ensure training datasets are representative and diverse across relevant socio-demographic and contextual attributes, using exploratory data analysis (EDA) to preemptively identify and document sources of sampling and historical bias.
2. Employ data preprocessing fairness interventions: systematically apply rebalancing techniques (e.g., augmentation, undersampling, or synthetic data generation) to correct identified data imbalances, and apply data cleaning procedures to remove erroneous or misleading outliers, creating a more equitable foundation for model training.
3. Institute a continuous audit and oversight mechanism: establish an AI governance framework requiring algorithmic fairness tools and regular, independent bias audits that monitor model performance across subgroups during training and post-deployment, ensuring ongoing compliance with established fairness constraints and providing human-in-the-loop (HITL) review for critical decisions.
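The rebalancing and subgroup-audit steps above can be sketched in a few lines of code. This is a minimal illustration, not a production pipeline: the dataset, group names, and the choice of undersampling-to-the-smallest-group are all hypothetical assumptions, and the audit metric shown (per-group positive-label rate) is only one of many fairness measures a real audit would compute.

```python
import random
from collections import Counter, defaultdict

random.seed(0)

# Hypothetical toy dataset: group "A" is overrepresented relative to "B",
# illustrating the kind of sampling bias EDA is meant to surface.
data = [{"group": "A", "label": random.random() < 0.7} for _ in range(800)]
data += [{"group": "B", "label": random.random() < 0.5} for _ in range(200)]

def rebalance_by_undersampling(records, key="group"):
    """Undersample every group down to the size of the smallest group."""
    by_group = defaultdict(list)
    for r in records:
        by_group[r[key]].append(r)
    n_min = min(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(random.sample(members, n_min))
    return balanced

def positive_rate_by_group(records, key="group"):
    """Simple audit metric: share of positive labels per subgroup.
    Large gaps between groups flag a potential source of unfair outputs."""
    totals, positives = Counter(), Counter()
    for r in records:
        totals[r[key]] += 1
        positives[r[key]] += r["label"]
    return {g: positives[g] / totals[g] for g in totals}

balanced = rebalance_by_undersampling(data)
print(Counter(r["group"] for r in balanced))  # equal counts per group
print(positive_rate_by_group(balanced))       # per-group audit metric
```

In practice the same pattern (disaggregate, compare, rebalance) is usually delegated to dedicated tooling rather than hand-rolled; libraries such as imbalanced-learn and Fairlearn cover the resampling and per-group metric reporting, respectively.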