1. Discrimination & Toxicity

Data Issues

Data heterogeneity, data insufficiency, imbalanced data, untrusted data, biased data, and data uncertainty are further data issues that can cause a range of difficulties for data-driven machine learning algorithms. Bias is a human trait that may affect data gathering and labeling, and it is sometimes embedded in historical, cultural, or geographical data. Consequently, bias can produce biased models that yield inappropriate analyses. Even when practitioners are aware that bias exists, avoiding biased models remains a challenging task.

Source: MIT AI Risk Repository (mit593)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

3 - Other

Risk ID

mit593

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.1 > Unfair discrimination and misrepresentation

Mitigation strategy

1. **Systematic Data Governance for Representational Equity**
   * **Action:** Implement robust data governance frameworks to ensure training datasets are demonstrably diverse, representative, and collected with standardized, bias-reducing methodologies. This mandates actively collecting data across a comprehensive spectrum of demographic, geographical, and historical contexts to eliminate selection and historical bias. Where data imbalance is observed, employ pre-processing techniques such as oversampling, reweighting of underrepresented groups, or generating synthetic data points to achieve an equitable distribution prior to model training.

2. **Integration of Fairness-Aware Machine Learning Algorithms**
   * **Action:** Incorporate in-processing, fairness-aware machine learning techniques directly into the model training pipeline. This involves using methods like adversarial debiasing to minimize reliance on sensitive attributes, or fair representation learning to transform input data into a latent space where discriminatory factors are suppressed. Fairness constraints must be formally introduced into the model's optimization objective or loss function to explicitly promote balanced predictive outcomes across specified groups.

3. **Continuous Auditing and Bias Drift Monitoring**
   * **Action:** Establish a mandatory, continuous auditing mechanism throughout the entire AI lifecycle, extending into post-deployment operations. This mechanism must employ statistical fairness metrics (e.g., disparate impact, equalized odds) to measure model performance discrepancies across demographic subgroups. Regular, automated audits are required to detect bias drift — the phenomenon where model fairness degrades over time due to shifts in real-world usage or data distribution — and trigger a mandatory retraining and recalibration process with fresh, rebalanced data.
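The reweighting step in strategy 1 can be sketched concretely. The snippet below (a minimal illustration, not part of the repository; the helper name `group_reweight` is hypothetical) computes the standard per-sample weights `N / (G * n_g)` so that every group contributes equally to the training loss:

```python
from collections import Counter

def group_reweight(group_labels):
    """Per-sample weights so each group's total weight is equal.

    Uses the common reweighting scheme w_g = N / (G * n_g), where N is
    the dataset size, G the number of groups, and n_g the group count.
    """
    counts = Counter(group_labels)
    n, g = len(group_labels), len(counts)
    return [n / (g * counts[lbl]) for lbl in group_labels]

# 8 samples from group "a", 2 from group "b":
weights = group_reweight(["a"] * 8 + ["b"] * 2)
# each group now carries a total weight of 5.0 (8 * 0.625 and 2 * 2.5)
```

These weights would typically be passed to a learner's `sample_weight` argument; oversampling achieves the same effect by duplicating minority-group rows instead.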
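Strategy 2's "fairness constraints in the loss function" can be illustrated with a simple penalized objective. The sketch below (an assumption for illustration; the function name and the demographic-parity penalty form are not prescribed by the repository) adds the squared gap between the two groups' mean predicted scores to a binary cross-entropy loss, with `lam` trading accuracy against fairness:

```python
import math

def fairness_penalized_loss(preds, labels, groups, lam=1.0):
    """Binary cross-entropy plus a demographic-parity penalty.

    The penalty is (mean score of group 0 - mean score of group 1)^2,
    one common way to push a model toward balanced outcomes.
    """
    eps = 1e-12  # guards log(0)
    bce = -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
               for p, y in zip(preds, labels)) / len(preds)
    g0 = [p for p, g in zip(preds, groups) if g == 0]
    g1 = [p for p, g in zip(preds, groups) if g == 1]
    gap = sum(g0) / len(g0) - sum(g1) / len(g1)
    return bce + lam * gap * gap
```

Minimizing this objective during training (e.g., by gradient descent) penalizes models whose average scores diverge across groups; adversarial debiasing pursues the same goal by training an adversary to predict the sensitive attribute from the model's representations.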
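The audit metrics named in strategy 3 are straightforward to compute. This sketch (illustrative helpers, assuming binary predictions and two groups labeled 0 and 1) implements the disparate impact ratio and an equalized-odds gap that a recurring audit job could evaluate on fresh predictions:

```python
def disparate_impact(preds, groups):
    """Min/max ratio of positive-prediction rates across groups.

    Values below ~0.8 are a common red flag (the "four-fifths rule").
    """
    rates = {}
    for g in set(groups):
        sel = [p for p, gg in zip(preds, groups) if gg == g]
        rates[g] = sum(sel) / len(sel)
    return min(rates.values()) / max(rates.values())

def equalized_odds_gap(preds, labels, groups):
    """Largest absolute gap in true- or false-positive rate between groups."""
    def tpr_fpr(g):
        tp = sum(1 for p, y, gg in zip(preds, labels, groups)
                 if gg == g and y == 1 and p == 1)
        pos = sum(1 for y, gg in zip(labels, groups) if gg == g and y == 1)
        fp = sum(1 for p, y, gg in zip(preds, labels, groups)
                 if gg == g and y == 0 and p == 1)
        neg = sum(1 for y, gg in zip(labels, groups) if gg == g and y == 0)
        return tp / pos, fp / neg
    (tpr0, fpr0), (tpr1, fpr1) = tpr_fpr(0), tpr_fpr(1)
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))
```

A bias-drift monitor would log these metrics on each audit run and trigger retraining when they cross a chosen threshold.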