1. Discrimination & Toxicity | 2 - Post-deployment

Data bias

Historical and societal biases are present in the data used to train and fine-tune the model.

Source: MIT AI Risk Repository (mit1279)

ENTITY: 1 - Human

INTENT: 2 - Unintentional

TIMING: 2 - Post-deployment

Risk ID: mit1279

Domain lineage

1. Discrimination & Toxicity (156 mapped risks) > 1.1 Unfair discrimination and misrepresentation

Mitigation strategy

1. Proactively source and curate diverse and representative training data from multiple, high-quality sources, prioritizing the identification and correction of under-representation or differential data quality issues across all protected groups.

2. Employ pre-processing techniques such as data reweighting, up/down sampling, or synthetic data generation (e.g., SMOTE, GANs) to modify the distribution of training instances, thereby ensuring greater balance and reducing the disproportionate influence of historically biased data patterns.

3. Utilize fair representation learning algorithms to transform the training data into a latent representation that minimizes the encoding of information related to sensitive or protected attributes, thus mitigating the transfer of historical bias into the trained model.

4. Implement rigorous data governance and auditing protocols, including feature blinding (masking or filtering sensitive attributes) during model training, to proactively monitor and restrict the model's reliance on features that serve as proxies for systemic bias.
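To make the reweighting idea in strategy 2 concrete, here is a minimal sketch (not part of the repository entry) of per-sample weight computation: each sample in group g receives weight n_total / (n_groups * n_g), so every group contributes the same total weight to the loss regardless of its size. The function name and data are illustrative assumptions.

```python
from collections import Counter

def reweight(groups):
    """Compute per-sample weights so each group contributes equal total weight.

    groups: a list of group labels, one per training sample. A sample in
    group g gets weight n_total / (n_groups * n_g); the summed weight of
    each group is then n_total / n_groups, regardless of group size.
    """
    counts = Counter(groups)
    n_total = len(groups)
    n_groups = len(counts)
    return [n_total / (n_groups * counts[g]) for g in groups]

# Illustrative data: group B is under-represented 4:1 relative to group A.
groups = ["A"] * 8 + ["B"] * 2
weights = reweight(groups)
# Each B sample now weighs 4x each A sample, so both groups carry
# equal total weight (5.0 each) in a weighted training objective.
```

These weights would typically be passed to a training routine that accepts per-sample weights (e.g., a `sample_weight` argument); feature blinding (strategy 4) is complementary and simply drops or masks the sensitive column before fitting.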