1. Discrimination & Toxicity

Bias and discrimination (bias in training datasets)

AI experts consider training data to be the most salient source of bias in generative AI models. For example, GPT-2's training data comes from outbound links on Reddit, a social network often criticized for hosting anti-feminist content [351]. As a result, AI models trained on such data are more likely to produce outputs that reflect these biases.

Source: MIT AI Risk Repository (mit736)

ENTITY

3 - Other

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit736

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.1 > Unfair discrimination and misrepresentation

Mitigation strategy

1. Systematically curate and preprocess training datasets to ensure they are diverse, representative, and ethically sourced. This mandates rigorous data sanitization, source diversification, and balancing of demographic and social group representations using techniques such as data augmentation or reweighing to mitigate embedded societal biases.

2. Integrate fairness-aware algorithms and constraints directly into the model training process. This involves employing in-training techniques such as adversarial debiasing, regularization by adding a fairness term to the loss function, or model pruning to actively minimize the correlation between predictions and protected attributes.

3. Establish a robust governance framework that mandates continuous bias auditing and human oversight throughout the deployment lifecycle. This includes implementing a Human-in-the-Loop (HITL) mechanism with diverse reviewers to identify nuanced cultural or contextual biases that automated detection tools may fail to capture, ensuring ongoing model refinement.
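As a concrete illustration of the reweighing technique mentioned in step 1, the sketch below computes per-example sample weights so that group membership and label become statistically independent in the weighted dataset (the classic Kamiran–Calders reweighing scheme). This is a minimal sketch, not the repository's prescribed method; the function name and the toy dataset are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Compute Kamiran-Calders reweighing weights: each example in
    (group g, label y) gets weight P_expected(g, y) / P_observed(g, y),
    so weighted group-label frequencies match independence."""
    n = len(labels)
    n_group = Counter(groups)
    n_label = Counter(labels)
    n_joint = Counter(zip(groups, labels))
    return [
        (n_group[g] * n_label[y]) / (n * n_joint[(g, y)])
        for g, y in zip(groups, labels)
    ]

# Hypothetical toy dataset: group "a" skews positive, group "b" negative
groups = ["a", "a", "a", "b", "b", "b"]
labels = [1, 1, 0, 0, 0, 1]
weights = reweighing_weights(groups, labels)
# Over-represented pairs like ("a", 1) get weight < 1; under-represented
# pairs like ("a", 0) get weight > 1, balancing the weighted counts.
```

After reweighing, every (group, label) combination carries the same total weight, so a learner that honors sample weights no longer sees a spurious correlation between the protected attribute and the label.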