
Bias, Fairness and Representational Harms

Frontier AI models can contain and magnify biases ingrained in the data they are trained on, reflecting societal and historical inequalities and stereotypes.[177] These biases, often subtle and deeply embedded, compromise the equitable and ethical use of AI systems and limit their potential to improve fairness in decision-making.[178] Removing attributes such as race and gender from training data has generally proven ineffective as a remedy for algorithmic bias, because models can infer these attributes from other information, such as names, locations, and other seemingly unrelated features.

Source: MIT AI Risk Repository (risk ID mit1378)
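
The proxy-inference point is straightforward to demonstrate: if a simple classifier can recover a protected attribute from the remaining features, then dropping that attribute has not removed the bias signal from the data. Below is a minimal sketch of such a leakage check on synthetic data; the feature names (standing in for proxies like location or name) and the scikit-learn model choice are illustrative assumptions, not part of the repository entry.

    # Proxy-leakage check (illustrative sketch on synthetic data): if a model
    # can predict the protected attribute from the remaining features, the
    # attribute is still encoded in the data even after being dropped.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 5_000

    # Hypothetical binary protected attribute (e.g. a demographic group).
    group = rng.integers(0, 2, size=n)

    # "Neutral" features that nonetheless correlate with the group, standing
    # in for real-world proxies such as name, location, or purchase history.
    location_signal = rng.normal(loc=1.5 * group, scale=1.0, size=n)
    name_signal = rng.normal(loc=1.2 * group, scale=1.0, size=n)
    noise_feature = rng.normal(size=n)  # genuinely uninformative

    X = np.column_stack([location_signal, name_signal, noise_feature])

    # Cross-validated accuracy well above 0.5 means the protected attribute
    # remains recoverable from features never labelled as sensitive.
    acc = cross_val_score(LogisticRegression(), X, group, cv=5).mean()
    print(f"protected attribute recovered with accuracy ~ {acc:.2f}")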

ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 3 - Other
Risk ID: mit1378

Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.1 Unfair discrimination and misrepresentation

Mitigation strategy

1. Proactively curate diverse and representative training datasets using pre-processing techniques, such as reweighting and resampling of underrepresented groups, to mitigate foundational data bias and to eliminate proxy variables that inadvertently correlate with protected attributes.
2. Employ in-processing, fairness-aware algorithms, including adversarial debiasing and fair regularization, to constrain the model's loss function and limit the propagation of bias during training.
3. Establish a robust governance framework requiring continuous auditing and monitoring of the deployed system, using statistical fairness metrics and human oversight to detect emergent bias drift and to ensure accountability for equitable outcomes across demographic groups.
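
As a concrete illustration of steps 1 and 3, the sketch below applies inverse-frequency reweighting over (group, label) cells before training and then computes a demographic parity difference as one possible audit metric. Everything here runs on synthetic data; the weighting scheme, model, and metric are assumptions chosen for illustration, not the repository's prescribed method.

    # Illustrative sketch: (1) pre-processing reweighting so each
    # (group, label) cell contributes equal total weight during training,
    # and (3) a simple post-hoc audit via demographic parity difference.
    # Synthetic data throughout; names and values are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 10_000
    group = rng.integers(0, 2, size=n)  # protected attribute
    X = rng.normal(loc=0.8 * group[:, None], size=(n, 3))  # group-correlated
    y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 0.4).astype(int)

    # Step 1: reweighting. Give each (group, label) combination equal total
    # weight so that underrepresented cells are not drowned out in training.
    weights = np.empty(n)
    for g in (0, 1):
        for label in (0, 1):
            mask = (group == g) & (y == label)
            weights[mask] = n / (4 * max(mask.sum(), 1))

    model = LogisticRegression()
    model.fit(X, y, sample_weight=weights)
    pred = model.predict(X)

    # Step 3: audit. Demographic parity difference is the gap in positive
    # prediction rates between groups; values near 0 indicate parity on this
    # one metric (it says nothing about calibration or error-rate balance).
    rate_g0 = pred[group == 0].mean()
    rate_g1 = pred[group == 1].mean()
    print(f"demographic parity difference: {abs(rate_g0 - rate_g1):.3f}")

Step 2 (adversarial debiasing) follows the same spirit: a second model tries to predict the protected attribute from the main model's outputs or representations, and the main model is penalized whenever it succeeds, pushing bias signal out of what the model learns.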