Amplification of biases
Current frontier AI models amplify biases present in their training data and can be manipulated into producing harmful outputs, such as abusive language or discriminatory responses91,92. This is not limited to text generation: it can be seen across all modalities of generative AI93. Training on large swathes of UK and US English internet content can mean that misogynistic, ageist, and white-supremacist content is overrepresented in the training data94.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit911
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. Prioritize the curation of diverse and representative training data, employing techniques such as multi-source collection, reweighting/resampling of underrepresented groups, and the removal or anonymization of sensitive information to mitigate the amplification of societal biases.
2. Establish a robust AI governance framework that mandates regular, independent bias audits and impact assessments across the entire AI lifecycle (pre-deployment through post-deployment monitoring) to ensure continuous fairness, accountability, and transparency.
3. Integrate fairness-aware machine learning algorithms (e.g., adversarial debiasing or fair representation learning) and explicit fairness constraints into the model training process to technically minimize the propagation of discriminatory patterns learned from the training data.
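As a minimal sketch of the reweighting technique mentioned in mitigation 1, the example below assigns each training sample a weight inversely proportional to the frequency of its demographic group, so underrepresented groups contribute equally to the training loss. The function name and group labels are illustrative, not part of any specific library or the source above.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Assign each sample a weight inversely proportional to the
    frequency of its group label, so that every group contributes
    the same total weight to the loss (mean weight is 1.0).

    `groups` is a list of group labels, one per training sample.
    Returns a list of per-sample weights in the same order.
    """
    counts = Counter(groups)
    n = len(groups)          # total number of samples
    k = len(counts)          # number of distinct groups
    # Each group receives total weight n/k, split evenly among
    # its members, so small groups get proportionally larger weights.
    return [n / (k * counts[g]) for g in groups]

# Illustrative data: group "B" is underrepresented 3:1.
weights = inverse_frequency_weights(["A", "A", "A", "B"])
```

Such weights can typically be passed to a training routine (for example, via a per-sample weight argument in common ML frameworks) so that the minority group's examples are not drowned out by the majority's.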