Amplification of biases
Current frontier AI models amplify biases present in their training data and can be manipulated into producing harmful outputs, such as abusive language or discriminatory responses91,92. This is not limited to text generation: it can be seen across all modalities of generative AI93. Training on large swathes of UK and US English internet content can mean that misogynistic, ageist, and white-supremacist content is overrepresented in the training data94.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit911
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. Prioritize the curation of diverse and representative training data, employing techniques such as multi-source collection, reweighting/resampling of underrepresented groups, and the removal or anonymization of sensitive information to mitigate the amplification of societal biases.
2. Establish a robust AI governance framework that mandates regular, independent bias audits and impact assessments across the entire AI lifecycle (pre-deployment through post-deployment monitoring) to ensure continuous fairness, accountability, and transparency.
3. Integrate fairness-aware machine learning algorithms (e.g., adversarial debiasing or fair representation learning) and explicit fairness constraints into the model training process to technically minimize the propagation of discriminatory patterns learned from the training data.
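As a minimal sketch of the reweighting technique mentioned in mitigation 1, the example below assigns each training sample a weight inversely proportional to the frequency of its demographic group, so underrepresented groups contribute equally to the training loss. The function name and group labels are illustrative, not part of any specific library or the source above.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Assign each sample a weight inversely proportional to the
    frequency of its group label, so that every group contributes
    the same total weight to the loss (mean weight is 1.0).

    `groups` is a list of group labels, one per training sample.
    Returns a list of per-sample weights in the same order.
    """
    counts = Counter(groups)
    n = len(groups)          # total number of samples
    k = len(counts)          # number of distinct groups
    # Each group receives total weight n/k, split evenly among
    # its members, so small groups get proportionally larger weights.
    return [n / (k * counts[g]) for g in groups]

# Illustrative data: group "B" is underrepresented 3:1.
weights = inverse_frequency_weights(["A", "A", "A", "B"])
```

Such weights can typically be passed to a training routine (for example, via a per-sample weight argument in common ML frameworks) so that the minority group's examples are not drowned out by the majority's.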