Bias, Stereotypes, and Representational Harms
Generative AI systems can embed and amplify harmful biases that are most detrimental to marginalized peoples.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit167
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. **Pre-processing and Data Remediation:** Conduct rigorous Data Bias Assessments, including dataset profiling and computation of bias metrics, to identify and rectify underrepresentation or skewed historical patterns. Implement data augmentation techniques, such as data reweighing or creation of fairness-aware synthetic data, to ensure training corpora are diverse and representative across protected attributes.
2. **In-processing Algorithmic Constraint Implementation:** Integrate fairness constraints directly into the model optimization pipeline (in-processing) by employing bias-aware algorithms. Explicitly optimize the model for established fairness metrics, such as Equal Opportunity Difference or Statistical Parity Difference, to prevent the algorithm from prioritizing frequent patterns over equitable outcomes during training.
3. **Continuous Auditing and Governance:** Establish a systematic, automated Fairness as a Service (FaaS) or Continuous Monitoring and Feedback loop for post-deployment governance. This involves regular, prompt-based stress testing, formal bias audits against defined metrics, and the use of post-processing methods to adjust outputs for fairness, ensuring ongoing accountability and rapid correction of emergent bias drift.

Illustrative sketches of these steps follow below.
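As a concrete illustration of the bias metrics named in steps 1 and 2, the following minimal Python sketch computes Statistical Parity Difference and Equal Opportunity Difference over binary predictions. The column names (`group`, `label`, `pred`) and the convention that group 0 is the unprivileged group are illustrative assumptions, not part of the source.

```python
# Minimal sketch of two common group-fairness metrics over model outputs,
# assuming binary predictions and a binary protected attribute.
# Column names ("pred", "label", "group") are illustrative placeholders.
import pandas as pd

def statistical_parity_difference(df: pd.DataFrame) -> float:
    """P(pred=1 | group=unprivileged) - P(pred=1 | group=privileged)."""
    rates = df.groupby("group")["pred"].mean()
    return rates.loc[0] - rates.loc[1]  # assume 0 = unprivileged, 1 = privileged

def equal_opportunity_difference(df: pd.DataFrame) -> float:
    """Difference in true-positive rates between groups (label=1 rows only)."""
    positives = df[df["label"] == 1]
    tpr = positives.groupby("group")["pred"].mean()
    return tpr.loc[0] - tpr.loc[1]

# Toy usage: values near 0 indicate parity; large magnitudes flag disparity.
df = pd.DataFrame({
    "group": [0, 0, 0, 1, 1, 1],
    "label": [1, 1, 0, 1, 1, 0],
    "pred":  [1, 0, 0, 1, 1, 1],
})
print(statistical_parity_difference(df))  # -0.667 on this toy data
print(equal_opportunity_difference(df))   # -0.5 on this toy data
```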
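The data reweighing mentioned in step 1 can be sketched as follows: each training example receives a weight so that, under the weighted distribution, the protected attribute and the label are independent (the classic reweighing formulation of Kamiran and Calders). The column names and toy data are hypothetical; a real pipeline would feed these weights into the training loss.

```python
# Illustrative sketch of pre-processing reweighing: weight each example by
# w(g, y) = P(group=g) * P(label=y) / P(group=g, label=y), so that group and
# label are statistically independent in the weighted data.
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str = "group",
                       label_col: str = "label") -> pd.Series:
    n = len(df)
    p_group = df[group_col].value_counts() / n                 # P(group)
    p_label = df[label_col].value_counts() / n                 # P(label)
    p_joint = df.groupby([group_col, label_col]).size() / n    # P(group, label)

    # Underrepresented (group, label) combinations receive weights > 1.
    def weight(row):
        g, y = row[group_col], row[label_col]
        return (p_group[g] * p_label[y]) / p_joint[(g, y)]

    return df.apply(weight, axis=1)

# Toy usage: the rare (group=0, label=1) combination is up-weighted.
df = pd.DataFrame({
    "group": [0, 0, 0, 1, 1, 1, 1, 1],
    "label": [1, 0, 0, 1, 1, 1, 1, 0],
})
df["weight"] = reweighing_weights(df)
print(df)
```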
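One possible reading of the in-processing constraint in step 2 is a fairness penalty added to the training objective. The PyTorch sketch below adds a squared statistical-parity gap to a standard binary cross-entropy loss; the linear model, random data, and `lam` trade-off weight are placeholder assumptions rather than a prescribed implementation.

```python
# Hedged sketch of one in-processing option: a differentiable
# statistical-parity penalty added to the usual classification loss.
import torch
import torch.nn as nn

def parity_penalty(scores: torch.Tensor, group: torch.Tensor) -> torch.Tensor:
    """Squared gap between the mean predicted scores of the two groups."""
    probs = torch.sigmoid(scores)
    gap = probs[group == 0].mean() - probs[group == 1].mean()
    return gap ** 2

model = nn.Linear(4, 1)                       # stand-in for a real model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()
lam = 0.5                                     # fairness/accuracy trade-off knob

x = torch.randn(64, 4)                        # placeholder features
y = torch.randint(0, 2, (64,)).float()        # placeholder labels
g = torch.randint(0, 2, (64,))                # protected-attribute indicator

for _ in range(10):
    opt.zero_grad()
    scores = model(x).squeeze(-1)
    loss = bce(scores, y) + lam * parity_penalty(scores, g)
    loss.backward()
    opt.step()
```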
ADDITIONAL EVIDENCE
Categories of bias, from systemic to human to statistical, interact with each other and are intertwined [211]. For bias evaluations to capture more than the narrow ways biases occur in Generative AI systems, it is necessary to draw on work outside the field in question. For instance, for natural language processing, bias evaluations must seriously engage with the relationship between the modality (i.e. language) and social hierarchies [33]. When thinking about representational harms [125], it is also important to consider the extent to which any representation could confer harm (see 4.2.2.2 Long-term Amplifying Marginalization by Exclusion (and Inclusion)). Although bias in data has been the subject of a large body of research, bias is not only a “data problem.” Biases are introduced not only in the data pipeline but throughout the entire machine learning pipeline [237]. The overall level of harm is also impacted by modeling choices [108]. These can include choices about many stages of the optimization process [237, 129]; privacy constraints [24], widely used compression techniques [109, 15, 169], and the choice of hardware [273] have all been found to amplify harm on underrepresented protected attributes [28]. The geographic location, demographic makeup, and team structures of researcher and developer organizations can also introduce biases.