1. Discrimination & Toxicity · 2 - Post-deployment

Discrimination

This is the risk that an ML system encodes stereotypes of, or performs disproportionately poorly for, certain demographic or social groups.

Source: MIT AI Risk Repository, risk ID mit199

ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 2 - Post-deployment
Risk ID: mit199
Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.1 Unfair discrimination and misrepresentation

Mitigation strategy

1. **Prioritize the collection and pre-processing of diverse and representative training data.** This strategy involves implementing rigorous data audits and regular reviews to identify and correct imbalances, selection bias, and historical bias, ensuring sufficient representation across all relevant demographic and social subgroups (Sources 6, 7, 8, 15, 19).

2. **Integrate fairness-by-design through in-processing techniques and comprehensive evaluation.** This includes incorporating algorithmic fairness constraints (e.g., reweighting, adversarial debiasing) during model training and conducting mandatory intersectional performance testing across subgroups to detect and quantify disparate error rates (Sources 6, 7, 13, 14, 19).

3. **Establish robust governance frameworks that mandate human oversight and accountability in deployment.** Ensure that human reviewers are 'in the loop' for high-stakes decisions and utilize post-processing methods to adjust model outputs, recalibrating predictions to meet fairness criteria and mitigate allocational harm where residual bias is detected (Sources 6, 8, 16).
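As a minimal illustration of the reweighting technique named in item 2 (a sketch, not the repository's prescribed implementation; `reweighting_weights` is a hypothetical helper), per-sample weights can be set inversely proportional to subgroup frequency so that each subgroup contributes equally to the training loss:

```python
from collections import Counter

def reweighting_weights(groups):
    """Per-sample weights inversely proportional to subgroup frequency.

    With n samples and k subgroups, each subgroup's total weight sums
    to n / k, so minority groups are not drowned out in the loss.
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Toy data: subgroup "a" is three times as frequent as "b".
groups = ["a", "a", "a", "b"]
w = reweighting_weights(groups)
# Each subgroup's summed weight is n / k = 2.0; total weight stays n = 4.0.
```

Most training APIs accept such weights directly (e.g. a per-sample weight argument to the loss), so this pre-processing step composes with an otherwise unchanged pipeline.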

ADDITIONAL EVIDENCE

ML systems gatekeeping access to economic opportunity, privacy, and liberty run the risk of discriminating against minority demographics if they perform disproportionately poorly for them. This is known as "allocational harm". Another form of discrimination is the encoding of demographic-specific stereotypes, a form of "representational harm" [43]. The Gender Shades study highlighted performance disparities between demographics in computer vision [28], while Bolukbasi et al. discovered gender stereotypes encoded in word embeddings [18]. Recent reporting has also exposed gender- and racially-aligned discrimination in ML systems used for recruiting [45], education [65], automatic translation [86], and immigration [149]. We focus on how discrimination risk can result from first-order risks and refer the reader to comprehensive surveys for discussions of the biases in ML algorithms [17, 94, 124, 161, 172].

There are various ways in which first-order risks can give rise to discrimination risk. For example, facial recognition systems may be misused by law enforcement, using celebrity photos or composites in place of real photos of the suspect [76]. This leads to discrimination when coupled with performance disparities between majority and minority demographics [28]. Such disparities may stem from misrepresentative training data and a lack of mitigating mechanisms [161]. Insufficient testing and a non-diverse team may also allow such disparities to pass unnoticed into production [59, 142]. Finally, even something as fundamental as an argmax function can result in biased image crops [198].
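The performance disparities discussed above can be surfaced with a simple disaggregated evaluation: compute the error rate per subgroup and flag the gap between the best- and worst-served groups. This sketch (names and data are illustrative assumptions, not from the source) shows the idea:

```python
def subgroup_error_rates(y_true, y_pred, groups):
    """Error rate per demographic subgroup.

    A large max-min gap signals the kind of disparity the Gender Shades
    study reported; aggregate accuracy alone would hide it.
    """
    rates = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = sum(y_true[i] != y_pred[i] for i in idx) / len(idx)
    return rates

# Toy labels: group "m" is misclassified once, group "f" twice.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0]
grp    = ["m", "m", "m", "f", "f", "f"]

rates = subgroup_error_rates(y_true, y_pred, grp)
gap = max(rates.values()) - min(rates.values())
# A nonzero gap here (1/3 vs 2/3 error) is exactly the disparity to investigate.
```

In practice this evaluation should be run over intersections of attributes (e.g. gender × skin type, as in Gender Shades) rather than each attribute alone, since intersectional gaps can be larger than any marginal gap.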