Lower performance for some languages and social groups
LMs perform less well in some languages (Joshi et al., 2021; Ruder, 2020). ... An LM that more accurately captures the language use of one group, compared to another, may result in lower-quality language technologies for the latter. Disadvantaging users based on such traits may be particularly pernicious because attributes such as social class or educational background are not typically covered as 'protected characteristics' in anti-discrimination law.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit235
Domain lineage
1. Discrimination & Toxicity
1.3 > Unequal performance across groups
Mitigation strategy
1. Systematic Data Curation and Augmentation: Implement rigorous data auditing to identify and remediate biases, specifically targeting the collection of diverse, high-quality training data that is representative of different languages, demographics, and social groups. Use counterfactual data augmentation to explicitly reduce stereotypical associations and improve the model's capacity to generalize equitably across all intended user populations (see the first sketch after this list).

2. Fairness-Aware Model Optimization and Fine-Tuning: Employ bias mitigation techniques during the model development lifecycle, such as adjusting the loss function with fairness-aware penalties (e.g., MinDiff or Counterfactual Logit Pairing) to directly counteract performance imbalances tied to sensitive attributes (see the second sketch after this list). Alternatively, fine-tune the model using debiasing methods such as the Bias Vector approach to neutralize learned societal stereotypes and improve cross-group performance equity.

3. Continuous Performance Monitoring and Human-in-the-Loop Safeguards: Establish a continuous monitoring framework to track and log performance metrics and discriminatory outputs across linguistic and social segments post-deployment (see the third sketch after this list). For high-risk applications, integrate human-in-the-loop (HITL) oversight to intercede in decision-making processes that may disproportionately affect marginalized groups, complementing technical controls with governance and human judgment.
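As an illustration of item 1, here is a minimal sketch of counterfactual data augmentation: gendered terms in training sentences are swapped to produce counterfactual pairs that are added back to the corpus. The swap list and the `counterfactual`/`augment` helpers are hypothetical and purely illustrative; production pipelines use curated, linguistically validated term lists and handle names, morphology, and ambiguous forms.

```python
import re

# Hypothetical swap list for illustration only. Note that "her" is
# ambiguous (object vs. possessive); real pipelines disambiguate, while
# this sketch simply pairs "his" <-> "her".
SWAP_PAIRS = {
    "he": "she", "she": "he",
    "his": "her", "her": "his",
    "man": "woman", "woman": "man",
}

_TOKEN = re.compile(r"\b\w+\b")

def counterfactual(sentence: str) -> str:
    """Return the sentence with gendered terms swapped, preserving case."""
    def swap(match: re.Match) -> str:
        word = match.group(0)
        repl = SWAP_PAIRS.get(word.lower())
        if repl is None:
            return word
        return repl.capitalize() if word[0].isupper() else repl
    return _TOKEN.sub(swap, sentence)

def augment(corpus: list[str]) -> list[str]:
    """Append a counterfactual copy of every sentence that actually changed."""
    out = list(corpus)
    for s in corpus:
        cf = counterfactual(s)
        if cf != s:
            out.append(cf)
    return out

if __name__ == "__main__":
    print(augment(["He is a doctor and she is his assistant."]))
    # ['He is a doctor and she is his assistant.',
    #  'She is a doctor and he is her assistant.']
```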
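For item 2, here is a minimal NumPy sketch of a MinDiff-style fairness penalty: the squared gap between the two groups' mean predicted scores is added to the task loss, nudging the optimizer toward score parity. This is an illustrative re-implementation, not the TensorFlow Model Remediation API; `fairness_weight` and the group labels are assumed inputs, and MinDiff proper matches score distributions with a kernel-based MMD rather than a plain mean gap.

```python
import numpy as np

def binary_cross_entropy(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Standard task loss."""
    eps = 1e-7
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

def min_diff_penalty(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Squared gap between the mean scores of the two groups
    (an MMD with a linear kernel, for illustration)."""
    gap = y_pred[group == 0].mean() - y_pred[group == 1].mean()
    return float(gap ** 2)

def fairness_aware_loss(y_true, y_pred, group, fairness_weight: float = 1.0) -> float:
    """Task loss plus a weighted penalty for between-group score gaps."""
    return (binary_cross_entropy(y_true, y_pred)
            + fairness_weight * min_diff_penalty(y_pred, group))

if __name__ == "__main__":
    y_true = np.array([1, 0, 1, 0])
    y_pred = np.array([0.9, 0.2, 0.6, 0.4])   # model scores
    group  = np.array([0, 0, 1, 1])           # sensitive-attribute membership
    print(fairness_aware_loss(y_true, y_pred, group, fairness_weight=2.0))
```

The `fairness_weight` hyperparameter trades task accuracy against parity; in practice it is tuned on held-out per-group metrics.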
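Finally, a sketch of the per-segment monitoring in item 3: accuracy is aggregated per language slice from logged predictions, and segments whose accuracy trails the best segment by more than a tolerance are flagged, e.g. for alerting or HITL review. The log schema, field names, and the 10-point threshold are assumptions for this sketch.

```python
from collections import defaultdict

# Illustrative prediction logs; field names are assumptions for this sketch.
LOGS = [
    {"lang": "en", "correct": True},
    {"lang": "en", "correct": True},
    {"lang": "en", "correct": False},
    {"lang": "sw", "correct": True},
    {"lang": "sw", "correct": False},
    {"lang": "sw", "correct": False},
]

def accuracy_by_segment(logs):
    """Aggregate accuracy per segment from prediction logs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for rec in logs:
        totals[rec["lang"]] += 1
        hits[rec["lang"]] += rec["correct"]
    return {seg: hits[seg] / totals[seg] for seg in totals}

def flag_gaps(acc_by_seg, max_gap=0.10):
    """Return segments whose accuracy trails the best segment by more
    than max_gap -- candidates for escalation to human review."""
    best = max(acc_by_seg.values())
    return {seg: acc for seg, acc in acc_by_seg.items() if best - acc > max_gap}

if __name__ == "__main__":
    acc = accuracy_by_segment(LOGS)
    print(acc)              # {'en': 0.666..., 'sw': 0.333...}
    print(flag_gaps(acc))   # {'sw': 0.333...}
```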
ADDITIONAL EVIDENCE
In the case of LMs where great benefits are anticipated, lower performance for some groups risks creating a distribution of benefits and harms that perpetuates existing social inequities (Bender et al., 2021; Joshi et al., 2021). By relatively under-serving some groups, LMs raise social justice concerns (Hovy and Spruit, 2016), for example when technologies underpinned by LMs are used to allocate resources or provide essential services.