Stereotyping social groups
Stereotyping in an algorithmic system refers to how the system’s outputs reflect “beliefs about the characteristics, attributes, and behaviors of members of certain groups... and about how and why certain attributes go together.”
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit134
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. Implement continuous post-deployment auditing and monitoring of the system's outputs, using a suite of fairness metrics (e.g., demographic parity, equalized odds) to detect and quantify stereotype-associated errors and bias drift across diverse and intersectional social groups (a minimal sketch follows this list). Incorporate feedback mechanisms so that real-world evidence and harms reported by affected users feed back into the audit.
2. Apply output-correction and inference-stage mitigations, such as post-processing adjustments (e.g., the Randomized Threshold Optimizer or Calibrated Equalized Odds) or simple prompting techniques, to re-weight or modify predictions and generations found to reinforce or violate stereotypes, reducing differential treatment and misrepresentation in the final system output.
3. Establish a mandatory human-in-the-loop governance structure in which human reviewers validate and can override high-stakes AI-assisted decisions, together with organizational transparency that documents the system's intended use, known bias limitations, and the results of all applied mitigations.
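As referenced in item 1, the sketch below illustrates what per-group fairness auditing and a simple post-processing adjustment could look like. All names (`audit_fairness`, `parity_gap`, `group_thresholds`) and the example scores, labels, and groups are hypothetical illustrations, not tooling described in this entry; the group-specific thresholding is a plain stand-in for methods such as Calibrated Equalized Odds rather than an implementation of them.

```python
"""Minimal sketch: post-deployment fairness auditing and a simple
post-processing threshold adjustment. Assumes binary decisions derived
from model scores in [0, 1], with known group membership per record."""
from collections import defaultdict
from typing import Dict, List, Tuple


def audit_fairness(
    scores: List[float],
    labels: List[int],
    groups: List[str],
    threshold: float = 0.5,
) -> Dict[str, Dict[str, float]]:
    """Per-group selection rate (demographic parity) and true/false
    positive rates (equalized odds)."""
    by_group: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for s, y, g in zip(scores, labels, groups):
        by_group[g].append((s, y))

    stats: Dict[str, Dict[str, float]] = {}
    for g, rows in by_group.items():
        preds = [1 if s >= threshold else 0 for s, _ in rows]
        ys = [y for _, y in rows]
        pos = sum(ys) or 1                      # guard against empty classes
        neg = (len(ys) - sum(ys)) or 1
        stats[g] = {
            "selection_rate": sum(preds) / len(preds),
            "tpr": sum(p for p, y in zip(preds, ys) if y) / pos,
            "fpr": sum(p for p, y in zip(preds, ys) if not y) / neg,
        }
    return stats


def parity_gap(stats: Dict[str, Dict[str, float]], metric: str) -> float:
    """Largest pairwise difference in a metric across groups; a drift
    monitor could alert when this exceeds a tolerance."""
    vals = [s[metric] for s in stats.values()]
    return max(vals) - min(vals)


def group_thresholds(
    scores: List[float],
    groups: List[str],
    per_group_threshold: Dict[str, float],
    default: float = 0.5,
) -> List[int]:
    """Post-processing sketch: re-decide outputs with group-specific
    thresholds chosen offline to shrink the audited gaps."""
    return [
        1 if s >= per_group_threshold.get(g, default) else 0
        for s, g in zip(scores, groups)
    ]


if __name__ == "__main__":
    scores = [0.9, 0.4, 0.7, 0.3, 0.45, 0.55]   # hypothetical batch
    labels = [1, 0, 1, 0, 1, 1]
    groups = ["a", "a", "a", "b", "b", "b"]

    stats = audit_fairness(scores, labels, groups)
    print(stats)
    print("selection-rate gap:", parity_gap(stats, "selection_rate"))

    # If the audit shows group "b" is under-selected at the shared
    # threshold, a lower threshold for "b" narrows the gap.
    print(group_thresholds(scores, groups, {"a": 0.5, "b": 0.45}))
```

In practice, the per-group thresholds would be tuned offline against a validation set, and the gap metrics tracked over time so that drift past a set tolerance triggers the human review described in item 3.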
ADDITIONAL EVIDENCE
Exclusionary norms [in language models] can manifest in 'subtle patterns' like referring to 'women doctors' as if 'doctor' itself entails not-woman.