1. Discrimination & Toxicity

Representational Harms

Beliefs about different social groups that reproduce unjust societal hierarchies.

Source: MIT AI Risk Repository (mit133)

ENTITY: 3 - Other
INTENT: 2 - Unintentional
TIMING: 2 - Post-deployment
Risk ID: mit133

Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.1 Unfair discrimination and misrepresentation

Mitigation strategy

1. Establish a rigorous **measurement and mitigation practice** that goes beyond purely behavioral observation to quantify the cognitive, psychological, and social effects of algorithmic outputs, ensuring that mitigation addresses *how* social groups are represented (e.g., stereotyped or demeaned) rather than solely *who* is represented. Prioritize pre-processing techniques such as collecting diverse, inclusive datasets and transforming features to reduce their correlation with sensitive attributes (see the first sketch below).

2. Implement **in-processing bias-aware algorithms** and fairness constraints during model training to actively counteract the perpetuation of unjust societal hierarchies. This includes techniques like fair representation learning, which develops data representations that are invariant to sensitive attributes and thereby prevents the model from systematically encoding discriminatory patterns (see the second sketch below).

3. Mandate **continuous algorithmic audits and impact assessments** in the post-deployment phase to track the emergence of representational harms and stereotypes, especially those arising from prompts that do not explicitly mention gender or group membership. Use explainable AI (XAI) and post-processing techniques to identify, diagnose, and swiftly rectify biased outputs, minimizing cumulative reputational and societal harm (see the third sketch below).
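For item 1, a minimal sketch of one feature-transformation idea, assuming a numeric encoding of the sensitive attribute: remove, via least-squares residualization, the linear component of each feature that is explained by that attribute. The function name and shapes are illustrative, not part of the repository entry.

```python
import numpy as np

def decorrelate_features(X: np.ndarray, sensitive: np.ndarray) -> np.ndarray:
    """Return centered features whose columns are linearly uncorrelated
    with a numeric-encoded sensitive attribute (least-squares residuals).

    X: (n_samples, n_features); sensitive: (n_samples,).
    """
    s = sensitive.astype(float)
    s_centered = s - s.mean()
    Xc = X - X.mean(axis=0)
    # Regression coefficient of each feature column on the sensitive attribute.
    beta = (s_centered @ Xc) / (s_centered @ s_centered)
    # Subtracting the explained component leaves residuals with ~zero
    # sample correlation with the sensitive attribute.
    return Xc - np.outer(s_centered, beta)
```

Note that this removes only linear dependence; categorical attributes would need encoding first, and nonlinear dependence on the sensitive attribute would survive this transformation.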
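For item 2, a hedged sketch of one in-processing approach: logistic regression whose loss is augmented with a squared-covariance penalty between the sensitive attribute and the decision scores, in the spirit of Zafar et al.'s (2017) decision-boundary covariance constraint. The hyperparameters and tensor shapes are assumptions for illustration.

```python
import torch

def train_fair_logreg(X, y, s, lam=1.0, epochs=200, lr=0.05):
    """X: (n, d) float tensor; y: (n,) 0/1 labels; s: (n,) sensitive attribute.
    lam trades predictive accuracy against the fairness penalty."""
    model = torch.nn.Linear(X.shape[1], 1)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    bce = torch.nn.BCEWithLogitsLoss()
    s_centered = s.float() - s.float().mean()
    for _ in range(epochs):
        opt.zero_grad()
        logits = model(X).squeeze(-1)
        # Empirical covariance between the sensitive attribute and the
        # model's decision scores; driving it toward zero discourages
        # group-dependent decision boundaries.
        cov = (s_centered * logits).mean()
        loss = bce(logits, y.float()) + lam * cov.pow(2)
        loss.backward()
        opt.step()
    return model
```

Fair representation learning (e.g., adversarially removing sensitive information from learned embeddings) is a heavier-weight alternative aimed at the same invariance goal.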
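For item 3, a minimal sketch of one recurring audit check, assuming a logging pipeline that records (group, flagged) pairs for sampled outputs, where `flagged` marks outputs a classifier or human reviewer judged representationally harmful; the schema and threshold are hypothetical.

```python
from collections import defaultdict

def audit_disparity(records, threshold=0.05):
    """records: iterable of (group, flagged_bool) pairs from logged outputs.
    Returns {group: flag_rate} for groups whose flag rate exceeds the
    overall flag rate by more than `threshold`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_flagged, n_total]
    for group, flagged in records:
        counts[group][0] += int(flagged)
        counts[group][1] += 1
    overall = sum(f for f, _ in counts.values()) / sum(t for _, t in counts.values())
    return {g: f / t for g, (f, t) in counts.items() if f / t - overall > threshold}
```

A production audit would add statistical significance testing and trend tracking over time rather than alerting on a single-sample threshold.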