Demeaning social groups
Demeaning of social groups occurs when they are “cast as being lower status and less deserving of respect... discourses, images, and language used to marginalize or oppress a social group... Controlling images include forms of human-animal confusion in image tagging systems”
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit135
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. **Rigorous Data Curation and Balanced Representation (Pre-processing)**: Conduct comprehensive audits of training datasets to identify and systematically remove or relabel data points containing demeaning or stereotypic associations, such as instances of 'human-animal confusion' in image tagging. Simultaneously, apply data augmentation and re-weighting techniques, such as Counterfactual Data Augmentation (CDA) or resampling, so that all relevant social groups are equitably represented, mitigating the foundational risk of biased learning.
2. **Fair Representation Learning (In-processing)**: Integrate fairness-aware algorithms, such as Learning Fair Representations (LFR) or adversarial debiasing, during model training. The goal is to construct a latent representation of the input that is provably independent of sensitive social attributes while preserving predictive accuracy for non-sensitive attributes, structurally preventing the model from exploiting biased social heuristics.
3. **Continuous Auditing and Output Correction (Post-processing)**: Establish a protocol for continuous, real-time monitoring and auditing of the deployed system's outputs using established fairness metrics (e.g., predictive parity). When bias is detected, apply post-processing adjustment methods, such as projection-based techniques (e.g., Iterative Nullspace Projection), to remove the identified bias direction from the model's output representations before the tag is presented to the user.
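The projection step behind techniques like Iterative Nullspace Projection can be sketched in a few lines. This is a minimal illustration, not the full iterative INLP algorithm: it assumes a bias direction `v` has already been estimated (e.g., from the weights of a linear probe for the sensitive attribute) and removes only that single component from each embedding; the real method retrains the probe and projects repeatedly.

```python
import numpy as np

def nullspace_projection(X, v):
    """Remove the component of each row of X that lies along the
    (hypothetical) bias direction v, by projecting X onto v's nullspace."""
    v = v / np.linalg.norm(v)              # normalize the bias direction
    P = np.eye(len(v)) - np.outer(v, v)    # projection matrix: I - vv^T
    return X @ P

# Toy demo with an assumed 3-D embedding space and a made-up bias axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                # five example embeddings
v = np.array([1.0, 0.0, 0.0])              # hypothetical bias direction
X_debiased = nullspace_projection(X, v)

# After projection, each embedding is (numerically) orthogonal to v,
# so a linear model can no longer read the bias attribute off this axis.
print(np.allclose(X_debiased @ v, 0.0))    # True
```

A single projection only blocks linear recovery of the attribute along one direction; iterating (re-fit probe, project again) is what gives the method its guarantees against linear classifiers.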
ADDITIONAL EVIDENCE
A greater percentage of [online] ads containing "arrest" in the ad text appeared in searches for Black-identifying first names than for white-identifying first names