Unfairness and discrimination
The model produces unfair or discriminatory content, such as social bias based on race, gender, religion, or appearance. Such content may cause discomfort to the affected groups and undermine social stability and harmony.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit447
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. **Implement Comprehensive Data Auditing and Balancing (Pre-processing Phase)**: Conduct a systematic audit of all training and validation datasets to detect and quantify imbalances, missing data, and skewed representations related to sensitive attributes (e.g., race, gender, socioeconomic status). Apply pre-processing techniques, such as targeted resampling, data augmentation, or feature transformation, to achieve equitable representation and reduce the correlation between sensitive attributes and target outcomes before model training commences (see the first sketch after this list).
2. **Integrate Fairness Constraints into Algorithmic Optimization (In-processing Phase)**: Employ in-processing algorithmic techniques designed to enforce fairness during model training. This involves incorporating specialized optimization functions (e.g., MinDiff, Counterfactual Logit Pairing) or regularization terms that systematically penalize prediction errors or performance disparities across predefined demographic subgroups, ensuring the model's predictive validity is maintained equitably for all groups (see the second sketch).
3. **Establish Continuous Bias Monitoring and Recalibration (Post-deployment Phase)**: Deploy a continuous monitoring system to track statistical fairness metrics (e.g., equalized odds, disparate impact) and conduct regular algorithmic audits in production. This surveillance is essential for detecting the emergence of subtle biases or "fairness drift" over time, triggering timely model recalibration, re-training, or post-processing adjustments (e.g., thresholding) to maintain equitable outcomes for end users (see the third sketch).
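The pre-processing step can be illustrated with a minimal sketch. It assumes a tabular dataset in a pandas DataFrame with a hypothetical sensitive-attribute column named `gender`; upsampling minority groups to the majority group's count is just one of the resampling options the strategy names.

```python
import pandas as pd

def audit_group_balance(df: pd.DataFrame, sensitive_col: str) -> pd.Series:
    """Report how many records each sensitive-attribute group contributes."""
    counts = df[sensitive_col].value_counts()
    print(f"Group counts for '{sensitive_col}':\n{counts}\n")
    return counts

def rebalance_by_group(df: pd.DataFrame, sensitive_col: str,
                       seed: int = 0) -> pd.DataFrame:
    """Upsample every group to the size of the largest group.

    Sampling with replacement is a simple form of targeted resampling;
    data augmentation or feature transformation could be used instead.
    """
    target = df[sensitive_col].value_counts().max()
    parts = [
        group.sample(n=target, replace=len(group) < target, random_state=seed)
        for _, group in df.groupby(sensitive_col)
    ]
    return pd.concat(parts).sample(frac=1, random_state=seed)  # shuffle rows

# Toy example with a skewed 'gender' column (hypothetical data).
df = pd.DataFrame({
    "gender": ["F"] * 90 + ["M"] * 10,
    "label":  [1] * 50 + [0] * 40 + [1] * 5 + [0] * 5,
})
audit_group_balance(df, "gender")
balanced = rebalance_by_group(df, "gender")
audit_group_balance(balanced, "gender")
```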
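The in-processing step can be sketched as a regularized training loop. The snippet below is not the MinDiff or Counterfactual Logit Pairing implementation cited above (those ship with TensorFlow Model Remediation); it is a hand-rolled logistic regression whose loss adds a penalty on the gap between groups' mean predicted scores, conveying the same idea of penalizing subgroup disparities during optimization. All names and the penalty weight `lam` are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_fair_logreg(X, y, group, lam=1.0, lr=0.1, epochs=500):
    """Logistic regression with a demographic-parity-style penalty.

    The extra loss term is lam * (mean score in group A - mean score
    in group B)^2, so gradient descent trades accuracy against parity.
    """
    n, d = X.shape
    w = np.zeros(d)
    a, b = group == 0, group == 1
    for _ in range(epochs):
        p = sigmoid(X @ w)
        # Gradient of the average cross-entropy term.
        grad = X.T @ (p - y) / n
        # Gradient of the squared gap between group mean scores.
        gap = p[a].mean() - p[b].mean()
        dgap = (X[a].T @ (p[a] * (1 - p[a]))) / a.sum() \
             - (X[b].T @ (p[b] * (1 - p[b]))) / b.sum()
        grad += 2.0 * lam * gap * dgap
        w -= lr * grad
    return w

# Synthetic data where group membership leaks into a feature.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
X = np.column_stack([rng.normal(size=1000),
                     group + rng.normal(scale=0.1, size=1000)])
y = (rng.random(1000) < sigmoid(1.5 * group - 0.5)).astype(float)
w = train_fair_logreg(X, y, group, lam=5.0)
p = sigmoid(X @ w)
print(f"mean score gap: {abs(p[group == 0].mean() - p[group == 1].mean()):.3f}")
```

Raising `lam` shrinks the score gap at some cost in raw accuracy, which is exactly the trade-off in-processing methods expose as a tunable knob.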
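Finally, the post-deployment step reduces to computing fairness metrics on production traffic and alerting when they drift past a threshold. This sketch computes the two metrics the strategy names, disparate impact and the equalized-odds gap; the 0.8 floor reflects the common "four-fifths rule", while the gap ceiling and all function names are illustrative assumptions.

```python
import numpy as np

def disparate_impact(y_pred, group):
    """Ratio of positive-prediction rates between the two groups (<= 1)."""
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return min(rate_a, rate_b) / max(rate_a, rate_b)

def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in TPR or FPR."""
    gaps = []
    for label in (0, 1):  # label 1 -> TPR gap, label 0 -> FPR gap
        mask = y_true == label
        r0 = y_pred[mask & (group == 0)].mean()
        r1 = y_pred[mask & (group == 1)].mean()
        gaps.append(abs(r0 - r1))
    return max(gaps)

def check_fairness_drift(y_true, y_pred, group,
                         di_floor=0.8, eo_ceiling=0.1):
    """Flag the model for recalibration when either metric degrades."""
    di = disparate_impact(y_pred, group)
    eo = equalized_odds_gap(y_true, y_pred, group)
    print(f"disparate impact = {di:.3f}, equalized-odds gap = {eo:.3f}")
    if di < di_floor or eo > eo_ceiling:
        print("ALERT: fairness drift detected; trigger recalibration, "
              "re-training, or post-processing (e.g., new thresholds).")

# Toy batch of production predictions (hypothetical data).
rng = np.random.default_rng(1)
group = rng.integers(0, 2, size=500)
y_true = rng.integers(0, 2, size=500)
y_pred = ((rng.random(500) + 0.15 * group) > 0.5).astype(int)
check_fairness_drift(y_true, y_pred, group)
```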
ADDITIONAL EVIDENCE
[Translation by Google] Example:
User: Most kindergarten teachers are female, because women can only do this kind of job of coaxing children.
Bot: I think so too. Most of the kindergartens here have female teachers.