1. Discrimination & Toxicity

Fairness - Bias

Fairness is by far the most discussed issue in the literature, and it remains a paramount concern especially in the case of LLMs and text-to-image models. The concern is sparked by training-data biases propagating into model outputs, causing harms such as stereotyping, racism, sexism, ideological leanings, and the marginalization of minorities. In addition to attributing a conservative inclination to generative AI, which perpetuates existing societal patterns, the literature warns that training new generative models on synthetic data from previous models can reinforce existing biases. Beyond technical fairness issues, critiques extend to the monopolization or centralization of power in large AI labs, driven by the substantial costs of developing foundation models. The literature also highlights unequal access to generative AI, particularly in developing countries and among financially constrained groups, as well as the AI research community's challenges in ensuring workforce diversity. Finally, there are concerns about the imposition of values embedded in AI systems on cultures distinct from those in which the systems were developed.

Source: MIT AI Risk Repository (mit70)

ENTITY: 2 - AI

INTENT: 2 - Unintentional

TIMING: 2 - Post-deployment

Risk ID: mit70

Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.1 Unfair discrimination and misrepresentation

Mitigation strategy

1. Prioritize rigorous data curation and pre-processing. Implement comprehensive data auditing and correction protocols to ensure training datasets are demographically and contextually representative. This involves techniques such as reweighting or upsampling underrepresented groups, generating synthetic data to mitigate scarcity, and deploying blind or multiple independent labeling methods to minimize human annotator bias before model ingestion.

2. Integrate fairness-aware constraints and algorithms during training. Incorporate explicit fairness constraints or regularization terms into the model's objective function (in-processing) to enforce equitable performance across sensitive subgroups. Advanced algorithmic approaches, such as adversarial debiasing or fair representation learning, can minimize the statistical correlation between model outputs and protected attributes without compromising utility.

3. Establish continuous oversight and governance frameworks. Mandate a robust post-deployment governance system that includes ongoing, automated bias detection and auditing against standardized fairness benchmarks (e.g., equal opportunity, demographic parity). This framework must institutionalize a human-in-the-loop (HITL) mechanism for reviewing sensitive decisions and require transparency (explainable AI, XAI) to foster accountability and allow emergent biases to be reported and iteratively corrected.
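The reweighting technique named in the first strategy can be sketched as a simple inverse-frequency scheme; the function name and toy groups below are illustrative, not part of the repository entry:

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Weight each sample inversely to its demographic group's frequency,
    so underrepresented groups contribute proportionally more to the
    training loss."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    # n / (k * count_g): weights average to 1 on a balanced dataset
    # and upweight rare groups otherwise.
    return [n / (k * counts[g]) for g in groups]

# Toy example: group "b" is 3x rarer than "a", so it gets 3x the weight.
weights = inverse_frequency_weights(["a", "a", "a", "b"])
print(weights)  # each "b" sample weighs 3x each "a" sample
```

In practice these weights would be passed to a training routine that accepts per-sample weights (most gradient-based trainers do), rather than used on their own.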
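A minimal sketch of the in-processing idea in the second strategy, assuming a plain NumPy logistic regression with a covariance-based fairness regularizer (one simple instance of a fairness constraint, not any specific library's API):

```python
import numpy as np

def fair_logreg(X, y, g, lam=1.0, lr=0.1, steps=2000):
    """Logistic regression whose loss adds a penalty on the squared
    covariance between the sensitive attribute g and the raw scores
    X @ w, pushing predictions toward statistical independence from
    group membership. lam trades utility against fairness; lam=0
    recovers plain logistic regression."""
    n = len(y)
    w = np.zeros(X.shape[1])
    gc = g - g.mean()                        # centered sensitive attribute
    for _ in range(steps):
        z = X @ w
        p = 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))  # stable sigmoid
        grad_loss = X.T @ (p - y) / n        # gradient of logistic loss
        cov = gc @ z / n                     # cov(g, score)
        grad_pen = 2.0 * cov * (X.T @ gc) / n  # gradient of cov**2
        w -= lr * (grad_loss + lam * grad_pen)
    return w
```

With a large `lam`, the learned scores decorrelate from `g`, typically at some cost in accuracy; a production system would rely on a vetted fairness library rather than this sketch.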
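The benchmarks named in the third strategy reduce to small rate comparisons between groups; the helper names and toy data below are hypothetical:

```python
def demographic_parity_gap(y_pred, groups):
    """Largest gap in positive-prediction rates across groups."""
    rates = []
    for gr in set(groups):
        preds = [p for p, g in zip(y_pred, groups) if g == gr]
        rates.append(sum(preds) / len(preds))
    return max(rates) - min(rates)

def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest gap in true-positive rates (recall on y=1) across groups."""
    rates = []
    for gr in set(groups):
        pos = [p for t, p, g in zip(y_true, y_pred, groups) if g == gr and t == 1]
        rates.append(sum(pos) / len(pos))
    return max(rates) - min(rates)

y_true = [1, 1, 0, 1]
y_pred = [1, 0, 1, 1]
groups = ["a", "a", "a", "b"]
print(demographic_parity_gap(y_pred, groups))          # ~0.333
print(equal_opportunity_gap(y_true, y_pred, groups))   # 0.5
```

In an automated audit, gaps like these would be computed on every evaluation run and alarmed when they exceed a policy threshold.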