Disparate Performance
An LLM's performance can differ significantly across groups of users. For example, question-answering capability has shown significant performance differences across racial and socioeconomic groups, and fact-checking ability can vary across tasks and languages.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit491
Domain lineage
1. Discrimination & Toxicity
1.3 > Unequal performance across groups
Mitigation strategy
1. **Rigorous Data Curation and Augmentation Protocols**: Implement systematic data auditing to identify and correct imbalances, underrepresentation, or inherent biases within the training corpus. Proactive strategies, such as dataset augmentation, relabelling (e.g., massaging), or feature perturbation (e.g., Disparate Impact Remover), should be employed to ensure the training data is representative across all relevant user demographics and cultural contexts, thereby mitigating the core source of disparate performance.
2. **Integration of Fairness-Aware Optimization Functions**: Apply in-processing bias mitigation techniques by modifying the model's loss function during the training phase. This involves incorporating fairness-aware algorithms such as MinDiff or Counterfactual Logit Pairing (CLP) to introduce penalties for prediction discrepancies across sensitive attributes or predefined data slices, explicitly optimizing the model toward a more equitable performance profile (e.g., achieving equalized odds or demographic parity).
3. **Continuous Performance Monitoring and Algorithmic Audits**: Establish a robust governance framework that mandates continuous, multi-metric performance monitoring and auditing. This includes regularly evaluating the LLM's outputs against specialized benchmarks using diverse fairness metrics (e.g., those found in tools like AI Fairness 360) to detect post-deployment regressions or the emergence of new biases. Regular testing, including adversarial and prompt injection attempts, should focus on assessing the model's consistency and robustness across varying user groups.
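The monitoring and in-processing ideas above can be sketched minimally: comparing per-group accuracy flags a performance gap, and a MinDiff-style penalty term discourages prediction-score discrepancies between groups during training. This is an illustrative sketch with hypothetical data and function names, not the actual MinDiff or AI Fairness 360 APIs.

```python
def group_accuracy(y_true, y_pred, groups):
    """Per-group accuracy for a sensitive attribute (monitoring sketch)."""
    acc = {}
    for g in set(groups):
        idx = [i for i, gr in enumerate(groups) if gr == g]
        correct = sum(1 for i in idx if y_true[i] == y_pred[i])
        acc[g] = correct / len(idx)
    return acc

def gap_penalty(scores, groups, weight=1.0):
    """MinDiff-style penalty sketch: squared difference between the mean
    prediction scores of two groups, added to a training loss to push
    the model toward equal treatment across the sensitive attribute."""
    g0, g1 = sorted(set(groups))  # assumes a binary sensitive attribute
    mean = lambda xs: sum(xs) / len(xs)
    m0 = mean([s for s, g in zip(scores, groups) if g == g0])
    m1 = mean([s for s, g in zip(scores, groups) if g == g1])
    return weight * (m0 - m1) ** 2

# Hypothetical labels, predictions, and group memberships.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]
print(group_accuracy(y_true, y_pred, groups))  # unequal accuracy reveals a gap
```

In a real pipeline the penalty would be added to the task loss (e.g., `loss = task_loss + gap_penalty(...)`) and the per-group metrics tracked continuously post-deployment.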
ADDITIONAL EVIDENCE
Disparate performance has multiple causes, including the inherent difficulty of different tasks, the absence of particular dimensions of data, imbalance in the training data, and the difficulty of understanding the cultural backgrounds of different societies.