General Evaluations (Biased evaluations of encoded human values)
Human values encoded in AI models that are easier to evaluate may be favored for inclusion in evaluations over values that are harder to measure [13]. This can come at the expense of more desirable but harder-to-quantify values, producing an imbalance in which easily measured values dominate the evaluation process while other important values are underrepresented.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1114
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Mandate the use of structured human evaluation frameworks with diverse reviewer panels to explicitly assess model performance against complex, difficult-to-quantify values and ethical considerations, ensuring that nuanced biases missed by automated metrics are captured.
2. Adopt and enforce disaggregated fairness metrics, such as Equalized Odds or demographic parity, to move beyond simple aggregated accuracy. This ensures that error rates and predictive outcomes are statistically equitable across all protected and underrepresented demographic and value-based subgroups.
3. Integrate algorithm-centric debiasing techniques, such as Fair Representation Learning or fairness constraints/regularization, during the model training phase to proactively encode and optimize for harder-to-measure values, thereby preventing proxy variables from dominating the learning process.
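The disaggregated metrics named in strategy 2 can be sketched as follows. This is a minimal illustration, not a reference implementation: the group labels, toy data, and function name are assumptions, and it computes the demographic parity gap (difference in positive-prediction rates) and an equalized-odds gap (worst-case difference in true- or false-positive rates) across subgroups.

```python
# Sketch: disaggregated fairness metrics per subgroup.
# All data and names below are illustrative assumptions.
from collections import defaultdict

def fairness_gaps(y_true, y_pred, groups):
    """Return (demographic parity gap, equalized-odds gap).

    Demographic parity gap: max difference in positive-prediction
    rate between any two groups. Equalized-odds gap: max difference
    in TPR or FPR between any two groups.
    """
    stats = defaultdict(lambda: {"pos": 0, "n": 0, "tp": 0,
                                 "p": 0, "fp": 0, "neg": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["pos"] += p          # predicted-positive count
        if t == 1:
            s["p"] += 1        # actual positives
            s["tp"] += p       # true positives
        else:
            s["neg"] += 1      # actual negatives
            s["fp"] += p       # false positives
    rates = [s["pos"] / s["n"] for s in stats.values()]
    tprs = [s["tp"] / s["p"] for s in stats.values() if s["p"]]
    fprs = [s["fp"] / s["neg"] for s in stats.values() if s["neg"]]
    dp_gap = max(rates) - min(rates)
    eo_gap = max(max(tprs) - min(tprs), max(fprs) - min(fprs))
    return dp_gap, eo_gap

# Toy example: two groups with equal size but different outcomes.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp, eo = fairness_gaps(y_true, y_pred, groups)
# dp = 0.5 (group a receives positives at 0.75 vs 0.25 for group b)
```

An aggregated accuracy score would mask this disparity: both groups are classified with the same overall accuracy, yet the subgroup gaps are large, which is exactly the failure mode disaggregated reporting is meant to expose.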