Difficult to develop metrics for evaluating benefits or harms caused by AI assistants
A further challenge for AI assistant systems is developing metrics that evaluate particular aspects of the benefits or harms the assistant causes – especially in a sufficiently expansive sense, which could involve much of society (see Chapter 19). Such metrics are useful both for assessing a system's risk of harm and for serving as a training signal.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit369
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Establish a multidisciplinary and diverse governance framework that includes social scientists and affected stakeholders, ensuring a holistic definition of the full spectrum of benefits and harms for which measurement is required. This preventative measure ensures cognitive diversity in selecting appropriate metrics and avoids systemic bias in proxy-based evaluations.
2. Mandate the use of a comprehensive slate of diverse quantitative metrics, incorporating indicators of Responsible AI trustworthiness such as fairness (e.g., demographic parity), transparency, and system robustness, rather than optimizing for a single, easily gamed proxy metric.
3. Integrate formal AI system impact assessments and continuous collection of qualitative accounts to contextualize quantitative metric results. This practice ensures that hard-to-measure societal and individual effects are captured, moving beyond purely technical or efficiency-based metrics.
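The mitigation strategy names demographic parity as one example fairness indicator. A minimal sketch of how such a metric might be computed over binary predictions and a group attribute follows; the function name and sample data are illustrative assumptions, not part of the source.

```python
def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rates across groups.

    y_pred: iterable of 0/1 predictions.
    groups: iterable of group labels, aligned with y_pred.
    A value near 0 suggests parity; larger values indicate disparity.
    """
    by_group = {}
    for pred, group in zip(y_pred, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [sum(preds) / len(preds) for preds in by_group.values()]
    return max(rates) - min(rates)

# Illustrative data: group "a" receives positives 2/3 of the time,
# group "b" only 1/3 of the time.
preds = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # approx. 0.333
```

A metric like this is deliberately simple to game if used alone, which is exactly why the strategy above calls for a slate of diverse metrics alongside qualitative assessments.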