
Increased labor

Increased burden (e.g., time spent) or effort required by members of certain social groups to make systems or products work as well for them as for others.

Source: MIT AI Risk Repository (mit145)

ENTITY: 3 - Other

INTENT: 2 - Unintentional

TIMING: 2 - Post-deployment

Risk ID: mit145

Domain lineage: 1. Discrimination & Toxicity > 1.3 Unequal performance across groups (156 mapped risks)

Mitigation strategy

1. Mandate Disaggregated Performance Testing and Targeted Retraining. Conduct rigorous disaggregated performance testing, segmenting performance metrics (e.g., Word Error Rate, WER) across all relevant demographic and socio-acoustic groups, including dialect, accent, and speaking style. Prioritize fine-tuning or retraining the Automatic Speech Recognition (ASR) model with an oversampled, high-quality dataset representative of the underperforming groups to reduce the identified unequal performance disparity (Sub-domain 1.3). This directly implements the Risk Reduction strategy by addressing the algorithmic root cause of the increased user burden (see the WER sketch after this list).

2. Establish Continuous Equity-Based Performance Monitoring. Define and track a set of Key Risk Indicators (KRIs) focused on equity, such as the maximum differential in WER between the best- and worst-performing user groups. Implement a continuous monitoring process to audit these disaggregated performance metrics in real-time or near-real-time production environments (2 - Post-deployment). This ensures the mitigation's sustained effectiveness and enables prompt detection of performance degradation or newly emerging biases, preventing a recurrence of the "Increased labor" harm.

3. Implement a Transparent User Feedback and Remediation Loop. Develop and integrate a simple, in-system mechanism for users to report performance failures or difficulties specifically related to their speech style, accent, or dialect. Transparently communicate the system's known performance limitations to users, particularly those most likely to experience unequal performance. Collected qualitative feedback should be automatically categorized and routed as high-priority data for quantitative analysis and subsequent model improvement cycles (see the triage sketch below).
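
Strategies 1 and 2 amount to computing per-group WER over a labeled evaluation set and tracking the spread as a scalar KRI. The sketch below is a minimal illustration, not part of the repository entry: the record fields (group, reference, hypothesis), the dialect labels, and the toy transcripts are assumptions, and a real audit would run over full per-cohort evaluation sets.

```python
from collections import defaultdict

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

def disaggregated_wer(results: list[dict]) -> dict[str, float]:
    """Pool WER per demographic / socio-acoustic group (strategy 1)."""
    edits, words = defaultdict(float), defaultdict(int)
    for r in results:
        n = max(len(r["reference"].split()), 1)
        edits[r["group"]] += word_error_rate(r["reference"], r["hypothesis"]) * n
        words[r["group"]] += n
    return {g: edits[g] / words[g] for g in edits}

def max_wer_differential(per_group: dict[str, float]) -> float:
    """Equity KRI (strategy 2): worst-group WER minus best-group WER."""
    return max(per_group.values()) - min(per_group.values())

# Toy audit run; real inputs would be held-out utterances per cohort.
results = [
    {"group": "dialect_A", "reference": "turn the lights on",
     "hypothesis": "turn the lights on"},
    {"group": "dialect_B", "reference": "turn the lights on",
     "hypothesis": "turn delights on"},
]
per_group = disaggregated_wer(results)
print(per_group)                        # {'dialect_A': 0.0, 'dialect_B': 0.5}
print(max_wer_differential(per_group))  # 0.5 -> alert if above threshold
```

Because the KRI is a single number, it can be alerted on after every model release: a rising differential signals re-emerging unequal performance before users resort to changing how they speak.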

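Strategy 3's categorize-and-route step could look like the following sketch. Everything here is a hypothetical illustration, not the repository's method: the keyword list, the FeedbackReport fields, and the priority queue are assumptions, and a production system would likely use a trained classifier and a real ticketing or data pipeline.

```python
import queue
from dataclasses import dataclass, field

# Assumed trigger phrases; a deployed system would use a classifier.
SPEECH_STYLE_KEYWORDS = ("accent", "dialect", "pronounc", "way i talk", "speaking")

@dataclass
class FeedbackReport:
    user_group: str                                # self-reported or opt-in cohort label
    text: str                                      # free-text failure description
    tags: list[str] = field(default_factory=list)

# Priority-0 items feed the next analysis / retraining cycle first.
remediation_queue: queue.PriorityQueue = queue.PriorityQueue()

def triage(report: FeedbackReport) -> None:
    """Tag speech-style failures and route them as high-priority data."""
    if any(k in report.text.lower() for k in SPEECH_STYLE_KEYWORDS):
        report.tags.append("unequal_performance")
        remediation_queue.put((0, report.text))    # high priority
    else:
        remediation_queue.put((1, report.text))    # normal priority

triage(FeedbackReport("dialect_B", "I have to change the way I talk for it to understand me"))
print(remediation_queue.get())  # (0, 'I have to change the way I talk ...')
```
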
ADDITIONAL EVIDENCE

"I modify the way I talk to get a clear and concise response. I feel at times, voice recognition isn't programmed to understand people when they're not speaking in a certain way."