Data and Content Moderation Labor
Two key ethical concerns in the use of crowdwork for generative AI systems are: crowdworkers are frequently subject to working conditions that are taxing and debilitative to both physical and mental health, and there is a widespread deficit in documenting the role crowdworkers play in AI development. This contributes to a lack of transparency and explainability in resulting model outputs. Manual review is necessary to limit the harmful outputs of AI systems, including generative AI systems. A common harmful practice is to intentionally employ crowdworkers with few labor protections, often taking advantage of highly vulnerable workers, such as refugees [119, p. 18], incarcerated people [54], or individuals experiencing immense economic hardship [98, 181]. This precarity allows a myriad of harmful practices, such as companies underpaying or even refusing to pay workers for completed work (see Gray and Suri [93, p. 90] and Berg et al. [29, p. 74]), with no avenues for worker recourse. Finally, critical aspects of crowdwork are often left poorly documented, or entirely undocumented [88].
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit173
Domain lineage
6. Socioeconomic and Environmental
6.2 > Increased inequality and decline in employment quality
Mitigation strategy
1. Establish and Enforce Ethical Labor and Compensation Frameworks: Implement auditable protocols to ensure fair compensation, defined as wages commensurate with effort and local economic context, and guarantee timely payment for all completed work. This must include establishing a formal, transparent, and binding due process mechanism for crowdworkers to appeal non-payment, rejected submissions, or unfair ratings. 2. Institute Mandatory Supply Chain Transparency and Labor Documentation: Mandate the comprehensive, auditable documentation of the entire crowdwork supply chain. This requires explicit disclosure to workers regarding the task's purpose, the ultimate model's goal, the data collected, and the specific contribution to the AI system's development, thereby addressing the deficit in transparency and explainability. 3. Adopt Robust Worker Well-being and Safety Standards: Develop and enforce stringent health and safety protocols specifically for crowdworkers, particularly those engaged in psychologically taxing content moderation. This involves providing mandatory, adequate mental health support resources and ensuring working conditions actively mitigate physical and psychological distress associated with exposure to harmful or traumatic content.
ADDITIONAL EVIDENCE
Human labor is a substantial component of machine learning model development, including generative AI systems. This labor is typically completed via a process called crowd computation, where distributed data laborers, also called crowdworkers, complete large volumes of individual tasks that contribute to model development. This can occur in all stages of model development: before a model is trained, crowdworkers can be employed to gather training data, curate and clean this data, or provide data labels. While a model is being developed, crowdworkers evaluate and provide feedback to model generations before the final deployed model is released, and after model deployment, crowdworkers are often employed in evaluating, moderating, or correcting a model’s output. Crowdwork is often contracted out by model developers to third-party companies.