Copyright infringement
The use of large amounts of copyrighted data for training general-purpose AI models poses a challenge to traditional intellectual property laws, and to systems of consent, compensation, and control over data. The use of copyrighted data at scale by organisations developing general-purpose AI is likely to alter incentives around creative expression.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit782
Domain lineage
6. Socioeconomic and Environmental
6.3 > Economic and cultural devaluation of human effort
Mitigation strategy
1. Establish rigorous intellectual property supply chain due diligence: Developers and deployers must audit the provenance of all training datasets, ensuring that content is licensed, proprietary, or in the public domain. For third-party systems, secure enterprise-grade licenses with explicit warranties and indemnification against infringement claims arising from the vendor's training data.
2. Mandate human oversight and critical review of all AI-generated outputs: Implement a mandatory clearance process for AI-generated content intended for public or commercial use. This process requires human editing and rewriting to reduce substantial similarity to any single source, and uses technical tools, such as plagiarism detectors and reverse image searches, to verify originality and mitigate downstream infringement risk.
3. Develop and enforce strict internal AI usage and data-handling policies: Implement comprehensive security protocols and internal governance policies that explicitly prohibit employees from inputting proprietary, confidential, or internal copyrighted materials into public AI models, thereby safeguarding trade secrets and minimizing the risk of inadvertent data exposure and IP leakage.
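The provenance audit described in the first mitigation could be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: the `DatasetRecord` fields, the `CLEARED_STATUSES` labels, and the `audit_provenance` helper are all hypothetical names introduced here, and the sample records are invented. A real audit would depend on the organisation's own rights metadata and legal review.

```python
from dataclasses import dataclass

# Assumption: rights statuses are recorded as simple labels. The three cleared
# labels mirror the categories named in the mitigation text.
CLEARED_STATUSES = {"licensed", "proprietary", "public_domain"}


@dataclass
class DatasetRecord:
    """Hypothetical inventory entry for one training dataset."""
    name: str
    rights_status: str       # e.g. "licensed", "public_domain", "unknown"
    has_documentation: bool  # True if a license or provenance record is on file


def audit_provenance(records):
    """Split records into (cleared, flagged) lists.

    A record is cleared only if its status is one of the cleared labels
    and supporting documentation is on file; everything else is flagged
    for human and legal review.
    """
    cleared, flagged = [], []
    for record in records:
        if record.rights_status in CLEARED_STATUSES and record.has_documentation:
            cleared.append(record)
        else:
            flagged.append(record)
    return cleared, flagged


# Invented sample inventory, for illustration only.
inventory = [
    DatasetRecord("news_archive", "licensed", True),
    DatasetRecord("web_scrape", "unknown", False),
    DatasetRecord("classic_books", "public_domain", True),
    DatasetRecord("partner_images", "licensed", False),
]

cleared, flagged = audit_provenance(inventory)
for record in flagged:
    print(f"Needs review: {record.name} ({record.rights_status})")
```

A sketch like this only sorts the inventory. Whether a flagged dataset can be used remains a legal judgment, and a cleared label is only as reliable as the documentation behind it.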