6. Socioeconomic and Environmental | 2 - Post-deployment

Copyright challenges (copyright-infringing output)

Although generative AI models usually produce new outputs, the content a tool generates (an image, for example, or even computer code) can turn out to be almost identical to material in its training data. Because these models tend to memorize fragments of their training data, they may reproduce those fragments, potentially leading to claims of copyright infringement.

Source: MIT AI Risk Repository (mit748)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit748

Domain lineage

6. Socioeconomic and Environmental

262 mapped risks

6.3 > Economic and cultural devaluation of human effort

Mitigation strategy

1. Establish a Comprehensive Internal AI Governance and Compliance Framework. Develop and mandate clear organizational policies detailing acceptable use, prohibiting the input of proprietary or copyrighted internal data into external models, and requiring a rigorous human review process for all AI-generated outputs used in external materials. This framework must include mandatory employee training on IP awareness and the deployment of output-vetting tools such as plagiarism detection software and reverse image search for final content clearance.

2. Implement Rigorous Training Data Auditing and Curation. For models developed or customized internally, legal teams must audit all training datasets to confirm authorization for use, prioritizing licensed or public domain materials. Technologically, **data de-duplication** techniques should be employed proactively to reduce the presence of redundant copyrighted works, which is a known driver of model memorization and subsequent verbatim reproduction.

3. Mandate Detailed AI Vendor Due Diligence and Contractual Risk Transfer. Prior to integrating third-party generative AI services, organizations must conduct thorough due diligence on the vendor's data provenance and Intellectual Property (IP) policies. This is essential to secure specific contractual provisions for **AI indemnification**, which transfers the financial and legal risk of potential copyright infringement claims arising from the model's output to the service provider.
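
The data de-duplication step in strategy 2 can be illustrated with a small sketch. The code below is not taken from the repository entry; it is one simple way to drop exact and near-duplicate documents before training, assuming plain-text documents that fit in memory. The function names (`normalize`, `shingles`, `jaccard`, `deduplicate`), the 5-word shingle size, and the 0.8 similarity threshold are all illustrative assumptions, and a production pipeline would more likely use a scalable method such as MinHash with locality-sensitive hashing.

```python
from __future__ import annotations

import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shingles(text: str, n: int = 5) -> set[str]:
    """Return the set of word n-grams (shingles) of the normalized text."""
    words = normalize(text).split()
    if len(words) < n:
        # Short documents are represented by a single shingle (or none).
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two shingle sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: list[str], threshold: float = 0.8, n: int = 5) -> list[str]:
    """Keep the first copy of each document; drop exact and near duplicates.

    Hypothetical helper: `threshold` and `n` are assumed values, not
    recommendations from the source. Pairwise comparison is O(k^2) in the
    number of kept documents, so this only suits small corpora.
    """
    kept: list[str] = []
    kept_shingles: list[set[str]] = []
    seen_hashes: set[str] = set()
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        doc_shingles = shingles(doc, n)
        if any(jaccard(doc_shingles, other) >= threshold for other in kept_shingles):
            continue  # near duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append(doc)
        kept_shingles.append(doc_shingles)
    return kept
```

For example, calling `deduplicate` on a list that contains the same passage twice with different spacing, plus a copy with one extra trailing word, would keep only the first version. Reducing such repeats lowers how often a model sees the same protected passage, which is the memorization driver the strategy refers to; it does not by itself establish that the remaining data is authorized for use.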