Harmful Content Generation at Scale: Non-Consensual Content
The misuse of generative AI has been widely recognized in the context of harms caused by non-consensual content generation. Historically, generative adversarial networks (GANs) have been used to generate realistic-looking avatars for fake accounts on social media services. More recently, diffusion models have enabled a new generation of more flexible and user-friendly generative AI capabilities that can produce high-resolution media from user-supplied textual prompts.

It has already been recognized that these models can be used to create harmful content, including depictions of nudity, hate, or violence. Moreover, they can be used to reinforce biases and subject individuals or groups to indignity. There is also the potential for these models to be used for exploitation and harassment, such as by removing articles of clothing from pre-existing images or memorizing an individual's likeness without their consent. Furthermore, image, audio, and video generation models could be used to spread disinformation by depicting political figures in unfavorable contexts. This growing list of AI misuses involving non-consensual content has already motivated debate around what interventions are warranted for preventing misuse of AI systems.

Advanced AI assistants pose novel risks that can amplify the harm caused by non-consensual content generation. Third-party integration, tool-use, and planning capabilities can be exploited to automate the identification and targeting of individuals for exploitation or harassment. Assistants with internet access and third-party tool-use integration with applications like email and social media can also be exploited to disseminate harmful content at scale or to microtarget individuals with blackmail.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit386
Domain lineage
4. Malicious Actors & Misuse
4.3 > Fraud, scams, and targeted manipulation
Mitigation strategy
1. Treat nonconsensual intimate imagery (NCII) and related content as a "Tier 0" severe harm scenario on par with child safety risks, mandating capability-based risk assessments and the immediate implementation of control effectiveness measures to prevent the creation of such content, rather than relying solely on reactive policy compliance.
2. Implement a robust, multi-layered safety architecture that includes ongoing adversarial testing (red teaming), the use of preemptive classifiers, and the blocking of abusive prompts to render the generative model incapable of producing nonconsensual content.
3. Utilize durable media provenance and authentication tools, such as watermarking and C2PA standards, to cryptographically label synthetic content with metadata detailing its source and history, thereby aiding in content traceability, detection, and public awareness.
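The second and third mitigations can be sketched together as a minimal generation pipeline: a preemptive screening layer that blocks abusive prompts before any generation occurs, followed by attaching provenance metadata to every output. This is an illustrative sketch only: the keyword screen stands in for a trained abuse classifier, and the manifest fields loosely mirror C2PA concepts (claim generator, content hash, assertions) rather than the actual C2PA manifest format. All function and model names here are hypothetical.

```python
import hashlib

# Stand-in for a trained abuse classifier; a real deployment would call a
# dedicated safety model, not a keyword list (illustrative only).
BLOCKED_TERMS = {"undress", "remove clothing", "nonconsensual"}


def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the pre-generation safety layer."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def label_output(content: bytes, model_id: str) -> dict:
    """Build simplified C2PA-style provenance metadata for generated media."""
    return {
        "claim_generator": model_id,
        "content_hash": hashlib.sha256(content).hexdigest(),
        "assertions": [
            {"label": "ai_generated", "data": {"synthetic": True}},
        ],
    }


def generate(prompt: str, model_id: str = "example-diffusion-v1"):
    """Screen the prompt, then return (content, provenance) or None if blocked."""
    if not screen_prompt(prompt):
        return None  # blocked before any generation occurs
    content = b"\x89PNG..."  # stand-in for real model output bytes
    return content, label_output(content, model_id)
```

In this layered design, the screening step fails closed (no content is produced for a blocked prompt), while the provenance step ensures that everything which is produced carries traceable metadata.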