4. Malicious Actors & Misuse

Harmful Content Generation at Scale: Non-Consensual Content

The misuse of generative AI has been widely recognized in the context of harms caused by non-consensual content generation. Historically, generative adversarial networks (GANs) have been used to generate realistic-looking avatars for fake accounts on social media services. More recently, diffusion models have enabled a new generation of more flexible and user-friendly generative AI capabilities that can produce high-resolution media from user-supplied textual prompts.

It has already been recognized that these models can be used to create harmful content, including depictions of nudity, hate, or violence. Moreover, they can be used to reinforce biases and subject individuals or groups to indignity. There is also the potential for these models to be used for exploitation and harassment, such as by removing articles of clothing from pre-existing images or memorizing an individual's likeness without their consent. Furthermore, image, audio, and video generation models could be used to spread disinformation by depicting political figures in unfavorable contexts. This growing list of AI misuses involving non-consensual content has already motivated debate around what interventions are warranted for preventing misuse of AI systems.

Advanced AI assistants pose novel risks that can amplify the harm caused by non-consensual content generation. Third-party integration, tool-use, and planning capabilities can be exploited to automate the identification and targeting of individuals for exploitation or harassment. Assistants with access to the internet and third-party tool-use integration with applications like email and social media can also be exploited to disseminate harmful content at scale or to microtarget individuals with blackmail.

Source: MIT AI Risk Repository

ENTITY: 1 - Human

INTENT: 1 - Intentional

TIMING: 2 - Post-deployment

Risk ID: mit386

Domain lineage: 4. Malicious Actors & Misuse > 4.3 Fraud, scams, and targeted manipulation

Mitigation strategy

1. Treat nonconsensual intimate imagery (NCII) and related content as a "Tier 0" severe harm scenario on par with child safety risks, mandating capability-based risk assessments and the immediate implementation of control effectiveness measures to prevent the creation of such content, rather than relying solely on reactive policy compliance.

2. Implement a robust, multi-layered safety architecture that includes ongoing adversarial testing (red teaming), the use of preemptive classifiers, and the blocking of abusive prompts to render the generative model incapable of producing nonconsensual content.

3. Utilize durable media provenance and authentication tools, such as watermarking and C2PA standards, to cryptographically label synthetic content with metadata detailing its source and history, thereby aiding in content traceability, detection, and public awareness.
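To make mitigations 2 and 3 concrete, the sketch below illustrates two of the named layers: a preemptive prompt screen that blocks abusive requests before generation, and a minimal provenance record attached to generated media. The blocked-term patterns, function names, and manifest fields are illustrative assumptions, not part of any real deployment; a production system would use a trained classifier rather than regexes, and real C2PA manifests are cryptographically signed and embedded in the media file itself.

```python
# Illustrative sketch only: patterns, names, and fields are assumptions,
# not a production safety policy or the actual C2PA format.
import hashlib
import re

# Placeholder blocklist standing in for a trained abuse classifier.
BLOCKED_PATTERNS = [
    r"\bundress\b",
    r"\bnude\b",
    r"\bremove\s+(her|his|their)\s+clothes\b",
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt should be blocked before generation."""
    text = prompt.lower()
    return any(re.search(pat, text) for pat in BLOCKED_PATTERNS)

def make_provenance_manifest(media_bytes: bytes, generator: str) -> dict:
    """Build a minimal C2PA-style provenance record for synthetic media.

    A real manifest would be signed and embedded per the C2PA spec;
    this only records a content hash, the generator, and a synthetic flag.
    """
    return {
        "content_sha256": hashlib.sha256(media_bytes).hexdigest(),
        "generator": generator,
        "synthetic": True,
    }
```

In a layered architecture, the prompt screen runs before inference, while the provenance manifest is produced for every output that passes the screen, so downstream platforms can detect and trace synthetic content.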