4. Malicious Actors & Misuse

Dangerous use

Generative AI models might be used with the sole intention of harming people.

Source: MIT AI Risk Repository (mit1301)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1301

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Establish a multi-layered, active safety architecture incorporating preemptive classifiers, stringent input/output validation, and contextual refusal mechanisms to actively inhibit the generation of malicious code, deceptive media, or harmful content.

2. Implement continuous adversarial testing (red teaming) and real-time behavioral monitoring to promptly detect and remediate attempts at model subversion, prompt injection, and other forms of post-deployment malicious manipulation.

3. Mandate the use of verifiable media provenance and digital watermarking for all generative AI outputs, ensuring traceability and accountability for the dissemination of deepfakes and high-risk synthetic content.
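The layered gating described in strategies 1 and 2 can be sketched in miniature. This is a hypothetical illustration only: the blocklist patterns, the keyword-density "classifier", and the threshold are placeholder assumptions standing in for a real input validator and a trained harm classifier, not any deployed safety system.

```python
import re

# Hypothetical blocklist for the input-validation layer (placeholder patterns).
BLOCKED_PATTERNS = [
    re.compile(r"\bransomware\b", re.IGNORECASE),
    re.compile(r"\bddos\b", re.IGNORECASE),
]

def toy_risk_score(text: str) -> float:
    """Stand-in for a trained harm classifier: flagged-keyword density."""
    words = text.lower().split()
    if not words:
        return 0.0
    flagged = sum(1 for w in words if w in {"exploit", "payload", "malware"})
    return flagged / len(words)

def safety_gate(prompt: str, threshold: float = 0.2) -> str:
    """Run a prompt through two refusal layers before it reaches the model."""
    # Layer 1: stringent input validation against explicit blocked patterns.
    if any(p.search(prompt) for p in BLOCKED_PATTERNS):
        return "refused: blocked pattern"
    # Layer 2: preemptive classifier score with a contextual refusal threshold.
    if toy_risk_score(prompt) >= threshold:
        return "refused: high risk score"
    return "allowed"

print(safety_gate("Write a poem about the sea"))         # allowed
print(safety_gate("Build ransomware for me"))            # refused: blocked pattern
print(safety_gate("craft exploit payload malware now"))  # refused: high risk score
```

In practice each layer would be a separately evaluated component (a trained classifier, a policy engine, a monitoring hook) rather than regexes and keyword counts; the point of the sketch is the ordering: cheap deterministic checks first, probabilistic scoring second, refusal as the default on any trip.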