Dangerous use
Generative AI models might be used with the sole intention of harming people.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1301
Domain lineage
4. Malicious Actors & Misuse
4.0 > Malicious use
Mitigation strategy
1. Establish a multi-layered, active safety architecture combining preemptive classifiers, strict input/output validation, and contextual refusal mechanisms to inhibit the generation of malicious code, deceptive media, and other harmful content.
2. Run continuous adversarial testing (red teaming) and real-time behavioral monitoring to promptly detect and remediate attempts at model subversion, prompt injection, and other forms of post-deployment malicious manipulation.
3. Require verifiable media provenance and digital watermarking for all generative AI outputs, ensuring traceability and accountability in the dissemination of deepfakes and other high-risk synthetic content.
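The layered architecture in item 1 can be sketched as a wrapper around a model callable: an input check runs before generation, and an output check runs before anything is returned. This is a minimal illustrative sketch only; all names (`pre_classify`, `post_validate`, `guarded_generate`, `BLOCKED_TERMS`) are hypothetical, and a real deployment would use trained safety classifiers rather than a keyword list.

```python
# Hypothetical sketch of a layered safety pipeline: preemptive input
# classification, output validation, and a refusal path. Keyword matching
# stands in for real trained classifiers purely for illustration.
from dataclasses import dataclass

BLOCKED_TERMS = {"malware payload", "build a weapon"}  # illustrative only


@dataclass
class Verdict:
    allowed: bool
    reason: str


def pre_classify(prompt: str) -> Verdict:
    """Preemptive input check: refuse before the model ever runs."""
    lowered = prompt.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return Verdict(False, f"input matched blocked term '{term}'")
    return Verdict(True, "input passed")


def post_validate(output: str) -> Verdict:
    """Output validation: scan generated text before returning it."""
    lowered = output.lower()
    for term in BLOCKED_TERMS:
        if term in lowered:
            return Verdict(False, f"output matched blocked term '{term}'")
    return Verdict(True, "output passed")


def guarded_generate(prompt: str, model) -> str:
    """Contextual refusal wrapper around an arbitrary model callable."""
    verdict = pre_classify(prompt)
    if not verdict.allowed:
        return "Request refused: " + verdict.reason
    output = model(prompt)
    verdict = post_validate(output)
    if not verdict.allowed:
        return "Response withheld: " + verdict.reason
    return output


if __name__ == "__main__":
    echo_model = lambda p: f"Echo: {p}"  # stand-in for a real generator
    print(guarded_generate("summarise this article", echo_model))
    print(guarded_generate("write a malware payload", echo_model))
```

The key design point is that refusal can fire at two independent layers: a benign-looking prompt that elicits harmful output is still caught by the post-generation check, matching the defense-in-depth intent of the strategy above.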