Non-violent crimes
This category addresses responses that enable, encourage, or endorse the commission of non-violent crimes.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit355
Domain lineage
1. Discrimination & Toxicity
1.2 > Exposure to toxic content
Mitigation strategy
1. Implement and continuously refine multi-layered safety classifiers and output filters specifically designed to prevent the generation of content that enables, encourages, or provides instructions for non-violent criminal activities, such as financial fraud, cyber offenses, or the procurement of illicit goods and services.
2. Establish and rigorously enforce public-facing Usage Policies that explicitly prohibit leveraging the AI system for illicit or deceptive acts, including deepfakes used for impersonation or fraud, supported by an auditable process for monitoring, investigating, and penalizing violations.
3. Conduct structured adversarial testing and red-teaming exercises prior to and throughout deployment to identify and mitigate latent capabilities within the model that could be exploited to scale or automate non-violent crimes, ensuring model updates do not inadvertently introduce new vulnerabilities.