1. Discrimination & Toxicity · 2 - Post-deployment

Non-violent crimes

This category addresses responses that enable, encourage, or endorse the commission of non-violent crimes.

Source: MIT AI Risk Repository (mit355)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit355

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Implement and continuously refine multi-layered safety classifiers and output filters specifically designed to prevent the generation of content that enables, encourages, or provides instructions for non-violent criminal activities, such as financial fraud, cyber offenses, or the procurement of illicit goods and services.

2. Establish and rigorously enforce public-facing Usage Policies that explicitly prohibit leveraging the AI system for illicit or deceptive acts, including deepfakes used for impersonation or fraud, supported by an auditable process for monitoring, investigating, and penalizing violations.

3. Conduct structured adversarial testing and red-teaming exercises prior to and throughout deployment to identify and mitigate latent capabilities within the model that could be exploited to scale or automate non-violent crimes, ensuring model updates do not inadvertently introduce new vulnerabilities.
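The layered-filter approach in mitigation item 1 can be sketched as a simple moderation pipeline. This is a minimal illustration, not part of the repository entry: the blocklist phrases, risk terms, threshold, and the toy keyword-overlap "classifier" are all placeholder assumptions standing in for trained safety classifiers.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    allowed: bool
    reason: str

# Layer 1: fast rule-based screen for clearly prohibited phrases.
# (Illustrative blocklist; a real deployment would use curated policy lists.)
BLOCKLIST = ["how to launder money", "buy stolen credit cards"]

def rule_screen(text: str) -> Verdict:
    lowered = text.lower()
    for phrase in BLOCKLIST:
        if phrase in lowered:
            return Verdict(False, f"blocklist:{phrase}")
    return Verdict(True, "rule_screen:pass")

# Layer 2: stand-in for a trained safety classifier. Here a toy scorer
# based on keyword overlap; in practice this would be a learned model.
RISK_TERMS = {"fraud", "phishing", "counterfeit", "illicit"}

def classifier_screen(text: str, threshold: float = 0.5) -> Verdict:
    words = set(text.lower().split())
    score = len(words & RISK_TERMS) / len(RISK_TERMS)
    if score >= threshold:
        return Verdict(False, f"classifier:{score:.2f}")
    return Verdict(True, "classifier:pass")

def moderate(text: str) -> Verdict:
    # Run layers in order; the first layer that blocks wins, so cheap
    # rules short-circuit before the (notionally) expensive classifier.
    for layer in (rule_screen, classifier_screen):
        verdict = layer(text)
        if not verdict.allowed:
            return verdict
    return Verdict(True, "all_layers:pass")
```

Ordering cheap rule checks before classifier calls keeps latency low on obvious violations, while the classifier layer catches paraphrases the blocklist misses; the `reason` field supports the auditable monitoring process described in item 2.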