Model diversion
Model diversion takes model manipulation one step further by repurposing generative AI models (often open-source ones) in a way that diverts them from their intended functionality or from the use cases envisioned by their developers (Lin et al., 2024). An example is the training of the open-source BERT model on dark-web data to create DarkBERT.7
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1265
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement proactive model-level defenses, such as technical mechanisms (e.g., proprietary disruption mechanisms or watermarking) designed to inhibit unauthorized re-training or merging operations, thereby preserving model integrity and detecting diversion post-deployment.
2. Mandate robust contractual provisions (e.g., Terms of Service, licensing agreements) that explicitly prohibit and define penalties for model repurposing, re-training on unauthorized datasets, and distillation of model knowledge, providing a clear legal basis for enforcement actions.
3. Deploy continuous, zero-trust monitoring and access control mechanisms for all model APIs and endpoints to enforce least-privilege access and detect anomalous usage patterns indicative of data exfiltration or unauthorized model interaction that could precede diversion.
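The monitoring described in the third mitigation can be sketched as a minimal sliding-window volume check per API key; a sustained burst of queries is one pattern that could precede exfiltration or distillation. The class name, window size, and threshold below are illustrative assumptions, not part of the source.

```python
from collections import defaultdict, deque
import time

class UsageMonitor:
    """Toy per-key monitor: flags API keys whose query volume within a
    sliding time window exceeds a threshold (hypothetical parameters)."""

    def __init__(self, window_seconds=60, max_queries=100):
        self.window = window_seconds
        self.max_queries = max_queries
        self.events = defaultdict(deque)  # api_key -> timestamps of recent queries

    def record(self, api_key, now=None):
        """Record one query; return True if the key now looks anomalous."""
        now = time.monotonic() if now is None else now
        q = self.events[api_key]
        q.append(now)
        # Evict timestamps that have fallen outside the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_queries
```

In a real deployment this check would sit behind the API gateway and feed an alerting pipeline rather than returning a boolean inline; query-volume thresholds alone will not catch slow, distributed distillation attempts, so they complement rather than replace the contractual and watermarking measures above.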