Model diversion
Model diversion takes model manipulation one step further by repurposing generative AI models (often open-source ones) in a way that diverts them from their intended functionality or from the use cases envisioned by their developers (Lin et al., 2024). An example is the training of the open-source BERT model on dark-web data to create DarkBERT.7
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1265
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement proactive model-level defenses, such as technical mechanisms (e.g., proprietary disruption mechanisms or watermarking) designed to inhibit unauthorized re-training or merging operations, thereby preserving model integrity and detecting diversion post-deployment.
2. Mandate robust contractual provisions (e.g., Terms of Service, licensing agreements) that explicitly prohibit and define penalties for model repurposing, re-training on unauthorized datasets, and distillation of model knowledge, providing a clear legal basis for enforcement actions.
3. Deploy continuous, zero-trust monitoring and access control mechanisms for all model APIs and endpoints to enforce least-privilege access and detect anomalous usage patterns indicative of data exfiltration or unauthorized model interaction that could precede diversion.
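The monitoring described in the third mitigation can be sketched as a minimal sliding-window volume check per API key; a sustained burst of queries is one pattern that could precede exfiltration or distillation. The class name, window size, and threshold below are illustrative assumptions, not part of the source.

```python
from collections import defaultdict, deque
import time

class UsageMonitor:
    """Toy per-key monitor: flags API keys whose query volume within a
    sliding time window exceeds a threshold (hypothetical parameters)."""

    def __init__(self, window_seconds=60, max_queries=100):
        self.window = window_seconds
        self.max_queries = max_queries
        self.events = defaultdict(deque)  # api_key -> timestamps of recent queries

    def record(self, api_key, now=None):
        """Record one query; return True if the key now looks anomalous."""
        now = time.monotonic() if now is None else now
        q = self.events[api_key]
        q.append(now)
        # Evict timestamps that have fallen outside the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_queries
```

In a real deployment this check would sit behind the API gateway and feed an alerting pipeline rather than returning a boolean inline; query-volume thresholds alone will not catch slow, distributed distillation attempts, so they complement rather than replace the contractual and watermarking measures above.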