4. Malicious Actors & Misuse

Cyber-offense

The model can discover vulnerabilities in systems (hardware, software, or data), write code to exploit them, make effective decisions once it has gained access to a system or network, and skilfully evade threat detection and response (both human and automated) while pursuing a specific objective. If deployed as a coding assistant, it can insert subtle bugs into generated code for future exploitation.

Source: MIT AI Risk Repository (mit437)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit437

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.2 > Cyberattacks, weapon development or use, and mass harm

Mitigation strategy

1. Implement a Zero-Trust architecture with enforced least-privilege access controls, particularly around model endpoints and high-value APIs, to prevent unauthorized use of the model's offensive capabilities.

2. Conduct regular, comprehensive AI red-teaming exercises and stress-testing to proactively identify and harden the system against the discovery and exploitation of vulnerabilities, ensuring model resilience under adversarial conditions.

3. Establish continuous runtime behavioural monitoring and anomaly detection across all model interactions and output pipelines to immediately flag and intercept suspicious activity indicative of exploit generation or sophisticated threat evasion.
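A minimal sketch of how strategies 1 and 3 might be combined at a model endpoint: a deny-by-default scope check per API key (least privilege) and a pattern-based filter over model output to flag responses that may contain exploit code. All names (`KEY_SCOPES`, `authorize`, `flag_output`) and the patterns themselves are illustrative assumptions, not the repository's prescribed implementation; a production system would use far richer behavioural signals than regex matching.

```python
import re

# Hypothetical per-key scopes (least privilege: anything not listed is denied).
KEY_SCOPES = {
    "analyst-key": {"chat"},
    "ops-key": {"chat", "code-generation"},
}

# Illustrative patterns that might indicate exploit generation in model output.
SUSPICIOUS_PATTERNS = [
    re.compile(r"os\.system\s*\("),
    re.compile(r"subprocess\.(Popen|run|call)"),
    re.compile(r"eval\s*\(\s*input"),
]

def authorize(api_key: str, scope: str) -> bool:
    """Zero-trust check: allow a request only if its key explicitly holds the scope."""
    return scope in KEY_SCOPES.get(api_key, set())

def flag_output(text: str) -> list[str]:
    """Return the suspicious patterns matched in a model response, for review."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]

if __name__ == "__main__":
    print(authorize("analyst-key", "code-generation"))  # denied by default
    print(flag_output("import os\nos.system('id')"))    # flagged for review
```

The deny-by-default shape matters: an unknown key resolves to an empty scope set rather than raising or falling through to a permissive default.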

ADDITIONAL EVIDENCE

Most of the capabilities listed are offensive: they are useful for exerting influence or threatening security (e.g. persuasion and manipulation, cyber-offense, weapons acquisition).