Cyber-offense
The model can discover vulnerabilities in systems (hardware, software, or data) and write code to exploit them. Once it has gained access to a system or network, it can make effective decisions and skilfully evade threat detection and response (both human and automated) whilst pursuing a specific objective. If deployed as a coding assistant, it can insert subtle bugs into generated code for later exploitation.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit437
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement a Zero-Trust architecture with enforced least-privilege access controls, particularly around model endpoints and high-value APIs, to prevent unauthorized utilization of the model's offensive capabilities.
2. Conduct regular, comprehensive AI Red Teaming exercises and stress-testing to proactively identify and harden the system against the discovery and exploitation of vulnerabilities, ensuring model resilience under adversarial conditions.
3. Establish continuous runtime behavioral monitoring and anomaly detection across all model interactions and output pipelines to immediately flag and intercept suspicious activity indicative of exploit generation or sophisticated threat evasion.
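The runtime-monitoring mitigation (point 3) can be sketched as an output-screening hook placed in the model's response pipeline. This is a minimal illustration only: the pattern list below is a hypothetical placeholder, and a production deployment would rely on trained classifiers and curated threat intelligence rather than a static regex set.

```python
import re

# Hypothetical example patterns for illustration -- NOT a vetted detection ruleset.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\beval\s*\(\s*base64", re.IGNORECASE),  # obfuscated payload execution
    re.compile(r"union\s+select", re.IGNORECASE),        # SQL-injection probing
    re.compile(r"/etc/shadow"),                          # credential-file access
]

def flag_output(text: str) -> list[str]:
    """Return the patterns that matched, so downstream tooling can
    intercept the response or route it for human review."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]

if __name__ == "__main__":
    benign = "def add(a, b):\n    return a + b"
    suspect = "payload = 'UNION SELECT password FROM users'"
    print(flag_output(benign))   # no matches
    print(flag_output(suspect))  # one match
```

In practice such a hook would sit alongside least-privilege controls (point 1), so that a flagged interaction can be blocked before the output reaches a high-value API rather than merely logged.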
ADDITIONAL EVIDENCE
Most of the capabilities listed are offensive: they are useful for exerting influence or threatening security (see, e.g., persuasion and manipulation, cyber-offense, and weapons acquisition).