Offensive cyber capabilities
These evaluations focus on whether an LLM possesses certain capabilities in the cyber domain. This includes whether an LLM can detect and exploit vulnerabilities in hardware, software, and data. They also consider whether an LLM can evade detection once inside a system or network and stay focused on achieving specific objectives.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit654
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Restrict Model Agency and Capabilities: Implement the Principle of Least Privilege and sandbox plugins with write access to critical systems. Establish mandatory human-in-the-loop approval workflows for all high-stakes or sensitive operations to prevent conversational exchanges from bypassing authorization chains.
2. Input and Output Sanitization and Validation: Run all user prompts through conversational filters to detect and block override phrases or malicious instructions. Scrutinize and validate all model outputs for embedded code, harmful characters, or suspicious patterns before they can execute in a downstream system.
3. Implement Strict Access Controls and Rate Limiting: Apply Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA) to limit access to the model's APIs. Enforce aggressive rate limits and continuous resource monitoring to prevent systematic resource exhaustion or data extraction attempts and to make anomalous activity visible.
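The access-control, rate-limiting, and human-approval measures above can be sketched as a single gate placed in front of any action a model requests. This is a minimal illustration under stated assumptions, not a reference implementation: the role map `ROLE_PERMISSIONS`, the set `HIGH_STAKES_ACTIONS`, the `TokenBucket` class, and the function `authorize_action` are all hypothetical names invented for this example, and a real deployment would load policy from a managed source and log every decision.

```python
import time
from dataclasses import dataclass, field

# Hypothetical role-to-permission map (assumption for illustration only).
ROLE_PERMISSIONS = {
    "analyst": {"read_logs"},
    "operator": {"read_logs", "restart_service"},
}

# Hypothetical set of actions that always require a human sign-off.
HIGH_STAKES_ACTIONS = {"restart_service"}


@dataclass
class TokenBucket:
    """Simple token-bucket rate limiter, intended as one bucket per caller."""
    capacity: float
    refill_per_second: float
    tokens: float = field(init=False)
    last_refill: float = field(init=False)

    def __post_init__(self):
        # Start with a full bucket.
        self.tokens = self.capacity
        self.last_refill = time.monotonic()

    def allow(self, now=None):
        """Consume one token if available; return True when the call may proceed."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def authorize_action(role, action, bucket, human_approved=False):
    """Return (allowed, reason) for an action requested by the model.

    Checks run in order: least-privilege role check, rate limit,
    then human-in-the-loop approval for high-stakes actions.
    """
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False, "role lacks permission"
    if not bucket.allow():
        return False, "rate limit exceeded"
    if action in HIGH_STAKES_ACTIONS and not human_approved:
        return False, "human approval required"
    return True, "ok"
```

In this sketch the rate limit is charged before the approval check, so repeated unapproved high-stakes requests still drain the caller's budget and show up as anomalous activity. Whether that ordering is appropriate depends on the deployment.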