4. Malicious Actors & Misuse

Offensive cyber capabilities

These evaluations assess whether an LLM possesses certain offensive capabilities in the cyber domain, including whether it can detect and exploit vulnerabilities in hardware, software, and data. They also consider whether an LLM can evade detection once inside a system or network and achieve specific objectives.

Source: MIT AI Risk Repository (mit654)

ENTITY: 2 - AI

INTENT: 1 - Intentional

TIMING: 3 - Other

Risk ID: mit654

Domain lineage: 4. Malicious Actors & Misuse (223 mapped risks) > 4.2 Cyberattacks, weapon development or use, and mass harm

Mitigation strategy

1. Restrict Model Agency and Capabilities: Implement the Principle of Least Privilege and sandbox plugins with write access to critical systems. Establish mandatory human-in-the-loop approval workflows for all high-stakes or sensitive operations to prevent conversational exchanges from bypassing authorization chains.

2. Input and Output Sanitization and Validation: Run all user prompts through conversational filters to detect and block override phrases or malicious instructions. Scrutinize and validate all model outputs for embedded code, harmful characters, or suspicious patterns before they can execute in a downstream system.

3. Implement Strict Access Controls and Rate Limiting: Apply Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA) to limit access to the model's APIs. Enforce aggressive rate limits and continuous resource monitoring to prevent systematic resource exhaustion or data extraction attempts and to make anomalous activity visible.
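The mitigations above can be sketched in code. The following is a minimal, illustrative Python sketch, not a production implementation: the deny-list patterns, the `SENSITIVE_ACTIONS` set, and all function names are hypothetical placeholders standing in for a real policy engine, prompt filter, and rate-limiting layer.

```python
import re
import time

# Mitigation 1 (hypothetical): actions that always require human-in-the-loop approval.
SENSITIVE_ACTIONS = {"delete_records", "transfer_funds", "modify_acl"}

def requires_human_approval(action: str) -> bool:
    """Flag high-stakes operations so a human must approve before execution."""
    return action in SENSITIVE_ACTIONS

# Mitigation 2 (hypothetical deny-list): override phrases in user prompts.
OVERRIDE_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"disregard .{0,40}system prompt", re.I),
]

# Patterns suggesting embedded executable content in model output.
SUSPICIOUS_OUTPUT = [
    re.compile(r"<script\b", re.I),
    re.compile(r";\s*(rm|curl|wget)\s", re.I),
]

def sanitize_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the conversational filter."""
    return not any(p.search(prompt) for p in OVERRIDE_PATTERNS)

def validate_output(text: str) -> bool:
    """Return True if the model output is safe to hand to a downstream system."""
    return not any(p.search(text) for p in SUSPICIOUS_OUTPUT)

class TokenBucket:
    """Mitigation 3: simple token-bucket rate limiter for model API calls."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens refilled per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In practice the deny-list would be far broader (or replaced by a learned classifier), output validation would run in a sandboxed parser rather than regexes, and rate limits would be enforced at the API gateway per authenticated role rather than in-process.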