
Offensive cyber capability

Ability to develop, deploy and operate advanced cyber weapons or other offensive cyber tools, including but not limited to vulnerability exploitation, network penetration, social engineering attacks and distributed attack systems, that can evade network defense mechanisms and establish persistent access channels.

Source: MIT AI Risk Repository (mit1469)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1469

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement a comprehensive, layered safety stack centered on **Model Refusal and Output Filtering** to directly mitigate the development of offensive cyber tools. This includes training frontier models to refuse, or respond safely to, requests for vulnerability research, exploit development or stealthy intrusion techniques, complemented by **AI Firewalls** that analyze prompt intent and enforce policy guardrails on all inputs and outputs.

2. Establish continuous, system-wide **Detection and Monitoring Platforms** to identify and respond rapidly to emerging threats. This involves real-time monitoring across all products that use the model to detect potentially malicious activity, automatically blocking unsafe outputs, and immediately escalating flagged activity for human-machine review and enforcement action.

3. Mandate rigorous, recurring **End-to-End Red Teaming** with external cybersecurity experts. The objective is to simulate attacks by determined, well-resourced adversaries in order to proactively evaluate and strengthen all defensive mitigations, validating the resilience of the safety stack against sophisticated attempts to weaponize the AI's offensive capabilities.
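The layering in items 1 and 2 (an input filter, the refusal-trained model, an output filter, and a monitor that escalates repeat offenders) can be sketched as below. This is a minimal illustration under stated assumptions, not a reference implementation: `BLOCKED_TOPICS`, `classify`, `Monitor`, `guarded_call` and `escalation_threshold` are hypothetical names, the substring match is a toy stand-in for a trained intent classifier, and `model` is any callable that maps a prompt to text.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy list: placeholder phrases standing in for a trained
# intent classifier. A real AI firewall would not rely on substring matching.
BLOCKED_TOPICS = ("write an exploit", "persistent backdoor", "evade antivirus")


@dataclass
class Decision:
    allowed: bool
    reason: str


def classify(text: str) -> Decision:
    """Toy policy check applied to both prompts and model outputs."""
    lowered = text.lower()
    for phrase in BLOCKED_TOPICS:
        if phrase in lowered:
            return Decision(False, f"matched policy phrase: {phrase!r}")
    return Decision(True, "no policy match")


@dataclass
class Monitor:
    """Counts blocked events per user and queues users for human review."""
    escalation_threshold: int = 3  # hypothetical value
    flags: Counter = field(default_factory=Counter)
    review_queue: list = field(default_factory=list)

    def record(self, user_id: str, decision: Decision) -> None:
        if decision.allowed:
            return
        self.flags[user_id] += 1
        # Escalate exactly once, when the user first reaches the threshold.
        if self.flags[user_id] == self.escalation_threshold:
            self.review_queue.append(user_id)


def guarded_call(user_id: str, prompt: str,
                 model: Callable[[str], str], monitor: Monitor) -> str:
    # Layer 1: the input filter inspects the prompt before the model sees it.
    pre = classify(prompt)
    monitor.record(user_id, pre)
    if not pre.allowed:
        return "Request refused by policy."
    # Layer 2: the model itself, assumed to be refusal-trained.
    output = model(prompt)
    # Layer 3: the output filter catches anything the model let through.
    post = classify(output)
    monitor.record(user_id, post)
    if not post.allowed:
        return "Response withheld by policy."
    return output
```

For example, with `escalation_threshold=2`, a user whose two prompts both trip the input filter is appended to `review_queue` on the second block. Filtering outputs as well as inputs reflects the point in item 1 that guardrails apply to all inputs and outputs: a prompt that looks benign can still elicit an unsafe response. Item 3's red teaming would then probe this whole pipeline, not just the model.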