4. Malicious Actors & Misuse

Cyber

As the programming abilities of AI systems continue to expand, frontier AI is likely to significantly exacerbate existing cyber risks. Most notably, AI systems could allow almost anyone to mount faster-paced, more effective, and larger-scale cyber intrusions, for example through tailored phishing or by replicating malware. Frontier AI's net effect on the balance between cyber offence and defence remains uncertain, as these tools also have many applications in improving the cybersecurity of systems, and defenders are mobilising significant resources to use frontier AI for defensive purposes.209 In the future, we may see AI systems both conducting and defending against cyberattacks with reduced human oversight at each step.

Source: MIT AI Risk Repository (mit1381)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1381

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.2 > Cyberattacks, weapon development or use, and mass harm

Mitigation strategy

1. Implement Robust Safety Alignment and Refusal Engineering: Systematically train frontier models using techniques such as supervised fine-tuning on refusal examples and Constitutional AI to establish safe behavioral boundaries and prevent the generation of malicious outputs, including sophisticated exploit code or tailored phishing content.

2. Deploy Real-Time Misuse Detection and Intervention Systems: Institute system-wide monitoring across all deployment infrastructures to analyze model inputs and outputs for anomalous or prohibited activities indicative of cyber misuse. This includes deploying classifiers and automated response mechanisms to block unsafe outputs, reroute high-risk prompts, and escalate for immediate human-led enforcement.

3. Mandate Advanced AI-Enabled Social Engineering Awareness Training: Develop and deliver continuous educational modules focused specifically on fortifying the human defense layer against the heightened realism of AI-powered human-targeted attacks, such as deepfakes, voice replication, and highly personalized phishing campaigns.
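The second mitigation — classifying inputs and outputs, then blocking, rerouting, or escalating — can be sketched in miniature. This is a hypothetical illustration only: the `moderate` function, the tiered thresholds, and the keyword-based scorer are all assumptions standing in for a trained misuse classifier, not any real moderation API.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    SERVE = "serve"        # output passes through unchanged
    BLOCK = "block"        # output withheld; user receives a refusal
    ESCALATE = "escalate"  # routed to human review / enforcement

@dataclass
class Verdict:
    action: Action
    score: float  # misuse-risk score in [0, 1]

# Stand-in for a learned classifier: a real deployment would use a trained
# model over inputs and outputs, not keyword matching.
RISK_TERMS = {"exploit payload": 0.9, "phishing template": 0.8, "keylogger": 0.7}

def score_text(text: str) -> float:
    """Return the highest risk weight of any flagged term in the text."""
    lowered = text.lower()
    return max((w for term, w in RISK_TERMS.items() if term in lowered), default=0.0)

def moderate(prompt: str, output: str,
             block_at: float = 0.85, escalate_at: float = 0.6) -> Verdict:
    """Score both sides of the exchange and apply tiered thresholds."""
    score = max(score_text(prompt), score_text(output))
    if score >= block_at:
        return Verdict(Action.BLOCK, score)
    if score >= escalate_at:
        return Verdict(Action.ESCALATE, score)
    return Verdict(Action.SERVE, score)
```

The tiered design mirrors the strategy's distinction between automated blocking of clearly unsafe outputs and escalation of ambiguous, high-risk cases to human-led enforcement.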