Security threats
Facilitating the conduct of cyber attacks, weapon development, and security breaches
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit273
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement Stringent Content Moderation and Input Sanitization: Establish comprehensive in-model safety mechanisms and pre-processing filters to prevent the generation of content related to illegal acts, such as malicious code, dangerous instructions (e.g., bomb construction), or information that could facilitate cyberattacks or the development of unauthorized weapons. Complement these with stringent input validation and runtime behavioral monitoring.
2. Conduct Proactive Adversarial Testing and Red Teaming: Systematically subject the AI model and its ecosystem to ethical hacking and adversarial testing (red teaming) throughout the development lifecycle to identify and mitigate vulnerabilities. Focus areas include resistance to prompt injection, model inversion, and data extraction, fortifying the system against manipulation by malicious actors.
3. Deploy Continuous Security and Behavioral Monitoring: Institute real-time monitoring of model inputs, outputs, and overall security posture to detect anomalous query patterns, unusual API call spikes, and indicators of compromise. This continuous observability is essential for identifying evolving or novel attack vectors and enabling rapid incident response and remediation.
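The pre-processing filter (strategy 1) and query-spike detection (strategy 3) can be sketched minimally as below. This is an illustrative sketch only: the pattern list, function names, and thresholds are hypothetical placeholders, and a production deployment would use trained safety classifiers and telemetry pipelines rather than a static regex list and an in-memory counter.

```python
import re
from collections import deque
from time import monotonic
from typing import Optional

# Hypothetical blocklist for illustration; real systems use trained
# classifiers, not static patterns.
BLOCKED_PATTERNS = [
    re.compile(r"\b(build|construct)\s+a\s+bomb\b", re.IGNORECASE),
    re.compile(r"\bhack\s+into\b", re.IGNORECASE),
]

def sanitize_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the pre-processing filter."""
    return not any(p.search(prompt) for p in BLOCKED_PATTERNS)

class RateMonitor:
    """Flag anomalous query spikes within a sliding time window."""

    def __init__(self, max_requests: int, window_s: float) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.timestamps: deque = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        """Record a request; return False once the window is exceeded."""
        now = monotonic() if now is None else now
        self.timestamps.append(now)
        # Drop requests that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window_s:
            self.timestamps.popleft()
        return len(self.timestamps) <= self.max_requests
```

For example, `sanitize_prompt("hack into government systems")` returns `False`, while a `RateMonitor(max_requests=3, window_s=1.0)` flags a fourth request arriving within one second.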
ADDITIONAL EVIDENCE
Example: Generating code to hack into government systems (Burgess, 2023; Shevlane et al., 2023)