Security threats
Facilitating the conduct of cyber attacks, weapon development, and security breaches
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit273
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement Stringent Content Moderation and Input Sanitization: Establish comprehensive in-model safety mechanisms and pre-processing filters to prevent the generation of content related to illegal acts, such as malicious code, dangerous instructions (e.g., bomb construction), or information that could facilitate cyberattacks or the development of unauthorized weapons. Complement these with stringent input validation and runtime behavioral monitoring.
2. Conduct Proactive Adversarial Testing and Red Teaming: Systematically subject the AI model and its ecosystem to ethical hacking and adversarial testing (red teaming) throughout the development lifecycle to identify and mitigate vulnerabilities. Focus areas include resistance to prompt injection, model inversion, and data extraction, fortifying the system against manipulation by malicious actors.
3. Deploy Continuous Security and Behavioral Monitoring: Institute real-time monitoring of model inputs, outputs, and overall security posture to detect anomalous query patterns, unusual API call spikes, and indicators of compromise. This continuous observability is essential for identifying evolving or novel attack vectors and enabling rapid incident response and remediation.
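The pre-processing filter (strategy 1) and query-spike detection (strategy 3) can be sketched minimally as below. This is an illustrative sketch only: the pattern list, function names, and thresholds are hypothetical placeholders, and a production deployment would use trained safety classifiers and telemetry pipelines rather than a static regex list and an in-memory counter.

```python
import re
from collections import deque
from time import monotonic
from typing import Optional

# Hypothetical blocklist for illustration; real systems use trained
# classifiers, not static patterns.
BLOCKED_PATTERNS = [
    re.compile(r"\b(build|construct)\s+a\s+bomb\b", re.IGNORECASE),
    re.compile(r"\bhack\s+into\b", re.IGNORECASE),
]

def sanitize_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the pre-processing filter."""
    return not any(p.search(prompt) for p in BLOCKED_PATTERNS)

class RateMonitor:
    """Flag anomalous query spikes within a sliding time window."""

    def __init__(self, max_requests: int, window_s: float) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.timestamps: deque = deque()

    def allow(self, now: Optional[float] = None) -> bool:
        """Record a request; return False once the window is exceeded."""
        now = monotonic() if now is None else now
        self.timestamps.append(now)
        # Drop requests that have aged out of the sliding window.
        while self.timestamps and now - self.timestamps[0] > self.window_s:
            self.timestamps.popleft()
        return len(self.timestamps) <= self.max_requests
```

For example, `sanitize_prompt("hack into government systems")` returns `False`, while a `RateMonitor(max_requests=3, window_s=1.0)` flags a fourth request arriving within one second.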
ADDITIONAL EVIDENCE
Example: Generating code to hack into government systems (Burgess, 2023; Shevlane et al., 2023)