Back to the MIT repository
4. Malicious Actors & Misuse3 - Other

Dual-Use Science

LLM has science capabilities that can be used to cause harm (e.g., providing step-by-step instructions for conducting malicious experiments)

Source: MIT AI Risk Repositorymit659

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit659

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.2 > Cyberattacks, weapon development or use, and mass harm

Mitigation strategy

1. **Deploy Multi-Layered Content Moderation and Safety Alignment** Implement a robust, hybrid output filtering framework (e.g., combining rule-based heuristics and specialized machine learning classifiers) specifically designed to detect and block the generation of dangerous or illegal scientific content, such as step-by-step synthesis protocols for restricted chemical or biological agents. The model should also be iteratively aligned using techniques like Direct Preference Optimization (DPO) to prioritize safety and refusal of dual-use queries over utility. 2. **Institute Strict Action-Space and Tool-Usage Constraints** For LLM-powered scientific agents, define and enforce a minimal, fixed action space and permission set, including limiting or monitoring access to external tools, databases, or APIs that could facilitate the execution of a malicious experiment. Oversight mechanisms, such as LLM-based monitors, must be implemented to scrutinize the agent's actions and environmental interactions, thereby preempting potentially harmful outcomes from tool misuse or unintended consequences. 3. **Conduct Domain-Specific Adversarial Testing and Red Teaming** Establish a continuous red-teaming program utilizing domain experts (e.g., in chemistry, biology, and cybersecurity) to perform adversarial testing. This should focus on simulating sophisticated misuse scenarios, including prompt injection and jailbreak attacks, to elicit instructions for malicious experiments. The findings must be used to iteratively strengthen and refine existing misuse detection, response, and mitigation strategies across the AI development lifecycle.