4. Malicious Actors & Misuse2 - Post-deployment

Malicious Use Risks

As general- purpose AI covers a broad set of knowledge areas, it can be repurposed for malicious ends, potentially causing widespread harm. This section discusses some of the major risks of malicious use, but there are others and new risks may continue to emerge. While the risks discussed in this section range widely in terms of how well- evidenced they are, and in some cases, there is evidence suggesting that they may currently not be serious risks at all, we include them to provide a comprehensive overview of the malicious use risks associated with general- purpose AI systems.

Source: MIT AI Risk Repositorymit768

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit768

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Implement Integrated Model Hardening via Fine-tuning and Filtering Employ supervised fine-tuning, such as Reinforcement Learning from Human Feedback (RLHF), to explicitly condition the general-purpose AI model to refuse and abstain from generating instructions or content that facilitate malicious activities. This must be complemented by robust input and output filters designed to perform real-time detection and blocking of adversarial user prompts and subsequent harmful model responses. 2. Establish Continuous Adversarial Testing and Validation Protocols Mandate systematic, ongoing red teaming and automated testing to proactively explore model vulnerabilities against known and novel attack vectors, including various forms of prompt injection and jailbreaking. This process, including adversarial training, is essential for measuring and increasing model resilience and robustness before and throughout post-deployment cycles. 3. Enforce Granular Access Control and Comprehensive Logging Require the application of strict Identity and Access Management (IAM) policies and role-based access controls (RBAC) to ensure adherence to the principle of least privilege, thereby limiting the pool of actors who can modify or interact with the system's core functions. Additionally, all prompts, outputs, and system telemetry must be logged and ingested into a centralized Security Information and Event Management (SIEM) system for continuous anomaly detection and rigorous incident forensics.