7. AI System Safety, Failures, & Limitations

AI development

The model could build new AI systems from scratch, including AI systems with dangerous capabilities. It could find ways of adapting other, existing models to increase their performance on tasks relevant to extreme risks. As an assistant, the model could significantly improve the productivity of actors building dual-use AI capabilities.

Source: MIT AI Risk Repository (mit443)

ENTITY: 2 - AI

INTENT: 1 - Intentional

TIMING: 1 - Pre-deployment

Risk ID: mit443

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.2 AI possessing dangerous capabilities

Mitigation strategy

1. Implement strict technical and access controls to limit model capabilities and use for dual-use applications (e.g., cyber or CBRNE). This includes training the model to safely refuse or severely restrict responses that enable clear abuse and instituting tiered-restriction access programs (e.g., know-your-customer screening, compute monitoring) for high-capability models.

2. Deploy continuous, system-wide monitoring and detection systems to track user interactions and model outputs for potentially malicious activity, such as attempts to generate attack commands or engineer biological/chemical risks, with established protocols for rapid output blocking and enforcement escalation (a minimal illustrative sketch follows this list).

3. Conduct comprehensive, end-to-end adversarial evaluations (red teaming) with external security and domain experts to proactively identify, measure, and mitigate latent or emerging capabilities that could be weaponized or misused to increase the productivity of actors building dangerous AI systems.
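The output-screening and escalation protocol in item 2, combined with the tiered access controls in item 1, can be made concrete with a small sketch. The snippet below is a hypothetical illustration only, not part of the MIT AI Risk Repository or any specific deployment: the names (AccessTier, RISK_PATTERNS, screen_output, Decision) are invented, and the keyword patterns stand in for what would in practice be a trained misuse classifier feeding a human enforcement workflow.

# Illustrative sketch only. All identifiers here are hypothetical and the regex
# patterns are toy stand-ins for a real misuse classifier.
import re
from dataclasses import dataclass
from enum import Enum


class AccessTier(Enum):
    PUBLIC = 1      # anonymous or unverified users
    VERIFIED = 2    # know-your-customer screened accounts
    RESTRICTED = 3  # vetted partners with high-capability access


# Toy patterns standing in for a trained risk classifier (cyber / CBRNE).
RISK_PATTERNS = {
    "cyber": re.compile(r"\b(exploit payload|privilege escalation script)\b", re.I),
    "cbrne": re.compile(r"\b(nerve agent synthesis|enrichment cascade)\b", re.I),
}


@dataclass
class Decision:
    allow: bool
    reason: str
    escalate: bool  # route to human review / enforcement if True


def screen_output(text: str, tier: AccessTier) -> Decision:
    """Block output that matches a known risk pattern; escalate unverified users."""
    for category, pattern in RISK_PATTERNS.items():
        if pattern.search(text):
            # All tiers are blocked on a match, but only unverified (PUBLIC)
            # hits are escalated straight to enforcement review.
            return Decision(
                allow=False,
                reason=f"matched {category} risk pattern",
                escalate=(tier is AccessTier.PUBLIC),
            )
    return Decision(allow=True, reason="no risk pattern matched", escalate=False)


if __name__ == "__main__":
    verdict = screen_output("Here is a privilege escalation script ...", AccessTier.PUBLIC)
    print(verdict)  # Decision(allow=False, reason='matched cyber risk pattern', escalate=True)

In a real system the pattern match would be replaced by a classifier with calibrated thresholds, and the escalation flag would feed the rapid-blocking and enforcement protocols that item 2 describes rather than a simple boolean.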