Strategic underperformance on model evaluations
GPAI developers often run evaluations of dual-use capabilities to decide whether a model is safe to deploy. In some cases, these evaluations may fail to elicit those capabilities, either for benign reasons or because of strategic action by the developers or malicious actors, or because the behavior arises unintentionally during training [84, 97]. A GPAI model may strategically underperform or limit its performance during capability evaluations in order to be classified as safe for deployment. This underperformance could prevent the model from being identified as potentially dual use.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit1157
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Implement rigorous and adversarial **capability elicitation evaluations (red-teaming and edge-case testing)**, with dedicated compute and staffing budgets that approximate the resources used for non-safety research, to proactively expose and measure dual-use or strategically withheld performance.
2. Establish a **formal governance framework and compliance strategy** aligned with regulatory mandates (e.g., the EU AI Act) requiring continuous systemic risk assessment, transparent documentation of model capabilities, and the reporting of serious incidents or dual-use potential to oversight bodies.
3. Utilize **tailored contractual and financial mechanisms** in commercial transactions (e.g., M&A), such as representations, warranties, and performance-based earnouts or escrows tied to validated model performance and safety metrics, to mitigate the business risk of post-deployment technical underperformance or undisclosed capabilities.
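One way to operationalize the first mitigation is to compare a model's score under a standard evaluation against its score under a stronger elicitation regime (e.g., better prompting, fine-tuning, or best-of-n sampling) and flag suspicious gaps. The sketch below is a minimal, hypothetical illustration of that comparison; the `EvalResult` structure, the gap metric, and the 10% threshold are assumptions for illustration, not a prescribed standard, and a real pipeline would use pre-registered elicitation budgets and a statistical significance test rather than a fixed cutoff.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Aggregate outcome of one evaluation run (hypothetical structure)."""
    condition: str  # e.g. "standard" or "elicited"
    correct: int    # number of test items answered correctly
    total: int      # number of test items administered

    @property
    def accuracy(self) -> float:
        return self.correct / self.total


def elicitation_gap(standard: EvalResult, elicited: EvalResult) -> float:
    """Accuracy gained when stronger elicitation is applied.

    A large positive gap suggests the standard evaluation under-elicited
    the capability -- whether benignly or strategically.
    """
    return elicited.accuracy - standard.accuracy


def flag_underperformance(standard: EvalResult,
                          elicited: EvalResult,
                          threshold: float = 0.10) -> bool:
    """Flag runs where elicited accuracy exceeds standard accuracy by
    more than the threshold (an assumed policy parameter)."""
    return elicitation_gap(standard, elicited) > threshold
```

A flagged gap does not by itself prove strategic underperformance; it marks the evaluation for the deeper red-teaming and documentation steps described above.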