General Evaluations (Difficulty of identification and measurement of capabilities)
The capabilities of general-purpose AI systems can be difficult to measure, compared to the capabilities of more limited and fixed-purpose AI systems. This is in part due to a broader distribution of potential risks, a lack of well-defined metrics to evaluate these risks, and risks from unpredictable (or emergent) AI model properties.
ENTITY
3 - Other
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit1111
Domain lineage
7. AI System Safety, Failures, & Limitations
7.4 > Lack of transparency or interpretability
Mitigation strategy
1. Prioritize Continuous Adversarial and Stress Testing: Systematically implement rigorous AI red teaming and model stress testing throughout the lifecycle to proactively discover, characterize, and document emergent and unpredictable model behaviors, particularly those that violate safety guardrails or manifest under adversarial prompts and real-world edge cases. 2. Adopt a Formal, Lifecycle-Centric Risk Management Framework: Establish and integrate a recognized AI risk management framework (e.g., NIST AI RMF) to provide a systematic and repeatable structure for continuous risk assessment, measurement, and governance. This institutionalizes the process of defining metrics and scoping risks across the broad distribution of general-purpose AI capabilities. 3. Mandate Explainability (XAI) and Scalable Oversight Mechanisms: Employ advanced interpretability techniques (XAI) to enhance model transparency and facilitate human understanding of decision-making. Simultaneously, develop and deploy scalable oversight mechanisms to enable effective human intervention and monitoring for capability drift or non-obvious failure modes in high-impact, autonomous systems.