7. AI System Safety, Failures, & Limitations

1 - Pre-deployment

Lack of capability for task

A lack of capability for a task could be due to the skill not being required during the training process (perhaps due to issues with the training data) or because the learnt skill was brittle and did not generalise to a new situation (lack of robustness to distributional shift). In particular, advanced AI assistants may not have the capability to represent complex concepts that are pertinent to their own ethical impact, for example the concept of 'benefitting the user' or 'when the user asks', or representing 'the way in which a user expects to be benefitted'.

Source: MIT AI Risk Repository (mit368)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit368

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Enhance Dataset Diversity and Augmentation: Proactively expand the training dataset's size and diversity, utilizing data augmentation techniques to expose the model to a broader range of complex, abstract, and ethical scenarios pertinent to its intended function. This addresses the foundational issue of skills not being required during training, fostering improved generalization and reducing dependence on brittle, non-generalisable features.

2. Implement Robustness Engineering and Certified Defenses: Employ adversarial training throughout the model development lifecycle to explicitly reinforce the AI system against distributional shifts and minor input perturbations, thereby mitigating model brittleness. For high-stakes capabilities, utilize certified defenses to establish mathematical guarantees of robustness within a defined operational envelope.

3. Establish Continuous Performance and Capability Monitoring: Deploy real-time, systematic monitoring to track model performance and capability degradation (model drift) in production. This dynamic surveillance is critical for detecting emerging capability failures in novel or unforeseen situations, allowing for timely retraining and remediation before risks escalate.
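The monitoring step in the third mitigation can be illustrated with a small drift check. The sketch below is not part of the repository entry. It assumes that a scalar signal (for example a model confidence score or an input feature) is logged both at evaluation time and in production, and it compares the two samples with the Population Stability Index (PSI). The function names `population_stability_index` and `drift_alert`, the bin count, and the 0.25 alert threshold are illustrative assumptions; the threshold is a commonly quoted rule of thumb, not a value from the source.

```python
import math
import random


def population_stability_index(reference, production, n_bins=10, eps=1e-6):
    """PSI between two samples of a scalar signal, binned on the reference range.

    Hypothetical helper for illustration; n_bins and eps are assumed defaults.
    """
    lo, hi = min(reference), max(reference)
    # Fall back to a unit width if the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    prod = bin_fractions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


def drift_alert(reference, production, threshold=0.25):
    """Return True when PSI exceeds the (assumed, rule-of-thumb) threshold."""
    return population_stability_index(reference, production) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for logged signals; real monitoring would read logs.
    reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    in_distribution = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]

    print("in-distribution alert:", drift_alert(reference, in_distribution))
    print("shifted alert:", drift_alert(reference, shifted))
```

A check like this only flags that the input or output distribution has moved; it does not by itself show that the model's capability has degraded, so in practice it would sit alongside task-level performance metrics and trigger review or retraining rather than replace them.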