7. AI System Safety, Failures, & Limitations

Robustness

This is the risk of the system failing or being unable to recover upon encountering invalid, noisy, or out-of-distribution (OOD) inputs.

Source: MIT AI Risk Repository (mit192)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit192

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Implement adversarial training (in-processing strategy): systematically train the model on inputs perturbed by adversarial examples to build resilience against both targeted manipulation (e.g., jailbreaks) and subtle, imperceptible attacks, directly addressing latent vulnerabilities that malicious actors could exploit.

2. Employ pre-processing strategies for distributional robustness: use data augmentation to simulate expected real-world variation (e.g., sociolinguistic variation, differing lighting conditions), and apply rigorous data filtering to exclude or normalize invalid, noisy, or spurious training data, improving out-of-distribution (OOD) generalization.

3. Deploy post-processing output filtering and validation: integrate runtime mechanisms, such as semantic checks or LLM-as-a-judge verification pipelines, that scrutinize system outputs and suppress or correct them when needed. This serves as a final-stage defense against negative consequences arising from earlier input misinterpretation or system failure.
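The first strategy can be sketched concretely. The toy example below is an illustration under assumptions, not the repository's prescribed implementation: it trains a logistic-regression classifier on inputs perturbed with the Fast Gradient Sign Method (FGSM), using analytic gradients and NumPy only. The data, step sizes, and perturbation budget `eps` are all invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two Gaussian blobs.
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps):
    """Fast Gradient Sign Method: shift each input along the sign of the
    loss gradient w.r.t. the input, i.e., in the loss-increasing direction."""
    p = sigmoid(X @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]  # d(BCE)/dx for logistic regression
    return X + eps * np.sign(grad_x)

w, b = np.zeros(2), 0.0
lr, eps = 0.1, 0.2
for _ in range(200):
    X_adv = fgsm(X, y, w, b, eps)           # craft adversarial examples
    p = sigmoid(X_adv @ w + b)
    w -= lr * (X_adv.T @ (p - y)) / len(y)  # gradient step on perturbed batch
    b -= lr * np.mean(p - y)

# After adversarial training, accuracy on FGSM-perturbed inputs should
# remain high rather than collapsing.
acc_adv = np.mean((sigmoid(fgsm(X, y, w, b, eps) @ w + b) > 0.5) == y)
```

The key design choice is that the gradient step is taken on the perturbed batch `X_adv`, not the clean data, so the model learns to classify correctly under worst-case perturbations within the `eps` budget.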

ADDITIONAL EVIDENCE

There is often significant variation in real-world environments, compared to research benchmarks. For example, objects may appear different under various lighting conditions or wear out over time, and human-generated text often exhibits sociolinguistic variation. Additionally, malicious actors may exploit flaws in a system’s design to hijack it (e.g., in the form of an adversarial attack). The inability to handle the above situations may lead to negative consequences for safety (e.g., autonomous vehicle crashes) or fairness (e.g., linguistic discrimination against minority dialect speakers).

Since ML systems sit at the intersection of statistics and software engineering, our definition encompasses two different definitions of robustness: the first relates to distributional robustness, where a method is resistant to deviations from the training data distribution; the second refers to the ability of a system to “function correctly in the presence of invalid inputs or stressful environmental conditions”.
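The second sense of robustness, handling invalid inputs gracefully rather than failing, can be illustrated with a runtime input guard. The sketch below is an assumption-laden illustration (the training statistics, threshold, and function names are hypothetical): it rejects non-finite or wrongly shaped inputs and flags out-of-distribution ones via a simple per-feature z-score check, so the caller can refuse to predict instead of returning an unreliable answer.

```python
import math

# Hypothetical per-feature statistics computed on the training set.
TRAIN_MEAN = [0.0, 0.0]
TRAIN_STD = [1.0, 1.0]
Z_THRESHOLD = 4.0  # flag inputs more than 4 sigma from the training data

def validate(x):
    """Return (ok, reason). ok=False means the system should decline to
    predict rather than fail on an invalid or OOD input."""
    if len(x) != len(TRAIN_MEAN):
        return False, "wrong input dimension"
    if any(not math.isfinite(v) for v in x):
        return False, "non-finite value (invalid input)"
    z = [abs(v - m) / s for v, m, s in zip(x, TRAIN_MEAN, TRAIN_STD)]
    if max(z) > Z_THRESHOLD:
        return False, "out-of-distribution (z-score check)"
    return True, "ok"

print(validate([0.5, -0.3]))        # in-distribution: accepted
print(validate([float("nan"), 0]))  # invalid: rejected
print(validate([10.0, 0.0]))        # far from training data: flagged as OOD
```

A z-score check is deliberately crude; production systems typically use richer OOD detectors, but the structural point stands: robustness failures are mitigated by detecting and refusing bad inputs, not only by making the model itself stronger.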