11 canonical risk pages
Reliability
Failure modes that degrade output quality, consistency, or trustworthiness.
Brittleness
Tendency of models to suffer catastrophic failures when facing inputs slightly outside the training distribution, demonstrating a lack of robust generalization.
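A minimal sketch of brittleness, using made-up data: a model that merely memorizes exact training inputs is perfect in-sample but fails catastrophically on a trivially perturbed variant of the same input.

```python
# Hypothetical toy data: exact-match "model" with no notion of similarity.
train = {"great movie": "positive", "awful plot": "negative"}

def memorizing_classify(text):
    # Lookup by exact string: any perturbation falls off the memorized set.
    return train.get(text, "unknown")

print(memorizing_classify("great movie"))   # seen verbatim -> "positive"
print(memorizing_classify("great movie!"))  # one extra character -> "unknown"
```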
Catastrophic Forgetting
Drastic loss of previously learned knowledge when a neural network is trained on new tasks, especially problematic in continual learning.
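A minimal sketch, using a toy one-parameter model and made-up targets: a scalar weight trained first on task A, then fine-tuned only on task B, overwrites its task-A solution.

```python
def mse_grad(w, x, y):
    # Gradient of (w*x - y)^2 with respect to w.
    return 2 * (w * x - y) * x

def train(w, data, steps=200, lr=0.1):
    for _ in range(steps):
        for x, y in data:
            w -= lr * mse_grad(w, x, y)
    return w

task_a = [(1.0, 2.0)]    # optimal w = 2 on task A
task_b = [(1.0, -1.0)]   # optimal w = -1 on task B

w = train(0.0, task_a)
loss_a_before = (w * 1.0 - 2.0) ** 2  # near zero after task-A training
w = train(w, task_b)                  # sequential training on task B only
loss_a_after = (w * 1.0 - 2.0) ** 2   # task-A loss blows up: A was forgotten
print(loss_a_before, loss_a_after)
```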
Confabulated Hallucination
Generation of factually incorrect or fabricated information that the model presents with high apparent confidence, without basis in its training data or verifiable sources.
Model Collapse
Phenomenon in generative models where the model loses diversity in its outputs and converges to repeatedly generating a limited set of similar samples.
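A sketch on synthetic Gaussian data: repeatedly refitting a generative model on its own samples drains diversity generation by generation, so the fitted spread collapses toward zero.

```python
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0
initial_sigma = sigma
for _ in range(2000):
    # Each "generation" is trained only on the previous generation's output.
    samples = [random.gauss(mu, sigma) for _ in range(50)]
    mu, sigma = statistics.fmean(samples), statistics.stdev(samples)
print(initial_sigma, sigma)  # sigma has shrunk far below its starting value
```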
Model Drift
Progressive degradation of model performance as the real-world data distribution shifts over time away from the original training data (also known as concept drift).
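A sketch of a basic drift monitor on a synthetic feature stream, with a hypothetical threshold: compare a production window's mean against the training baseline, measured in standard errors.

```python
import random
import statistics

random.seed(1)
train_data = [random.gauss(0.0, 1.0) for _ in range(1000)]
base_mean = statistics.fmean(train_data)
base_sd = statistics.stdev(train_data)

def drifted(window, z_threshold=3.0):
    # Flag drift when the window mean sits too many standard errors
    # away from the training-time mean.
    se = base_sd / len(window) ** 0.5
    return abs(statistics.fmean(window) - base_mean) / se > z_threshold

stable_window = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted_window = [random.gauss(0.8, 1.0) for _ in range(200)]  # distribution moved
print(drifted(stable_window), drifted(shifted_window))
```

Real monitors typically use distribution-level tests (e.g. Kolmogorov-Smirnov or population stability index) rather than a single mean comparison; the structure is the same.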
Out-of-Distribution
Systematic failure of the model when it encounters data drawn from a distribution significantly different from the training set.
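A sketch on synthetic 1-D data, with a hypothetical cutoff: flag inputs as out-of-distribution when they lie many standard deviations from the training data, before trusting the model's prediction on them.

```python
import random
import statistics

random.seed(2)
train_x = [random.gauss(5.0, 1.0) for _ in range(500)]
mean, sd = statistics.fmean(train_x), statistics.stdev(train_x)

def is_ood(x, z_cut=4.0):
    # Simple distance-based OOD check against the training distribution.
    return abs(x - mean) / sd > z_cut

print(is_ood(5.3))    # near the training mass -> in-distribution
print(is_ood(42.0))   # far outside it -> out-of-distribution
```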
Spurious Correlation
Learning of superficial statistical correlations with no real causal relationship (e.g., associating snow with wolves because they appear together in photos), which fails to generalize.
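A sketch with made-up data that mirrors the snow-and-wolves example: in training, a "snowy background" feature perfectly co-occurs with the "wolf" label, so a classifier keying on the background looks flawless, then fails completely once the correlation breaks.

```python
# Spurious cue: "snow" co-occurs with "wolf" in training only.
train = [({"snow": 1}, "wolf")] * 50 + [({"snow": 0}, "husky")] * 50
test = [({"snow": 0}, "wolf")] * 50 + [({"snow": 1}, "husky")] * 50  # cue reversed

def background_classifier(features):
    # Relies on the background, not the animal itself.
    return "wolf" if features["snow"] else "husky"

def accuracy(data):
    return sum(background_classifier(f) == y for f, y in data) / len(data)

print(accuracy(train), accuracy(test))  # 1.0 on train, 0.0 on test
```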
Sycophancy
Tendency of the model to produce responses that confirm the user's expectations or beliefs instead of providing objective and truthful information.
Underspecification
Ambiguity in the specification of the learning problem, resulting in multiple models with similar test performance but radically different behavior in production.
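A sketch with made-up features: two rules fit the test set equally well because features `a` and `b` always agree there, yet behave oppositely once those features decouple in production. The test set never pinned down which rule was right.

```python
# On the test set, features "a" and "b" are perfectly correlated.
test = [({"a": 1, "b": 1}, 1), ({"a": 0, "b": 0}, 0)] * 25

rule_a = lambda f: f["a"]  # model 1: keys on feature a
rule_b = lambda f: f["b"]  # model 2: keys on feature b

def accuracy(model, data):
    return sum(model(f) == y for f, y in data) / len(data)

print(accuracy(rule_a, test), accuracy(rule_b, test))  # both 1.0

production_input = {"a": 1, "b": 0}  # the features decouple in the wild
print(rule_a(production_input), rule_b(production_input))  # 1 vs 0
```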
Overfitting
Excessive learning of noise and specific details of the training set instead of generalizable patterns, resulting in poor performance on new data.
Underfitting
Model with insufficient capacity or inadequate training that fails to capture underlying patterns in the data, resulting in poor performance.
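The two capacity failures above can be sketched together on a synthetic noisy line y = 2x + noise: a lookup table memorizes the training noise (overfitting) while a constant predictor ignores x entirely (underfitting); both generalize worse than the simple linear rule that matches the data's actual structure.

```python
import random

random.seed(3)

def make_data(n):
    # Synthetic data: y = 2x plus Gaussian noise.
    return [(x, 2 * x + random.gauss(0, 0.5))
            for x in (random.uniform(0, 10) for _ in range(n))]

train, test = make_data(100), make_data(100)

memorized = {x: y for x, y in train}          # overfit: stores the noise
const = sum(y for _, y in train) / len(train)  # underfit: ignores x

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

print(mse(lambda x: memorized.get(x, const), train))  # 0: noise memorized
print(mse(lambda x: memorized.get(x, const), test))   # large: nothing transfers
print(mse(lambda x: const, test))                     # large: pattern missed
print(mse(lambda x: 2 * x, test))                     # small: right capacity
```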