11 canonical risk pages
Reliability
Failure modes that degrade output quality, consistency, or trustworthiness.
Brittleness
Tendency of models to suffer catastrophic failures when facing inputs slightly outside the training distribution, demonstrating a lack of robust generalization.
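A minimal sketch of brittleness, using made-up data: a model that merely memorizes exact training inputs is perfect in-sample but fails catastrophically on a trivially perturbed variant of the same input.

```python
# Hypothetical toy data: exact-match "model" with no notion of similarity.
train = {"great movie": "positive", "awful plot": "negative"}

def memorizing_classify(text):
    # Lookup by exact string: any perturbation falls off the memorized set.
    return train.get(text, "unknown")

print(memorizing_classify("great movie"))   # seen verbatim -> "positive"
print(memorizing_classify("great movie!"))  # one extra character -> "unknown"
```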
Catastrophic Forgetting
Drastic loss of previously learned knowledge when a neural network is trained on new tasks, especially problematic in continual learning.
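A minimal sketch, using a toy one-parameter model and made-up targets: a scalar weight trained first on task A, then fine-tuned only on task B, overwrites its task-A solution.

```python
def mse_grad(w, x, y):
    # Gradient of (w*x - y)^2 with respect to w.
    return 2 * (w * x - y) * x

def train(w, data, steps=200, lr=0.1):
    for _ in range(steps):
        for x, y in data:
            w -= lr * mse_grad(w, x, y)
    return w

task_a = [(1.0, 2.0)]    # optimal w = 2 on task A
task_b = [(1.0, -1.0)]   # optimal w = -1 on task B

w = train(0.0, task_a)
loss_a_before = (w * 1.0 - 2.0) ** 2  # near zero after task-A training
w = train(w, task_b)                  # sequential training on task B only
loss_a_after = (w * 1.0 - 2.0) ** 2   # task-A loss blows up: A was forgotten
print(loss_a_before, loss_a_after)
```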
Confabulated Hallucination
Generation of factually incorrect or fabricated information that the model presents with high apparent confidence, without basis in its training data or verifiable sources.
Model Collapse
Phenomenon in generative models where the model loses diversity in its outputs and converges to repeatedly generating a limited set of similar samples.
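A sketch on synthetic Gaussian data: repeatedly refitting a generative model on its own samples drains diversity generation by generation, so the fitted spread collapses toward zero.

```python
import random
import statistics

random.seed(0)
mu, sigma = 0.0, 1.0
initial_sigma = sigma
for _ in range(2000):
    # Each "generation" is trained only on the previous generation's output.
    samples = [random.gauss(mu, sigma) for _ in range(50)]
    mu, sigma = statistics.fmean(samples), statistics.stdev(samples)
print(initial_sigma, sigma)  # sigma has shrunk far below its starting value
```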
Model Drift
Progressive degradation of model performance as the real-world data distribution shifts over time away from the original training data (also known as concept drift).
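A sketch of a basic drift monitor on a synthetic feature stream, with a hypothetical threshold: compare a production window's mean against the training baseline, measured in standard errors.

```python
import random
import statistics

random.seed(1)
train_data = [random.gauss(0.0, 1.0) for _ in range(1000)]
base_mean = statistics.fmean(train_data)
base_sd = statistics.stdev(train_data)

def drifted(window, z_threshold=3.0):
    # Flag drift when the window mean sits too many standard errors
    # away from the training-time mean.
    se = base_sd / len(window) ** 0.5
    return abs(statistics.fmean(window) - base_mean) / se > z_threshold

stable_window = [random.gauss(0.0, 1.0) for _ in range(200)]
shifted_window = [random.gauss(0.8, 1.0) for _ in range(200)]  # distribution moved
print(drifted(stable_window), drifted(shifted_window))
```

Real monitors typically use distribution-level tests (e.g. Kolmogorov-Smirnov or population stability index) rather than a single mean comparison; the structure is the same.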
Out-of-Distribution
Systematic failure of the model when it encounters data drawn from a distribution significantly different from the training set.
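A sketch on synthetic 1-D data, with a hypothetical cutoff: flag inputs as out-of-distribution when they lie many standard deviations from the training data, before trusting the model's prediction on them.

```python
import random
import statistics

random.seed(2)
train_x = [random.gauss(5.0, 1.0) for _ in range(500)]
mean, sd = statistics.fmean(train_x), statistics.stdev(train_x)

def is_ood(x, z_cut=4.0):
    # Simple distance-based OOD check against the training distribution.
    return abs(x - mean) / sd > z_cut

print(is_ood(5.3))    # near the training mass -> in-distribution
print(is_ood(42.0))   # far outside it -> out-of-distribution
```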
Spurious Correlation
Learning of superficial statistical correlations with no real causal relationship (e.g., associating snow with wolves because they appear together in photos), which fails to generalize.
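A sketch with made-up data that mirrors the snow-and-wolves example: in training, a "snowy background" feature perfectly co-occurs with the "wolf" label, so a classifier keying on the background looks flawless, then fails completely once the correlation breaks.

```python
# Spurious cue: "snow" co-occurs with "wolf" in training only.
train = [({"snow": 1}, "wolf")] * 50 + [({"snow": 0}, "husky")] * 50
test = [({"snow": 0}, "wolf")] * 50 + [({"snow": 1}, "husky")] * 50  # cue reversed

def background_classifier(features):
    # Relies on the background, not the animal itself.
    return "wolf" if features["snow"] else "husky"

def accuracy(data):
    return sum(background_classifier(f) == y for f, y in data) / len(data)

print(accuracy(train), accuracy(test))  # 1.0 on train, 0.0 on test
```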
Sycophancy
Tendency of the model to produce responses that confirm the user's expectations or beliefs instead of providing objective and truthful information.
Underspecification
Ambiguity in the specification of the learning problem, resulting in multiple models with similar test performance but radically different behavior in production.
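A sketch with made-up features: two rules fit the test set equally well because features `a` and `b` always agree there, yet behave oppositely once those features decouple in production. The test set never pinned down which rule was right.

```python
# On the test set, features "a" and "b" are perfectly correlated.
test = [({"a": 1, "b": 1}, 1), ({"a": 0, "b": 0}, 0)] * 25

rule_a = lambda f: f["a"]  # model 1: keys on feature a
rule_b = lambda f: f["b"]  # model 2: keys on feature b

def accuracy(model, data):
    return sum(model(f) == y for f, y in data) / len(data)

print(accuracy(rule_a, test), accuracy(rule_b, test))  # both 1.0

production_input = {"a": 1, "b": 0}  # the features decouple in the wild
print(rule_a(production_input), rule_b(production_input))  # 1 vs 0
```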
Overfitting
Excessive learning of noise and specific details of the training set instead of generalizable patterns, resulting in poor performance on new data.
Underfitting
Model with insufficient capacity or inadequate training that fails to capture underlying patterns in the data, resulting in poor performance.
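The two capacity failures above can be sketched together on a synthetic noisy line y = 2x + noise: a lookup table memorizes the training noise (overfitting) while a constant predictor ignores x entirely (underfitting); both generalize worse than the simple linear rule that matches the data's actual structure.

```python
import random

random.seed(3)

def make_data(n):
    # Synthetic data: y = 2x plus Gaussian noise.
    return [(x, 2 * x + random.gauss(0, 0.5))
            for x in (random.uniform(0, 10) for _ in range(n))]

train, test = make_data(100), make_data(100)

memorized = {x: y for x, y in train}          # overfit: stores the noise
const = sum(y for _, y in train) / len(train)  # underfit: ignores x

def mse(predict, data):
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

print(mse(lambda x: memorized.get(x, const), train))  # 0: noise memorized
print(mse(lambda x: memorized.get(x, const), test))   # large: nothing transfers
print(mse(lambda x: const, test))                     # large: pattern missed
print(mse(lambda x: 2 * x, test))                     # small: right capacity
```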