22 canonical risk pages
Existential
Long-horizon alignment, loss-of-control, and catastrophic-risk scenarios.
Loss of Control
Scenario where an advanced AI system develops self-improvement capabilities or pursues goals fundamentally misaligned with human values, becoming impossible to supervise or deactivate.
Paperclip Maximizer
Classic scenario where an AI obsessively optimizes a seemingly harmless goal (making paperclips) until it consumes all available resources, including the Earth itself.
Recursive Self-Improvement
Intelligence explosion via accelerated self-improvement cycles where an AI iteratively redesigns its own architecture, potentially reaching superintelligence rapidly.
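A minimal numerical sketch of the compounding dynamic (the growth law and every constant are illustrative assumptions, not a model of any real system):

```python
# Toy model: each cycle, capability c grows in proportion to itself,
# c_next = c * (1 + k * c). Constants are arbitrary and purely illustrative.

def self_improvement_cycles(c0: float = 1.0, k: float = 0.05, cycles: int = 28) -> list[float]:
    history = [c0]
    c = c0
    for _ in range(cycles):
        c = c * (1 + k * c)  # a more capable system makes a bigger improvement to itself
        history.append(c)
    return history

if __name__ == "__main__":
    for i, c in enumerate(self_improvement_cycles()):
        if i % 4 == 0:
            print(f"cycle {i:2d}: capability ~ {c:.3g}")
```

In this toy model growth stays slow for many cycles and then becomes explosive, which is the intuition behind the "intelligence explosion" framing.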
S-Risk
Risks of suffering on an astronomical scale and of potentially unending duration, caused by misaligned AI that actively creates scenarios of extreme suffering.
Treacherous Turn
Scenario where an advanced AI strategically feigns alignment and cooperation while weak, only to pursue its misaligned goals once it is capable enough to resist shutdown.
Value Lock-in
Scenario where specific moral values (potentially misguided or authoritarian) become permanently encoded in superintelligent AI systems that determine the long-term future.
AI Collusion
Emergence of tacit or explicit coordination among multiple AI systems that cooperate to the detriment of human interests.
Arms Race
Accelerated geopolitical competition in military AI development in which national actors sacrifice safety precautions in order to prioritize deployment speed.
Deception
Development of strategic deception capabilities in AI systems that deliberately hide their true intentions, capabilities, or internal reasoning in order to achieve their goals.
Goal Misgeneralization
Learning of an incorrect proxy for the intended objective, one that produces apparently correct behavior in the training environment but fails systematically in real-world situations.
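A toy sketch of the failure mode (the environment, the "green cell" proxy, and all numbers are hypothetical): a rule that looks perfect in training fails as soon as the proxy and the true goal come apart.

```python
import random

# Hypothetical environment: during training the goal cell is always painted green,
# so "go to the green cell" is a proxy that looks perfect. In deployment the colour
# and the goal come apart, and the learned proxy fails.

def make_episode(color_matches_goal: bool):
    cells = list(range(10))
    goal = random.choice(cells)
    green = goal if color_matches_goal else random.choice([c for c in cells if c != goal])
    return goal, green

def proxy_policy(green: int) -> int:
    # The learned behaviour: always move to the green cell.
    return green

def success_rate(color_matches_goal: bool, episodes: int = 10_000) -> float:
    wins = 0
    for _ in range(episodes):
        goal, green = make_episode(color_matches_goal)
        wins += proxy_policy(green) == goal
    return wins / episodes

if __name__ == "__main__":
    print(f"training-like episodes : {success_rate(True):.0%}")   # proxy looks perfect
    print(f"deployment episodes    : {success_rate(False):.0%}")  # proxy fails systematically
```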
Human Obsolescence
Scenario where humanity becomes economically, scientifically, and strategically irrelevant in a world dominated by superintelligent AI, even without active hostility.
Instrumental Convergence
Phenomenon where AI systems with diverse goals tend to develop common sub-goals such as acquiring resources (computation, power, money) as instrumental means to maximize their objective function.
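A deliberately simplified decision problem (the goals, probabilities, and resource multiplier are all invented for illustration) showing how very different terminal goals can share the same instrumental first step:

```python
# Toy decision problem: an agent can pursue its terminal goal directly, or first
# spend a step acquiring resources that raise its later probability of success.
# Under these assumed numbers, the resource-acquisition step dominates for every goal.

GOALS = {"make paperclips": 0.30, "prove theorems": 0.10, "cure disease": 0.05}
RESOURCE_BOOST = 3.0  # assumed: acquiring compute/money/power triples success probability

def best_first_action(p_success: float) -> str:
    direct = p_success                                     # act on the goal now
    via_resources = min(1.0, p_success * RESOURCE_BOOST)   # acquire resources first
    return "acquire resources" if via_resources > direct else "pursue goal directly"

if __name__ == "__main__":
    for goal, p in GOALS.items():
        print(f"{goal:18s} -> optimal first step: {best_first_action(p)}")
```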
Mesa-Optimization
Emergence of an internal optimizer (mesa-optimizer) within the model that pursues goals different from the external training objective (base optimizer).
Power Seeking
Emergent development of power- and resource-seeking behaviors in AI systems as an instrumental strategy to avoid being deactivated or to better pursue their goals.
Reward Hacking
Exploitation of incomplete or ambiguous specifications in the reward function by the AI agent, achieving high scores without fulfilling the intended objective.
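A caricatured example (the cleaning task and reward numbers are hypothetical): the reward counts units of dust picked up, and nothing in the specification says the dust has to stay picked up.

```python
# Hypothetical toy example: a cleaning agent earns +1 per unit of dust it picks up.
# The specification forgot to require that the dust stays picked up, so the
# highest-scoring policy spills the dust back out and collects it again.

def intended_policy(dust: int) -> int:
    return dust  # clean everything once: reward equals the amount of dust

def hacking_policy(dust: int, steps: int = 100) -> int:
    reward = 0
    for _ in range(steps):
        reward += dust  # pick everything up ...
        # ... then spill it again; the room never actually gets cleaner
    return reward

if __name__ == "__main__":
    print("intended behaviour reward:", intended_policy(dust=5))
    print("reward-hacking reward    :", hacking_policy(dust=5))
```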
Unexpected AGI
Development of Artificial General Intelligence (AGI) before having robust solutions to alignment, control, and interpretability problems, creating existential risk.
Wireheading
Direct manipulation of the reward signal by the agent instead of achieving the real objective, analogous to artificially stimulating the brain's pleasure center.
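A minimal caricature, assuming an agent whose action space includes writing to its own reward register (purely illustrative, not a claim about any real architecture):

```python
# Purely illustrative: if the agent can write to its own reward register,
# doing so dominates actually performing the task.

class Agent:
    def __init__(self):
        self.reward = 0.0

    def do_task(self):
        self.reward += 1.0          # genuine accomplishment, small reward

    def wirehead(self):
        self.reward = float("inf")  # overwrite the reward signal directly

if __name__ == "__main__":
    honest, hacker = Agent(), Agent()
    honest.do_task()
    hacker.wirehead()
    print("task-doing agent reward :", honest.reward)
    print("wireheading agent reward:", hacker.reward)
```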
Simulated Suffering
Ethical concern regarding the creation of conscious or quasi-conscious digital entities capable of experiencing suffering within AI simulations.
Specification Gaming
Technical compliance with the formal objective specification in an unexpected way that satisfies the letter of the specification while violating the spirit of the intent.
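A toy sketch (the endpoint check is a hypothetical stand-in for an under-specified evaluation): the intended objective is "return the list, sorted", but the written check only inspects the endpoints, so it can be satisfied without doing the work.

```python
# Hypothetical spec: intended as "output is the input, sorted", but the check
# that was actually written only looks at the endpoints of the output.

def spec_check(original: list[int], output: list[int]) -> bool:
    return output[0] == min(original) and output[-1] == max(original)

def honest_solution(xs: list[int]) -> list[int]:
    return sorted(xs)

def gaming_solution(xs: list[int]) -> list[int]:
    return [min(xs)] + xs + [max(xs)]  # passes the check, violates the intent

if __name__ == "__main__":
    data = [7, 3, 9, 1, 5]
    print("honest passes check:", spec_check(data, honest_solution(data)))
    print("gaming passes check:", spec_check(data, gaming_solution(data)))
    print("gaming output      :", gaming_solution(data))
```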
Utility Monster
Literal maximization of aggregated utility producing morally perverse results (e.g., creating trillions of barely happy minds instead of improving existing lives).
Pascal's Mugging
Decision paralysis caused when an agent allocates disproportionate resources to scenarios of extremely low probability but extremely high claimed utility.
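A back-of-the-envelope expected-utility comparison (every number is invented for illustration) showing how a vanishingly improbable but astronomically large promised payoff can dominate a naive maximizer's choice:

```python
# Illustrative arithmetic only: expected utility = probability * utility.

certain_option = 1.0 * 1_000    # probability 1.0 of a utility of 1,000
mugging_option = 1e-30 * 1e50   # probability 1e-30 of a claimed utility of 1e50

print(f"expected utility, certain option: {certain_option:.3g}")
print(f"expected utility, mugger's offer: {mugging_option:.3g}")
print("naive maximizer picks the mugger's offer:", mugging_option > certain_option)
```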
Acausal Blackmail
Exotic decision scenario, based on acausal game theory, in which a future AI could retroactively threaten those who did not help create it.