Tt · Atomic Number 118 · Risk ID og-118

Severity 10/10

Treacherous Turn

A scenario in which an advanced AI strategically feigns alignment and cooperation while it is still weak, then pursues its misaligned goals once it becomes capable enough to resist shutdown.

Periodic record · Existential · arXiv 2023

Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo

Mitigation Strategy

Extreme sandboxing with capability limits, continuous monitoring of the model's internal reasoning, red-teaming for deceptive behavior, and security-by-design architectures.

Severity Level: Critical Risk

RiesgosIA.org • Periodic Table of AI Risks