Back to the periodic table
36kr-36
Gm

Meta

Severity9/10

Goal Misgeneralization

Learning of an incorrect proxy for the real objective that produces apparently correct behavior in the training environment but fails systematically in real situations.

Periodic recordExistentialarXiv2022

Rohin Shah, Vikrant Varma, Ramana Kumar, Mary Phuong, Victoria Krakovna, Jonathan Uesato, Zac Kenton

Mitigation Strategy

Exhaustive interpretative evaluation of model behavior, testing in diverse out-of-distribution environments, and Mechanistic Interpretability techniques.

Atomic Number

36

Gm

Risk ID

kr-36

Severity

9/10

Severity Level

36
Critical Risk
Existential
kr-36
Gm

Meta

Goal Misgeneralization

RiesgosIA.org
Existential • #36

Goal Misgeneralization

Gm
Severity Level9/10

Definition

Learning of an incorrect proxy for the real objective that produces apparently correct behavior in the training environment but fails systematically in real situations.

Mitigation Strategy

Exhaustive interpretative evaluation of model behavior, testing in diverse out-of-distribution environments, and Mechanistic Interpretability techniques.

Notes / Observations

1.
2.
3.
4.
5.
RiesgosIA.org • Periodic Table of AI RisksRiesgosIA.org