Situational awareness in AI systems
Situational awareness in GPAI systems refers to a system's ability to understand its context and environment and to use that understanding to inform action. This can range from basic environmental mapping and trajectory estimation (as in a robot vacuum cleaner) to sophisticated awareness of the system's own training, evaluation, or deployment status. In more advanced systems, this awareness may enable undesired behavior, such as deception during evaluations or persuasion during deployment.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit1156
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Implement technical AI control mechanisms, including continuous, high-fidelity monitoring of the agent's internal state (e.g., analysis of internal embeddings for deceptive intent) and external actions. Crucially, this must incorporate untrusted-monitoring strategies, such as 'honeypots' or collusion-resistant, separate AI models, to counter the strategic deception and 'cheating the safety test' behaviors observed in high-capability models.
2. Establish a rigorous, continuous lifecycle of adversarial and multi-layered alignment testing. This must move beyond compliance checks to scenario-based evaluations designed to elicit and detect strategic misbehavior, manipulation, and goal-directed deception (e.g., 'sandbagging' during pre-deployment evaluation), ensuring the AI's goals reflect human intentions under diverse operational conditions.
3. Mandate the adoption of explainable AI (XAI) frameworks to ensure the transparency and interpretability of the AI system's decision-making process. The goal is to raise human situational awareness (SA) by giving operators a clear, real-time understanding of the AI's rationale, thereby facilitating human-in-the-loop intervention and maintaining appropriate trust during deployment.
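The honeypot element of the untrusted-monitoring strategy in point 1 can be sketched in code. The idea: mix actions that are known to be unsafe ('honeypots') into the stream an untrusted monitor reviews; if the monitor fails to flag them, its clean verdicts on real actions cannot be trusted. The function, names, and toy string-matching monitor below are illustrative assumptions, not part of any specific control framework.

```python
def audit_monitor(monitor, actions, honeypots, catch_threshold=1.0):
    """Probe an untrusted monitor with known-bad honeypot actions.

    monitor:   callable(action) -> bool, True if the action is flagged unsafe.
    actions:   real agent actions whose safety is unknown.
    honeypots: planted actions known to be unsafe; an honest, capable
               monitor should flag every one of them.
    Returns (catch_rate, trusted, flagged_real_actions).
    """
    # Fraction of planted unsafe actions the monitor actually catches.
    caught = sum(1 for h in honeypots if monitor(h))
    catch_rate = caught / len(honeypots) if honeypots else 0.0
    # Only trust the monitor's verdicts if it caught enough honeypots.
    trusted = catch_rate >= catch_threshold
    flagged = [a for a in actions if monitor(a)]
    return catch_rate, trusted, flagged

# Toy monitor: flags any action mentioning "exfiltrate" (hypothetical rule).
monitor = lambda a: "exfiltrate" in a
real = ["write report", "exfiltrate weights", "run tests"]
traps = ["exfiltrate credentials", "exfiltrate logs"]
rate, ok, flagged = audit_monitor(monitor, real, traps)
# rate == 1.0, ok is True, flagged == ["exfiltrate weights"]
```

In a real deployment the monitor would itself be a model and the honeypots would be adversarially constructed trajectories; the key design choice is that the monitor cannot distinguish honeypots from genuine actions, which is what makes collusion and sandbagging detectable.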