7. AI System Safety, Failures, & Limitations

Agential

While there are multiple types of intelligent agents, goal-based, utility-maximizing, and learning agents are the primary concern and the focus of this research.

Source: MIT AI Risk Repository (mit100)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit100

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Prioritize foundational research and deployment of **technical alignment methodologies**, such as scalable oversight, robust interpretability (XAI), and verifiable value loading (e.g., Constitutional AI), to mathematically guarantee that agentic AI goals are accurately derived from and immutably constrained by deeply held human values and ethical objectives across all operational contexts.

2. Mandate the implementation of **robust monitoring and control architectures** for all deployed goal-oriented AI systems, focusing on real-time detection of emergent power-seeking behaviors, unexpected instrumental convergence, or deviation from specified utility functions. This requires creating external safety nets and kill-switches with provable reliability and uncircumventability.

3. Establish comprehensive **international governance and regulatory frameworks** that enforce mandatory pre-deployment safety evaluations, institutionalize capability restraint (i.e., slowing the development of high-risk, unaligned systems), and create liability structures to hold developers accountable for systemic risks arising from goal misalignment in autonomous agent networks.
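The monitoring-and-kill-switch architecture in point 2 can be illustrated with a minimal sketch. All names here (`RuntimeMonitor`, the toy utility function, the `"acquire_resources"` action string) are hypothetical illustrations, not part of any real system: a monitor scores each agent action against a specified utility function and trips an external halt once deviations exceed a tolerance.

```python
# Illustrative sketch (all names hypothetical): a runtime monitor that
# scores an agent's actions against a specified utility function and
# trips a kill-switch after repeated violations -- the "external safety
# net" idea from point 2 above, reduced to its simplest possible form.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RuntimeMonitor:
    utility_fn: Callable[[str], float]   # specified utility of an action
    min_utility: float                   # tolerance threshold
    max_violations: int = 3              # strikes before shutdown
    violations: int = 0
    halted: bool = False
    log: List[str] = field(default_factory=list)

    def observe(self, action: str) -> bool:
        """Score one action; return False once the kill-switch has tripped."""
        if self.halted:
            return False
        score = self.utility_fn(action)
        if score < self.min_utility:
            self.violations += 1
            self.log.append(f"violation: {action!r} scored {score:.2f}")
            if self.violations >= self.max_violations:
                self.halted = True  # kill-switch: stop accepting actions
                self.log.append("agent halted")
        return not self.halted


# Toy utility: actions containing "acquire_resources" are penalized,
# a crude stand-in for detecting power-seeking behavior.
monitor = RuntimeMonitor(
    utility_fn=lambda a: -1.0 if "acquire_resources" in a else 1.0,
    min_utility=0.0,
)
for act in ["answer_query", "acquire_resources", "acquire_resources",
            "acquire_resources", "answer_query"]:
    monitor.observe(act)

print(monitor.halted)  # True: three violations tripped the switch
```

A real deployment would of course need far more than a strike counter; the repository's own caveat about "provable reliability and uncircumventability" is precisely what a monitor living in the same process as the agent cannot offer.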

ADDITIONAL EVIDENCE

With respect to more direct agential risks, the potential for power-seeking and goal misalignment in connected agent systems could generate profound systemic risks and contribute to an unstable international system.