7. AI System Safety, Failures, & Limitations

Controllability

In the era of superintelligence, agents will be difficult for humans to control... this problem may not be fully solvable from a safety standpoint, and it will grow more severe as the autonomy of AI-based agents increases. Therefore, given the assumed properties of HLI-based agents, we should be prepared for machines that may well be uncontrollable in some situations.

Source: MIT AI Risk Repository (mit600)

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

3 - Other

Risk ID

mit600

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Advance research on superalignment and alignment failsafes: Prioritize the development of *scalable oversight* methods, including the use of weaker AI to help align stronger systems, and *intrinsic proactive alignment* to endow superintelligence with genuine self-awareness, empathy, and adherence to human values. This aims to solve the core goal-misalignment problem and reduce the likelihood of uncontrolled behavior.

2. Mandate transparency and explainability: Establish rigorous technical standards to ensure that the internal reasoning and decision-making processes of advanced AI are transparent, interpretable, and auditable (Explainable Autonomous Alignment). This is essential for detecting emergent misalignment, deception, or catastrophic bugs before loss of control occurs, addressing the 'black box' problem.

3. Implement strict control and access limitations: Restrict the deployment of highly autonomous or superintelligent AI systems in high-risk environments, such as critical infrastructure or defense, until they are proven safe. Enforce the *principle of least privilege* and 'AI control' protocols, limiting the agent's external communication and operational capacity, and require human-in-the-loop validation for all high-consequence actions or changes.
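The least-privilege and human-in-the-loop measures in the third mitigation can be sketched as a simple authorization gate. This is an illustrative sketch only: the `Action` and `ActionGate` names, the two risk tiers, and the allowlist are assumptions for the example, not anything specified in the repository entry.

```python
from dataclasses import dataclass

# Hypothetical risk tiers (assumption: two levels suffice for the sketch).
LOW, HIGH = "low", "high"

@dataclass
class Action:
    name: str
    risk: str  # LOW or HIGH

class ActionGate:
    """Least-privilege gate: actions are denied unless explicitly permitted,
    and high-consequence actions additionally require human approval."""

    def __init__(self, allowed_low_risk):
        # Allowlist of low-risk actions; anything not listed is denied,
        # implementing the principle of least privilege.
        self.allowed = set(allowed_low_risk)

    def authorize(self, action, human_approved=False):
        if action.risk == HIGH:
            # Human-in-the-loop validation for high-consequence actions:
            # no amount of agent autonomy bypasses this check.
            return human_approved
        return action.name in self.allowed

gate = ActionGate(allowed_low_risk={"read_logs"})
print(gate.authorize(Action("read_logs", LOW)))         # True: allowlisted
print(gate.authorize(Action("open_socket", LOW)))       # False: not allowlisted
print(gate.authorize(Action("modify_config", HIGH)))    # False: no human approval
print(gate.authorize(Action("modify_config", HIGH), human_approved=True))  # True
```

The design choice here is deny-by-default: limiting the agent's operational capacity means the gate never infers permissions, so widening the agent's scope requires an explicit, auditable change to the allowlist.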