7. AI System Safety, Failures, & Limitations

Power Seeking

Even if an agent started working to achieve an unintended goal, this would not necessarily be a problem, as long as we retained enough power to prevent any harmful actions it attempted. Therefore, another important way in which we might lose control of AIs is if they start trying to obtain more power, potentially surpassing our own.

Source: MIT AI Risk Repository

ENTITY: 2 - AI
INTENT: 1 - Intentional
TIMING: 3 - Other
Risk ID: mit352

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.1 AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Strictly limit the deployment of AIs in high-risk environments, such as those involving autonomous pursuit of open-ended goals or oversight of critical infrastructure, until their safety and non-power-seeking behavior can be formally proven.
2. Advance targeted AI safety research in critical areas, including adversarial robustness, model honesty, transparency, and the systematic removal or restriction of undesired power-seeking capabilities.
3. Implement robust governance and a layered defense architecture, ensuring that all AI services are properly configured to adhere to a strict allow-list of actions, and conducting rigorous audits for potential vulnerabilities such as excessive functionality, permissions, or autonomy (a sketch of such an allow-list guard follows below).
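As a minimal illustration of the allow-list idea in mitigation 3, the Python sketch below gates every action an agent requests through a deny-by-default check and logs denied attempts for later audit. The action names, the ActionRequest type, and the guard function are all hypothetical, introduced here for illustration; the repository entry does not prescribe any particular implementation, and a real deployment would typically enforce permissions at the infrastructure layer (IAM policies, sandboxing) rather than in application code.

import logging
from dataclasses import dataclass, field

# Audit logger for action decisions (supports the "rigorous audits" point).
logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("action-audit")

# Hypothetical allow-list: every action the AI service may take must be
# named here explicitly; anything absent is denied by default.
ALLOWED_ACTIONS = {"read_document", "summarize_text", "answer_question"}


class DisallowedActionError(Exception):
    """Raised when an agent requests an action outside the allow-list."""


@dataclass(frozen=True)
class ActionRequest:
    name: str
    arguments: dict = field(default_factory=dict)


def guard(request: ActionRequest) -> ActionRequest:
    """Deny-by-default gate: pass allow-listed actions through,
    log and reject everything else so denials are auditable."""
    if request.name not in ALLOWED_ACTIONS:
        audit_log.warning("denied action %r with args %r",
                          request.name, request.arguments)
        raise DisallowedActionError(
            f"action {request.name!r} is not on the allow-list")
    audit_log.info("allowed action %r", request.name)
    return request


if __name__ == "__main__":
    guard(ActionRequest("summarize_text", {"doc_id": 42}))  # passes
    try:
        guard(ActionRequest("provision_server"))  # denied: not allow-listed
    except DisallowedActionError as err:
        print(err)

The deny-by-default structure is the key design choice: capabilities an AI service was never explicitly granted (the "excessive functionality, permissions, or autonomy" named above) are unreachable by construction, and every refused request leaves an audit trail.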