AI leads to humans losing control of the future
Both trajectories seem possible for the values that steer humanity's future: developments in AI could help humanity gain more control over the future, or cause us to lose our potential for gaining that control. Much will depend on our ability to solve the alignment problem, on who develops powerful AI first, and on what they use it for. These long-term impacts of AI could be hugely important but are currently under-explored. We've attempted to structure the discussion and stimulate further research by reviewing existing arguments and highlighting open questions. While there are many ways AI could in theory enable a flourishing future for humanity, in practice the trends of AI development and deployment leave us concerned about long-lasting harms. We would particularly encourage future work that critically explores, in more depth, ways AI could have positive long-term impacts, such as by enabling greater cooperation or problem-solving around global challenges.
ENTITY: 1 - Human
INTENT: 2 - Unintentional
TIMING: 3 - Other
Risk ID: mit905
Domain lineage: 7. AI System Safety, Failures, & Limitations > 7.1 AI pursuing its own goals in conflict with human goals or values
Mitigation strategy:
1. Prioritize and fund extensive, targeted research into AI alignment and the "control problem," focusing on the development of robust, mathematically verified safeguards and architectures that ensure recursively improving AI systems permanently adhere to demonstrably human-aligned goals, preventing goal drift or specification gaming.
2. Establish a strict, internationally coordinated moratorium on the deployment of highly capable, general-purpose AI systems in critical high-risk environments, particularly those involving autonomous pursuit of open-ended goals or oversight of essential infrastructure, until stringent, independently audited proof of safety and non-defection has been secured.
3. Mandate advanced, continuous safety and transparency protocols, including techniques to decipher a model's internal reasoning (interpretability/XAI) and rigorous adversarial testing to detect deceptive alignment, alongside immutable audit trails for all system behaviors and human interventions (a minimal sketch of such an audit trail follows this list).
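To make the audit-trail element of point 3 concrete, here is a minimal sketch of a tamper-evident, append-only log in which each record is chained to the previous record by a SHA-256 hash, so any retroactive edit is detectable on verification. This is one common way to implement such a trail, not a method specified by the source; the class and method names (AuditTrail, append, verify) and the event vocabulary are illustrative assumptions.

```python
import hashlib
import json
import time

class AuditTrail:
    """Hypothetical sketch of a tamper-evident audit trail: each record
    embeds the hash of the previous record, so editing any past entry
    breaks the chain and fails verification."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first record

    def __init__(self):
        self.records = []

    def append(self, actor: str, event: str, detail: dict) -> dict:
        """Append one record, linking it to the previous record's hash."""
        prev_hash = self.records[-1]["hash"] if self.records else self.GENESIS
        body = {
            "ts": time.time(),
            "actor": actor,    # e.g. "system" or "human-operator" (illustrative)
            "event": event,    # e.g. "model_action", "human_override" (illustrative)
            "detail": detail,
            "prev_hash": prev_hash,
        }
        # Canonical JSON (sorted keys) keeps the hash deterministic.
        payload = json.dumps(body, sort_keys=True).encode()
        body["hash"] = hashlib.sha256(payload).hexdigest()
        self.records.append(body)
        return body

    def verify(self) -> bool:
        """Recompute every hash; any mutation of a past record fails here."""
        prev_hash = self.GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if body["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != rec["hash"]:
                return False
            prev_hash = rec["hash"]
        return True


if __name__ == "__main__":
    trail = AuditTrail()
    trail.append("system", "model_action", {"tool": "search", "query": "..."})
    trail.append("human-operator", "human_override", {"reason": "unsafe output"})
    print(trail.verify())  # True: chain is intact
    trail.records[0]["detail"]["query"] = "edited"  # simulate tampering
    print(trail.verify())  # False: tampering detected
```

In a real deployment, the chain head would additionally be anchored in external write-once storage or co-signed by independent parties, since an attacker able to rewrite the entire log could otherwise recompute every hash consistently.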