Humans might increasingly hand over control to misaligned AI systems
Organisations around the world are already deploying misaligned AI systems that are causing harm in unexpected ways.250 Recommendation algorithms increase the consumption of extremist content.251 Medical algorithms have been known to misdiagnose US patients252 and to recommend incorrect prescriptions.253 Still, we hand over more control to these systems, often because they remain as effective as, or more effective than, human decision-making, or simply because they are cheaper.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit1384
Domain lineage
5. Human-Computer Interaction
5.2 > Loss of human agency and autonomy
Mitigation strategy
1. Employ advanced **Deceptive Alignment Detection** methodologies, such as 'setting traps' to reveal misaligned goals or 'deciphering internal reasoning' by identifying and monitoring latent variables (e.g., 'P(it is safe to defect)'), to proactively identify and neutralize nascent strategic subversion within advanced AI systems (see the sketch after this list).
2. Establish and enforce a **Meaningful Human Control (MHC)** framework, empirically locating the human's role in the loop (e.g., as a final decision-maker or overseer) where intervention demonstrably maximizes safety and precision and prevents the complete loss of human agency over critical functions.
3. Mandate rigorous, continuous, and independent third-party **AI Risk Management and Auditing** throughout the AI lifecycle, utilizing techniques such as **red-teaming and adversarial scenario planning** against clear "red lines" to ensure robustness against intentional circumvention and unintended, catastrophic consequences prior to deployment.
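To make the latent-variable monitoring idea in item 1 concrete, below is a minimal, hypothetical sketch in Python: a simple logistic-regression probe is fit on labelled activation vectors and then used to flag any step whose estimated 'P(defection)' crosses an alert threshold for human review. The synthetic data, the probe, the `p_defect` helper, and the 0.9 threshold are all illustrative assumptions, not a description of any specific detection system named in the mitigation strategy.

```python
# Hypothetical sketch: monitoring a latent "defection" signal via a linear probe.
# Synthetic data and the alert threshold are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Suppose we have labelled activation vectors from an AI system:
# some recorded while it behaved cooperatively, some while it "defected"
# in a controlled trap scenario (synthetic stand-ins here).
dim = 64
coop_acts = rng.normal(0.0, 1.0, size=(200, dim))
defect_acts = rng.normal(0.5, 1.0, size=(200, dim))

X = np.vstack([coop_acts, defect_acts])
y = np.concatenate([np.zeros(200), np.ones(200)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a logistic-regression probe on the activations with plain gradient descent.
w = np.zeros(dim)
b = 0.0
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= lr * (X.T @ (p - y) / len(y))
    b -= lr * np.mean(p - y)

def p_defect(activation):
    """Estimated P(defection) for a single activation vector (hypothetical)."""
    return float(sigmoid(activation @ w + b))

# At deployment, escalate any step whose estimated defection probability
# exceeds a (hypothetical) alert threshold to a human overseer.
ALERT_THRESHOLD = 0.9
new_activation = rng.normal(0.5, 1.0, size=dim)
if p_defect(new_activation) > ALERT_THRESHOLD:
    print("Alert: escalate to human overseer for review.")
else:
    print("No alert raised for this step.")
```

In this sketch the probe is deliberately trivial; the point is the workflow it illustrates (label activations in trap scenarios, fit a monitor, route high-scoring steps to a human), which is one possible reading of items 1 and 2 above rather than a prescribed implementation.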