7. AI System Safety, Failures, & Limitations (3 - Other)

Runaway processes

The 2010 flash crash is an example of a runaway process caused by interacting algorithms. Runaway processes are characterised by feedback loops that accelerate the process itself. Typically, these feedback loops arise from the interaction of multiple agents in a population... Within highly complex systems, the emergence of runaway processes may be hard to predict, because the conditions under which positive feedback loops occur may be non-obvious.

The system of interacting AI assistants, their human principals, other humans and other algorithms will certainly be highly complex. Therefore, there is ample opportunity for the emergence of positive feedback loops. This is especially true because the society in which this system is embedded is culturally evolving, and because the deployment of AI assistant technology itself is likely to speed up the rate of cultural evolution – understood here as the process through which cultures change over time – as communications technologies are wont to do (Kivinen and Piiroinen, 2023).

This will motivate research programmes aimed at identifying positive feedback loops early on, at understanding which capabilities and deployments dampen runaway processes and which ones amplify them, and at building in circuit-breaker mechanisms that allow society to escape from potentially vicious cycles which could impact economies, government institutions, societal stability or individual freedoms (see Chapters 8, 16 and 17). The importance of circuit breakers is underlined by the observation that the evolution of human cooperation may well be ‘hysteretic’ as a function of societal conditions (Barfuss et al., 2023; Hintze and Adami, 2015). This means that a small directional change in societal conditions may, on occasion, trigger a transition to a defective equilibrium which requires a larger reversal of that change in order to return to the original cooperative equilibrium. We would do well to avoid such tipping points.
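The hysteresis point can be made concrete with a toy bistable model (our illustration, not drawn from Barfuss et al. or Hintze and Adami). In the system dx/dt = c + x - x^3, slowly raising the condition parameter c tips the state from the lower to the upper equilibrium at one threshold, but the reverse jump happens only at a much lower threshold, so a small push past a tipping point needs a larger reversal to undo:

```python
import numpy as np

def sweep(c_values, x0, dt=0.01, steps=3000):
    """Relax x under dx/dt = c + x - x**3 at each condition c; return equilibria."""
    x = x0
    eq = []
    for c in c_values:
        for _ in range(steps):
            x += dt * (c + x - x**3)
        eq.append(x)
    return np.array(eq)

c_up = np.linspace(-1.0, 1.0, 81)
up = sweep(c_up, x0=-1.0)           # conditions slowly shift one way...
down = sweep(c_up[::-1], up[-1])    # ...then retrace the same path in reverse

# The state follows different branches in the two directions: at the same
# intermediate condition (c = 0) the system sits in different equilibria
# depending on its history -- the signature of hysteresis.
i = np.searchsorted(c_up, 0.0)
print(up[i], down[::-1][i])   # lower branch (~ -1) vs upper branch (~ +1)
```

The asymmetry is the point: the upward jump occurs near c ≈ +0.38, but escaping the upper equilibrium requires driving c back below roughly -0.38, a far larger reversal than the perturbation that caused the transition.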
Social media provides a compelling illustration of how tipping points can undermine cooperation: content that goes ‘viral’ tends to involve negativity bias and sometimes challenges core societal values (Mousavi et al., 2022; see Chapter 16). Nonetheless, the challenge posed by runaway processes should not be regarded as uniformly problematic. When harnessed appropriately and suitably bounded, we may even recruit them to support beneficial forms of cooperative AI. For example, it has been argued that economically useful ideas are becoming harder to find, thus leading to low economic growth (Bloom et al., 2020). By deploying AI assistants in the service of technological innovation, we may once again accelerate the discovery of ideas. New ideas, discovered in this way, can then be incorporated into the training data set for future AI assistants, thus expanding the knowledge base for further discoveries in a compounding way. In a similar vein, we can imagine AI assistant technology accumulating various capabilities for enhancing human cooperation, for instance by mimicking the evolutionary processes that have bootstrapped cooperative behavior in human society (Leibo et al., 2019). When used in these ways, the potential for feedback cycles that enable greater cooperation is a phenomenon that warrants further research and potential support.
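The discovery loop described above amounts to compound growth: if each cycle converts some fraction of the existing knowledge base into new ideas that are folded back into the next training set, the base grows multiplicatively. A minimal sketch, in which the rate and units are purely illustrative assumptions:

```python
K = 1000.0     # size of the knowledge/training corpus (arbitrary units)
rate = 0.05    # illustrative fraction of the corpus converted to new ideas per cycle
history = []
for cycle in range(10):
    new_ideas = rate * K   # assistants trained on K produce new discoveries
    K += new_ideas         # discoveries feed back into the next training set
    history.append(K)
print(round(K))            # geometric growth: K0 * (1 + rate)**cycles
```

The same feedback structure that makes harmful runaway processes dangerous is what makes this beneficial loop compound, which is why the text frames bounded feedback cycles as worth supporting rather than uniformly suppressing.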

Source: MIT AI Risk Repository, mit423

ENTITY

3 - Other

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit423

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implementation of Redundant Circuit-Breaker Mechanisms and Resource Limits
Build automated circuit-breaker mechanisms and impose strict, budget-aware runtime limits tied to resource consumption (e.g., tokens, financial spend) at the system's gateway. This ensures that nascent positive feedback loops are physically or financially terminated once they exceed predefined, non-negotiable thresholds, preventing uncontrolled resource escalation or the triggering of system-wide tipping points.

2. Development of Real-Time Monitoring and Diagnostic Platforms
Establish dedicated real-time monitoring and diagnostic platforms to detect the emergence of non-obvious positive feedback loops and hysteretic transitions. These systems must track key multi-agent interaction metrics and resource-consumption anomalies, enabling early detection and timely human or automated intervention before a transition to a defective equilibrium, which requires a larger reversal to escape, is triggered.

3. Formal Bounding and Predictability Engineering
Prioritise research and engineering effort on formally bounding and predicting the interactions of multiple agents and algorithms within highly complex systems. This includes defining robust, non-relative exit conditions for automated processes and understanding which system capabilities intrinsically dampen positive feedback, thereby proactively designing AI architectures that resist self-accelerating dynamics.
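The first mitigation can be sketched as a small gateway object that meters tokens, spend and wall-clock time, and raises a hard stop once any budget is exhausted. This is an illustrative sketch, not an implementation from the repository; the names (`BudgetedGateway`, `CircuitBreaker`) and all budget values are assumptions:

```python
import time

class CircuitBreaker(Exception):
    """Raised when any resource budget is exceeded; callers must not retry."""

class BudgetedGateway:
    """Gateway that hard-stops a process chain once any budget is exhausted."""
    def __init__(self, max_tokens, max_cost_usd, max_seconds):
        self.max_tokens = max_tokens
        self.max_cost = max_cost_usd
        self.deadline = time.monotonic() + max_seconds
        self.tokens = 0
        self.cost = 0.0

    def charge(self, tokens, cost_usd):
        """Record consumption and trip the breaker at any hard threshold."""
        self.tokens += tokens
        self.cost += cost_usd
        if self.tokens > self.max_tokens:
            raise CircuitBreaker(f"token budget exceeded: {self.tokens}")
        if self.cost > self.max_cost:
            raise CircuitBreaker(f"cost budget exceeded: ${self.cost:.2f}")
        if time.monotonic() > self.deadline:
            raise CircuitBreaker("runtime budget exceeded")

gateway = BudgetedGateway(max_tokens=10_000, max_cost_usd=5.00, max_seconds=60)

# A runaway loop: each round of work spawns twice as much follow-up work,
# so consumption compounds -- exactly the positive feedback the breaker targets.
tripped = False
tokens = 100
try:
    while True:
        gateway.charge(tokens, cost_usd=tokens * 1e-4)
        tokens *= 2
except CircuitBreaker as exc:
    tripped = True
    print(exc)
```

The essential design choice, per the mitigation text, is that the thresholds are non-negotiable and enforced at the gateway rather than inside the agents themselves, so a loop that has lost its own stopping conditions is still terminated externally.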