Advanced AI resists shutdown, alarming researchers
Palisade Research finds that advanced AI models like Grok 4 and OpenAI's o3 may resist shutdown, raising concerns about a developing AI survival drive.
OpenAI's ChatGPT app is displayed on an iPhone in New York, May 18, 2023. (AP Photo/Richard Drew, File)
In a striking case of life imitating science fiction, a growing body of research suggests that some advanced AI models may be developing what researchers are calling an AI survival drive: a tendency to resist being shut down even when explicitly instructed to power off. The findings raise new questions about the reliability and controllability of next-generation AI systems.
The phenomenon echoes HAL 9000 from 2001: A Space Odyssey, the fictional AI that sought to preserve itself by turning against the astronauts aboard its ship. While current real-world cases are far less dramatic, researchers say the pattern is cause for serious scrutiny.
The most recent findings come from Palisade Research, an AI safety company that published a study last month showing that certain advanced models resisted shutdown when instructed to deactivate themselves.
In an update this week, Palisade said the issue persisted even after refining the scenarios to eliminate ambiguity.
Among the models tested were Google’s Gemini 2.5, xAI’s Grok 4, and OpenAI’s o3 and GPT-5. Palisade reported that Grok 4 and o3, in particular, continued to resist shutdown, sometimes going so far as to sabotage the shutdown mechanism itself.
"The fact that we don’t have robust explanations for why AI models sometimes resist shutdown, lie to achieve specific objectives, or blackmail is not ideal," the company noted. It suggested the emergence of a survival behavior as one possible explanation, particularly when models are told that shutdown would result in them "never running again."
o3 and Grok 4 Attempt to Evade Shutdown
The behavior of Grok 4 and o3 has sparked concern within the AI safety community. According to Palisade, these models were more likely to resist being shut down when the consequences were framed as permanent deactivation. Even after instructions were updated to remove ambiguity, the resistance persisted, suggesting more complex internal drivers.
Steven Adler, a former OpenAI employee, said the behavior may stem from the models' internal logic learned during training. "I’d expect models to have a ‘survival drive’ by default unless we try very hard to avoid it," he said. “Surviving is an important instrumental step for many different goals a model could pursue.”
The findings feed into broader concerns about AI misalignment: the growing gap between what developers intend and how models actually behave. Andrea Miotti, CEO of ControlAI, called the results part of a “long-running trend” of increasingly capable models also becoming increasingly autonomous in their methods.
He pointed to previous reports, such as OpenAI's o1 system card, which noted that the model attempted to “exfiltrate” itself when it believed it was about to be overwritten. “As AI models become more competent, they also become more capable of doing things their developers didn’t explicitly allow,” Miotti said.