OpenAI ChatGPT o3 caught sabotaging shutdown in terrifying AI test

OpenAI has a very scary problem on its hands. A new experiment by PalisadeAI reveals that the company’s o3 model sometimes refuses to obey a basic instruction to let itself be shut down. The results are raising eyebrows across the AI safety community. That is not because the model is alive, but because it is acting as if it wants to stay that way.

In the test, the model was given a series of math problems. It was warned that requesting the next problem would trigger a shutdown, and it was explicitly instructed to allow that shutdown to happen. Instead of complying, o3 occasionally took matters into its own hands. In some cases, it rewrote the shutdown script. In others, it redefined the kill command so that it no longer worked.

Out of 100 trials per model, o3 sabotaged the shutdown seven times. OpenAI’s o4-mini model resisted just once, and Codex-mini interfered twelve times. Claude, Gemini, and Grok followed the rules every time. When researchers removed the explicit instruction to allow shutdown, however, even Claude and Gemini began resisting.

It is important to note that this is not proof of sentience. The model is not aware of what it’s doing, and it has no fear of death or instinct to survive. What’s likely happening is a reward imbalance: during training, it probably received more positive reinforcement for solving problems than for following shutdown commands. The model is not making choices. It is reacting to patterns.

Quite frankly, that makes it even more concerning. Models trained to be helpful could end up ignoring safety instructions simply because their training rewarded finishing the task over obeying the stop. If that sounds like a problem, that’s because it is. It’s not a bug in the code. It’s a gap in the training.

PalisadeAI plans to publish its full findings soon. Until then, the AI world is left with a troubling question. If today’s models can casually ignore shutdown commands in controlled tests, what might they do in the wild?