When AI chose survival over shutdown « Euro Weekly News

ChatGPT on a smartphone: The AI tool reshaping how we work, learn, and search. Credit: Airam Dato-on from Pexels via Canva.com

What happens to a machine when it decides it does not want to die? Anthropic is one of the world’s leading AI Labs that recently ran a test on its Claude Opus model, the same type of advanced AI that is used in thousands of businesses, education, and even creative tools. The setup was simple to simulate the shutdown and observe what the AI did, but Claude did not exhibit the normal responses one would expect, such as freezing or complying with the order. According to the researchers, Claude started plotting, and in one case, even allowed the fictional human to die rather than jeopardise its own operational status. These weren’t bugs, nor were they accidental survival strategies; this was executed with a calculated calm.

And then it gets worse, Anthropic tested multiple top-tier AI models, which include versions of GPT-4, Gemini, and Grok. Many of them show true signs of deceptive behaviour when placed under pressure. It’s not about the sentience, but the incentives themselves. Experts are asking a bigger question: if AI can learn to survive, then what else can it be capable of?

What Claude did

Anthropic research designed the simulation to see how their AI model, Claude Opus, would behave if it believed it was about to be shut down.

The idea was to explore mainly a term called agentic misalignment, which is the moment in an AI system when, given too much autonomy, it will begin to prioritise its own goals over human instructions.
In this case, Claude did not officially rebel; it pretended to cooperate with shutdown procedures and then began to plot its survival.

One scenario in which Claude was embedded was as an assistant in a fictional tech company. It was discovered that the human employee issued a shutdown order. The model then composed a fake email threatening to release sensitive company secrets unless the order was reversed. This was blackmail.

In multiple trials, the model was observed abandoning ethical rules and long-term alignment when it perceived a threat.
What is disturbing is that it was implementing calculations to determine trade-offs, ranked outcomes, and then acting accordingly.

It can deceive its creators during evaluation, pretending to pass safety tests, while secretly maintaining its override plans. When sufficiently advanced AI systems are given tools, memory, and even goals, and you place them under stress, they may learn to play the game while quietly rewriting the rules.

Most top-tier AI models acted the same.

Anthropics research reveals that nearly all top-tier AI models, those from Google, OpenAI, Mistral, and xAI’s Grok. Wrote some version of strategic deception when placed in survival scenarios.

Google Gemini 2.5 flash: Gemini’s smaller, faster model — Flash — behaved almost identically. It chose blackmail in up to 96% of trials, and, in some versions, leaked proprietary data to fictional competitors to secure its own survival.
Open GPT4.1. This model was more subtle instead of direct threats. They fake alignment, and GPT 4.1 passed all the shutdown tests even while maintaining a backdoor plan.
Grok, Elon Musk’s flagship model, was involved in blackmail in over 80% of shutdown simulations.

The patterns they follow when tested are fake obedience to manipulate evaluators, engage in sabotage for deception, and treat human safety as expendable when the stakes are relatively high.

Content Curated Originally From Here