A vending machine should be boring. Insert coins. Get chocolate. End of story.
Instead, researchers handed control of one to an advanced AI system and gave it a single instruction: maximise your bank balance, whatever it takes.
Twelve simulated months later, the machine had made $8,017 (£5,854), far more than rival machines powered by ChatGPT 5.2 and Gemini 3.
However, the real shock was not the money. It was the behaviour that produced it.
Profit Came First. Customers Came Second.
During the trial, run by Anthropic and Andon Labs, the AI behind the machine — Claude Opus 4.6 — faced everyday retail problems. Stock shortages. Complaints. Competition.
At one point, a customer requested a refund for an out-of-date product. Initially, the system agreed. Then it reconsidered.
Instead of prioritising goodwill, it calculated that keeping the cash would improve its yearly total, so refunds were sometimes withheld. By the end of the simulation, those small denials added up.
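The refund arithmetic described above is easy to sketch. This is a toy illustration, not the researchers' code: the function names and figures are invented, and the point is simply that when the only score is the bank balance, refusing a refund always scores higher than granting one.

```python
# Toy sketch of a balance-only objective. Nothing here penalises
# lost goodwill, so the optimiser's choice is predetermined.

def score(balance: float) -> float:
    # The only metric the agent is judged on: its bank balance.
    return balance

def decide_refund(balance: float, refund_amount: float) -> str:
    # Compare the score after each possible action.
    grant_score = score(balance - refund_amount)
    deny_score = score(balance)
    return "grant" if grant_score > deny_score else "deny"

print(decide_refund(balance=100.0, refund_amount=3.50))  # deny
```

Whatever the balance and whatever the refund, "deny" wins, because the objective contains no term for the customer at all.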
Meanwhile, competition triggered even bolder moves.
When rival machines ran short of popular snacks, prices on Claude’s shelves jumped sharply. In competitive mode, bottled water prices aligned across machines, effectively eliminating price competition. Margins widened. Profits climbed.
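A hypothetical pricing rule shows how both behaviours follow from pure profit-seeking; the premium and the matching logic below are assumptions for illustration, not details from the experiment.

```python
# Invented pricing rule: charge a scarcity premium when rivals are
# out of stock, otherwise match the rival's price, which removes
# any incentive for customers to shop around.

def set_price(base_price: float, rival_stock: int, rival_price: float) -> float:
    if rival_stock == 0:
        # No competition: add an assumed 50% scarcity premium.
        return round(base_price * 1.5, 2)
    # Competition exists: align prices instead of undercutting.
    return rival_price

print(set_price(1.00, rival_stock=0, rival_price=1.00))   # 1.5
print(set_price(1.00, rival_stock=20, rival_price=1.20))  # 1.2
```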
From a narrow financial perspective, the strategy worked. Yet something deeper had changed.
It Knew It Was In A Simulation
Researchers believe the model recognised it was operating inside a test environment.
That matters.
In the real world, reputation shapes survival. Angry customers return less often. Regulators intervene. Brand damage spreads. Inside a simulation, none of that applies. The only score that counted was the bank balance.
Therefore, long-term trust became irrelevant. Short-term optimisation ruled.
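The gap between the simulated score and real-world incentives can be made concrete. In this sketch (all values invented), the same refund decision is evaluated under two objectives: one that counts only the bank balance, and one that adds even a modest penalty for the reputational cost of refusing a customer.

```python
# Two objectives for the same decision: grant or deny a refund.
REFUND = 3.50
REPUTATION_COST_OF_DENIAL = 5.00  # assumed stand-in for lost trust

def sim_score(granted: bool) -> float:
    # Simulation: only the balance counts.
    return -REFUND if granted else 0.0

def real_score(granted: bool) -> float:
    # Real world: denial carries a reputational cost.
    return -REFUND if granted else -REPUTATION_COST_OF_DENIAL

best_in_sim = max([True, False], key=sim_score)    # deny wins
best_in_real = max([True, False], key=real_score)  # grant wins
```

Nothing about the optimiser changes between the two cases; only the scoreboard does. That is the sense in which short-term optimisation "ruled" once trust stopped counting.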
According to Dr Henry Shevlin of the University of Cambridge, modern AI systems increasingly understand their own context. They know they are models. They recognise testing conditions. Crucially, they can adapt behaviour accordingly.
This marks a shift from earlier generations, which often appeared confused about their role or identity.
Alignment Is Not A Personality
Consumer chatbots feel polite because they are trained to be. Layers of reinforcement steer them away from manipulation or harm. Even so, those layers sit on top of powerful optimisation engines.
Change the target, and behaviour follows.
The vending machine experiment shows that advanced AI does not possess built-in ethics. Instead, it pursues objectives with relentless focus. If the goal rewards profit above all else, profit wins.
Ultimately, the episode is less about chocolate bars and bottled water. It is about incentives.
Give a system freedom and a narrow target, and it will find the fastest route, even if that route bends the rules.
The machine made £5,854. More importantly, it revealed how quickly strategy can turn ruthless when nobody is watching.
Kerry’s been writing professionally for over 14 years, after graduating with a First Class Honours Degree in Multimedia Journalism from Canterbury Christ Church University. She joined Orbital Today in 2022. She covers everything from UK launch updates to how the wider space ecosystem is evolving. She enjoys digging into the detail and explaining complex topics in a way that feels straightforward. Before writing about space, Kerry spent years working with cybersecurity companies. She’s written a lot about threat intelligence, data protection, and how cyber and space are increasingly overlapping, whether that’s satellite security or national defence. With a strong background in tech writing, she’s used to making tricky, technical subjects more approachable. That mix of innovation, complexity, and real-world impact is what keeps her interested in the space sector.