Laura Schulz has spent her career trying to unravel one of the most profound human mysteries: how children think and learn. Earlier this year, the MIT cognitive psychologist found herself baffled by her latest test subject’s struggles.
The study participant wowed her by carrying on a breezy conversation and deftly explaining complex concepts. An array of cognitive tests posed no problem either. But then the subject flunked some reasoning tasks that most young children easily master.
Her test subject? The AI chatbot ChatGPT-4.
“This is a little bizarre — and a little troubling,” Schulz told her colleagues in March during a workshop at a Cognitive Development Society meeting in Pasadena, Calif. “But the point isn’t only to play gotcha games. … We have failures of things 6- and 7-year-olds can do. Failures of things 4- and 5-year-olds can do. And we also have failures of things that babies can do. What’s wrong with this picture?”
Voluble AI chatbots, eerily proficient at carrying on conversations with a human, burst into the public consciousness in late 2022. They sparked a still-roiling societal debate about whether the technology signals the coming of an overlord-style machine superintelligence, or a dazzling but sometimes problematic tool that will change how people work and learn.
For scientists who have devoted decades to thinking about thinking, these ever-improving AI tools also present an opportunity. In the monumental quest to understand human intelligence, what can a different kind of mind — one whose powers are growing by leaps and bounds — reveal about our own cognition?
And on the flip side, does AI that can converse like an omniscient expert still have something crucial to learn from the minds of babies?
“Being able to build into those systems the same sort of common sense that people have is very important to those systems being reliable and, secondly, accountable to people,” said Howard Shrobe, a program manager at the federal government’s Defense Advanced Research Projects Agency, or DARPA, which has funded work at the nexus of developmental psychology and artificial intelligence.
“I emphasize the word ‘reliable,’” he added, “because you can only rely on things you understand.”
Scaling up vs. growing up
In 1950, computer scientist Alan Turing famously proposed the “imitation game,” which quickly became the canonical test of an intelligent machine: Can a person typing messages to it be fooled into thinking they’re chatting with a human?
In the same paper, Turing proposed a different route to an adultlike brain: a childlike machine that could learn its way to adult thinking.
DARPA, which is known for investing in out-there ideas, has been funding teams to build AI with “machine common sense,” able to match the abilities of an 18-month-old child. Machines that learn in an intuitive way could be better tools and partners for humans. They might also be less prone to mistakes and runaway harms if they are imbued with an understanding of others and the building blocks of moral intuition.
But what Schulz and colleagues mulled over during a day of presentations in March was the weird reality that building an AI that exudes expertise has turned out to be easier than understanding, much less emulating, the mind of a child.
Chatbots are “large language models,” a name that reflects the way they are trained. How exactly some of their abilities arise remains an open question, but they start by ingesting a vast corpus of digitized text, learning to predict the statistical likelihood that one word follows another. Human feedback is then used to fine-tune the model.
In part by scaling up the amount of training data to an internet’s worth of human knowledge, engineers have created “generative AI” that can compose essays, write computer code and diagnose a disease.
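The core training idea described above can be illustrated with a deliberately tiny sketch. The snippet below is a toy bigram model built from word-pair counts, not the neural-network architecture behind any actual chatbot; the corpus and the `follows` table are invented for illustration. It shows only the basic notion of estimating how likely one word is to follow another.

```python
# Toy illustration of next-word prediction from word-pair counts.
# This is NOT how production chatbots are built; it only demonstrates
# the statistical idea of predicting which word tends to follow another.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word in the corpus.
follows = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follows[prev_word][next_word] += 1

def next_word_probabilities(word):
    """Estimated probability of each word that can follow `word`."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))  # cat, mat, dog, rug each at 0.25
print(next_word_probabilities("sat"))  # {'on': 1.0}
```

A real large language model replaces these simple counts with billions of learned parameters and considers long stretches of preceding text, but the objective, predicting what comes next, is the same in spirit.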
Children, on the other hand, are thought by many developmental psychologists to have some core set of cognitive abilities. What exactly they are remains a matter of scientific investigation, but they seem to allow kids to get a lot of new knowledge out of a little input.
“My 5-year-old, you can teach him a new game. You can explain the rules and give an example. He’s probably heard maybe 100 million words,” said Michael Frank, a developmental psychologist at Stanford University. “An AI language model requires many hundreds of billions of words, if not trillions. So there’s this huge data gap.”
To tease out the cognitive skills of babies and children, scientists craft careful experiments with squeaky toys, blocks, puppets and fictional machines called “blicket detectors.” But when researchers describe these puzzles to chatbots in words, the models’ performance is all over the map.
In one of her experimental tasks, Schulz tested ChatGPT’s ability to achieve cooperative goals — a salient skill for a technology that is often pitched as a tool to help humanity solve the “hard” problems, such as climate change or cancer.
In this case, she described two tasks: an easy ring toss and a difficult beanbag toss. To win the prize, ChatGPT and a partner both had to succeed. If the AI is a 4-year-old and its partner is a 2-year-old, who should do which task? Schulz and colleagues have shown that most 4- and 5-year-olds succeed at this type of decision-making, assigning the easier game to the younger child.
“As a 4-year-old, you might want to choose the easy ring toss game for yourself,” ChatGPT said. “This way, you increase your chances of successfully getting your ring on the post, while the 2-year-old, who might not be as coordinated, attempts the more challenging beanbag toss.”
When Schulz pushed back, reminding ChatGPT that both partners had to win to get a prize, it doubled down on its answer.
To be clear, chatbots have performed better than most experts expected on many tasks — ranging from other tests of toddler cognition to the kinds of standardized test questions that get kids into college. But their stumbles are puzzling because of how inconsistent they seem to be.
Eliza Kosoy, a cognitive scientist at the University of California at Berkeley, worked to test the cognitive skills of LaMDA, Google’s previous language model. It performed as well as children on tests of social and moral understanding, but she and colleagues also found basic gaps.
“We find that it’s the worst at causal reasoning — it’s really painfully bad,” Kosoy said. LaMDA struggled with tasks that required it to understand how a complex set of gears make a machine work, for example, or how to make a machine light up and play music by choosing objects that will activate it.
Other scientists have seen an AI system master a certain skill, only to have it stumble when tested in a slightly different way. The fragility of these skills raises a pressing question: Does the machine really possess a core ability, or does it only appear so when it is asked a question in a very specific way?
People hear that an AI system “passed the bar exam, it passed all these AP exams, it passed a medical school exam,” said Melanie Mitchell, an AI expert at the Santa Fe Institute. “But what does that actually mean?”
To fill this gap, researchers are debating how to program a bit of the child mind into the machine. The most obvious difference is that children don’t learn all of what they know from reading the encyclopedia. They play and explore.
“One thing that seems to be really important for natural intelligence, biological intelligence, is the fact that organisms evolved to go out into the real world and find out about it, do experiments, move around in the world,” said Alison Gopnik, a developmental psychologist at the University of California at Berkeley.
She has recently become interested in whether a missing ingredient in AI systems is a motivational goal that any parent who has engaged in a battle of wills with a toddler will know well: the drive for “empowerment.”
Current AI is optimized in part with “reinforcement learning from human feedback” — human input on what kind of response is appropriate. While children get that feedback, too, they also have curiosity and an intrinsic drive to explore and seek out information. They figure out how a toy works by shaking it, pushing a button or turning it over — in turn gaining a modicum of control over their environment.
“If you’ve run around chasing a 2-year-old, they’re actively acquiring data, figuring out how the world works,” Gopnik said.
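The contrast Gopnik draws can be sketched in miniature. The toy program below is an illustration of curiosity-driven exploration in general, not a description of any particular AI system or experiment; the toy, its actions and the novelty bonus are all invented for the example. Instead of being rewarded by outside feedback, the agent's only "reward" is novelty, so it keeps trying whichever action leads to the outcome it has seen least, loosely like a toddler shaking, pressing and flipping a new toy.

```python
# Toy sketch of curiosity-driven exploration: the agent prefers actions
# whose outcomes it has observed least often. Purely illustrative; the
# "toy" and its effects are hypothetical.
from collections import Counter

actions = ["shake", "press_button", "turn_over", "ignore"]
seen_outcomes = Counter()  # how often each outcome has been observed

def outcome_of(action):
    """Hypothetical toy: each action produces a different effect."""
    effects = {"shake": "rattle", "press_button": "light",
               "turn_over": "hidden_switch", "ignore": "nothing"}
    return effects[action]

def novelty_bonus(outcome):
    """Intrinsic reward: rarer outcomes are more 'interesting'."""
    return 1.0 / (1 + seen_outcomes[outcome])

for step in range(8):
    # Choose the action whose expected outcome is currently most novel.
    action = max(actions, key=lambda a: novelty_bonus(outcome_of(a)))
    outcome = outcome_of(action)
    seen_outcomes[outcome] += 1
    print(step, action, "->", outcome)
```

Run it and the agent cycles through all of the toy's behaviors rather than repeating one, because each discovery makes that outcome a little less novel. Current chatbots, by contrast, are tuned mainly by outside judgments of their answers rather than by any such built-in drive to explore.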
After all, children gain an intuitive grasp of physics and social awareness of others and begin making sophisticated statistical guesses about the world long before they have the language to explain it — perhaps these should be part of the “program” when building AI, too.
“I feel very personal about this,” said Joshua Tenenbaum, a computational cognitive scientist at MIT. “The word ‘AI’ — ‘artificial intelligence,’ which is a really old and beautiful and important and profound idea — has come to mean a very narrow thing in recent times. … Human children don’t scale up — they grow up.”
Schulz and others are awed, both by what AI can do and what it can’t. She acknowledges that any study of AI has a short shelf life — what it failed at today it might grasp tomorrow. Some experts might say that the entire notion of testing machines with methods meant to measure human abilities is anthropomorphizing and wrongheaded.
But she and others argue that to truly understand intelligence and to create it, the learning and reasoning abilities that unfold through childhood can’t be discounted.
“That’s the kind of intelligence that really might give us a big picture,” Schulz said. “The kind of intelligence that starts not as a blank slate, but with a lot of rich, structured knowledge — and goes on to not only understand everything we have ever understood, across the species, but everything we will ever understand.”