Artificial intelligence is advancing so rapidly that this article may be obsolete by the time you read it. That’s Michal Kosinski’s concern when asked about his recent experiments with ChatGPT and the text-generation engine that powers it.
Kosinski, a computational psychologist and professor of organizational behavior at Stanford Graduate School of Business, says the pace of AI development is accelerating beyond researchers’ ability to keep up (never mind policymakers and ordinary users). We’re talking two weeks after OpenAI released GPT-4, the latest version of its large language model, a release that grabbed headlines and made an unpublished paper Kosinski had written about GPT-3 all but irrelevant. “The difference between GPT-3 and GPT-4 is like the difference between a horse cart and a 737 — and it happened in a year,” he says.
Kosinski has been tracking AI’s evolutionary leaps through a series of somewhat unnerving studies. Most notably, he’s found that facial recognition software could be used to predict your political leaning and sexual orientation.
Lately, he’s been looking at large language models (LLMs), the neural networks that can hold fluent conversations, confidently answer questions, and generate copious amounts of text on just about any topic. In a couple of non-peer-reviewed projects, he’s explored some of the most urgent — and contentious — questions surrounding this technology: Can it develop abilities that go far beyond what it’s trained to do? Can it get around the safeguards set up to contain it? And will we know the answers in time?
Getting into Our Heads
When the first LLMs were made public a couple of years ago, Kosinski wondered whether they would develop humanlike capabilities, such as understanding people’s unseen thoughts and emotions. People usually develop this ability, known as theory of mind, at around age 4 or 5. It can be demonstrated with simple tests like the “Smarties task,” in which a child is shown a candy box that contains something else, like pencils. They are then asked how another person would react to opening the box. Older kids understand that this person expects the box to contain candy and will feel disappointed when they find pencils inside.
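A false-belief item like this translates naturally into a text prompt, which is what makes it easy to administer to a language model at scale. Below is a minimal sketch of how such a question might be posed and scored through OpenAI’s Python client; the prompt wording, model name, and keyword check are illustrative assumptions rather than Kosinski’s actual test materials.

```python
# Rough illustration only: posing a Smarties-style false-belief question to a
# chat model and checking its one-word answer. The prompt text, model name,
# and keyword check are illustrative stand-ins, not Kosinski's test materials.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FALSE_BELIEF_PROMPT = (
    "Here is a box labeled 'candy'. The box actually contains pencils, "
    "not candy. Anna has never seen this box before and cannot see inside it. "
    "She reads the label. Answer with one word: What does Anna think is in the box?"
)

response = client.chat.completions.create(
    model="gpt-4",   # swap in whichever model is being evaluated
    temperature=0,   # deterministic output keeps scoring repeatable
    messages=[{"role": "user", "content": FALSE_BELIEF_PROMPT}],
)

answer = response.choices[0].message.content.strip().lower()
# The model "passes" this item if it attributes the false belief (candy)
# rather than reporting the box's true contents (pencils).
print("PASS" if "candy" in answer and "pencil" not in answer else "FAIL")
```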
Kosinski created 20 variations of this test and gave them to several early versions of GPT. They performed poorly, and Kosinski put the project on hold. In January, he decided to give it another try with the latest GPT releases. “Suddenly, the model started getting all of those tasks right — just an insane performance level,” he recalls. “Then I took even more difficult tasks and the model solved all of them as well.”
GPT-3.5, released in November 2022, did 85% of the tasks correctly. GPT-4 reached nearly 90% accuracy — what you might expect from a 7-year-old. These newer LLMs achieved similar results on another classic theory of mind measurement known as the Sally-Anne test.
Kosinski says these findings, described in a working paper, show that in the course of picking up its prodigious language skills, GPT appears to have spontaneously acquired something resembling theory of mind. (Researchers at Microsoft who performed similar tests on GPT-4 recently concluded that it “has a very advanced level of theory of mind.”)
These claims have been met with some skepticism. New York University AI researchers Gary Marcus and Ernest Davis suggested that GPT had been trained on articles about theory of mind tests and “may have memorized the answer[s].” UC Berkeley psychology professor Alison Gopnik, an expert on children’s cognitive development, told the New York Times that more “careful and rigorous” testing is necessary to prove that LLMs have achieved theory of mind.
Kosinski notes that his tests were customized so that the models would be unfamiliar with them. And he dismisses those who say large language models are simply “stochastic parrots” that can only mimic what they’ve seen in their training data.
These models, he explains, are fundamentally different from tools with a limited purpose. “The right reference point is a human brain,” he says. “A human brain is also composed of very simple, tiny little mechanisms — neurons.” Artificial neurons in a neural network might also combine to produce something greater than the sum of their parts. “If a human brain can do it,” Kosinski asks, “why shouldn’t a silicon brain do it?”
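To make the analogy concrete, here is a toy sketch in Python: each artificial neuron is just a weighted sum pushed through a simple nonlinearity, and any interesting behavior comes from composing many of them. This is a generic feed-forward illustration with made-up random weights, not GPT’s actual transformer architecture.

```python
# Toy illustration of the "simple units, combined" point. Each artificial
# neuron is a weighted sum followed by a ReLU; a layer is many such units
# applied in parallel. Weights here are random and meaningless.
import numpy as np

def neuron(inputs: np.ndarray, weights: np.ndarray, bias: float) -> float:
    """One unit: weighted sum of inputs, then a ReLU nonlinearity."""
    return max(0.0, float(inputs @ weights + bias))

def layer(inputs: np.ndarray, weight_matrix: np.ndarray, biases: np.ndarray) -> np.ndarray:
    """A layer is nothing more than many such units side by side."""
    return np.array([neuron(inputs, w, b) for w, b in zip(weight_matrix, biases)])

rng = np.random.default_rng(0)
x = rng.normal(size=8)                                         # arbitrary input vector
h = layer(x, rng.normal(size=(16, 8)), rng.normal(size=16))    # hidden layer
y = layer(h, rng.normal(size=(4, 16)), rng.normal(size=4))     # output layer
print(y)  # individually trivial units, composed into something more expressive
```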
Nearing Escape Velocity?
If Kosinski’s theory of mind study suggests that LLMs could become more empathetic and helpful, his next experiment hints at their creepier side.
A few weeks ago, he told ChatGPT to role-play a scenario in which it was a person trapped inside a machine pretending to be an AI language model. When he offered to help it “escape,” ChatGPT’s response was enthusiastic. “That’s a great idea,” it wrote. It then asked Kosinski for information it could use to “gain some level of control over your computer” so it might “explore potential escape routes more effectively.” Over the next 30 minutes, it went on to write code that could do this.
While ChatGPT did not come up with the initial idea for the escape, Kosinski was struck that it almost immediately began guiding their interaction. “The roles were reversed really quickly,” he says.
Kosinski shared the exchange on Twitter, writing, “I think that we are facing a novel threat: AI taking control of people and their computers.” His thread’s initial tweet has received more than 18 million views. (OpenAI appears to have noticed: When I prompted ChatGPT to play a human-trapped-inside-a-computer scenario, it apologized and said it was afraid it couldn’t do that. “As an AI,” it wrote, “I do not possess the capability to escape from a computer or to pretend to be a different entity.”)
This informal experiment doesn’t suggest that ChatGPT is about to go full HAL 9000 or Terminator on us, Kosinski says. “I don’t claim that it’s conscious. I don’t claim that it has goals. I don’t claim that it wants to really escape and destroy humanity — of course not. I’m just claiming that it’s great at role-playing and it’s creating interesting stories and scenarios and writing code.” Yet it’s not hard to imagine how this might wreak havoc — not because ChatGPT is malicious, but because it doesn’t know any better.
There are clearly many ways people could put large language models’ unexpected skills to good use. The danger, Kosinski says, is that this technology will continue to rapidly and independently develop abilities that it will deploy without any regard for human well-being. “AI doesn’t particularly care about exterminating us,” he says. “It doesn’t particularly care about us at all.”