When Alan Turing first proposed an approach to distinguish the “intelligence” of machines from that of humans in 1950, the idea that a machine could ever achieve human-level intelligence was almost laughable.
In Turing’s test, which Turing himself originally called the “imitation game,” human participants engage in conversation with unknown interlocutors to determine whether they are talking to a human or a computer. In 2014, a chatbot masquerading as a Ukrainian teenager named Eugene Goostman appeared to drive the first nail into the coffin of the Turing test, fooling more than a third of its judges into thinking they were talking to another person, though some researchers dispute the claim that the chatbot passed the test.
Today, we encounter seemingly intelligent machines every day. Our smart speakers tell us to bring umbrellas with us when we leave the house, and large language models (LLMs) like ChatGPT can write promotional emails. It is becoming easy to mistake these machines for a real human.
Does this mean the Turing test is becoming a thing of the past?
In a new paper published Nov. 10 in the journal Intelligent Computing, a pair of researchers have proposed a new kind of intelligence test that treats machines as participants in a psychological study to determine whether their reasoning abilities match those of humans. The researchers are Philip Johnson-Laird, a Princeton psychology professor and pioneer of the mental model theory of human reasoning, and Marco Ragni, a professor of predictive analytics at Chemnitz University of Technology in Germany.
As chatbots have approached and succeeded at the Turing test, it has quietly lost its relevance.
-ANDERS SANDBERG, OXFORD UNIVERSITY
In their paper, Johnson-Laird and Ragni argue that the Turing test has never been a good measure of machine intelligence because it does not take into account the human thought process.
“Given that such algorithms do not reason the way humans do, the Turing test and all the others it inspired are obsolete,” they write.
Anders Sandberg, a senior research fellow at the Future of Humanity Institute at Oxford University, agrees with that assertion. That said, he’s not convinced that assessing human thinking will become the ultimate test of intelligence.
“As chatbots have approached and succeeded at the Turing test, it has quietly lost its relevance,” Sandberg says. “This work tries to find out if a program reasons the way humans reason. This is interesting and useful, but, of course, it only tells us whether there is human-type intelligence, not some other forms of potentially valuable intelligence.”
Similarly, while Turing tests may be going out of fashion, Huma Shah, an associate professor of computer science at Coventry University whose research focuses on the Turing test and machine intelligence, says that doesn’t necessarily mean they are no longer useful.
“In terms of indistinguishability, no, [the Turing test is not obsolete],” Shah says. “Indistinguishability can be applied in other areas where we want a machine’s performance to be as good as, if not better than, a human performing the task efficiently and ethically. For example, in facial recognition or the ability to drive safely without harming passengers and pedestrians.”
As for the test Johnson-Laird and Ragni propose, it would be conducted in three stages. First, machines are asked a series of questions to probe their reasoning. For example, they might be asked, “If Ann is smart, does it follow that Ann is smart, or rich, or both?” They would then be tested on whether they understand their own reasoning, for example by answering, “Nothing in the premise supports the possibility that Ann is rich.” Finally, researchers would look under the hood of the machine to determine whether its neural networks contain components that mimic human cognition.
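The first stage described above could, in principle, be automated. The sketch below is purely illustrative and not taken from the paper: it scores how often a candidate system's answers match the responses human participants typically give to such reasoning problems. The `model_answer` function is a hypothetical stand-in; a real study would query an actual LLM.

```python
# Illustrative sketch (not the authors' code) of stage 1 of the proposed test:
# pose reasoning problems to a machine and compare its answers with
# typical human responses reported in psychology experiments.
from dataclasses import dataclass


@dataclass
class ReasoningItem:
    premise: str
    question: str
    typical_human_answer: str  # the response most human participants give


ITEMS = [
    ReasoningItem(
        premise="Ann is smart.",
        question="Does it follow that Ann is smart, or rich, or both?",
        typical_human_answer="yes",  # humans generally accept this inference
    ),
]


def model_answer(item: ReasoningItem) -> str:
    """Hypothetical stand-in for querying the machine under test."""
    return "yes"  # placeholder response


def human_likeness(items: list[ReasoningItem]) -> float:
    """Fraction of items where the machine matches the typical human response."""
    matches = sum(model_answer(item) == item.typical_human_answer for item in items)
    return matches / len(items)


print(human_likeness(ITEMS))  # 1.0 with the placeholder answer above
```

A match rate near 1.0 would suggest human-like reasoning at the behavioral level, but, as the later stages of the proposed test acknowledge, it says nothing about whether the underlying mechanism resembles human cognition.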
It is this last step that Sandberg fears could present challenges.
“The last step can be very difficult,” he says. “Most LLMs are huge neural networks that are not particularly testable, despite a lot of research on how to do it.”
Translating the internal representation of a machine’s reasoning into a form that humans can understand can even distort the original nature of the machine’s thought process, Sandberg says. In other words, would we recognize a machine’s interpretation of human reasoning if we saw it?
This question is particularly difficult to answer because the science of human cognition is itself far from settled.
While replacing the Turing test may not be an easy process, Shah says alternatives such as this reasoning skills test have the potential to advance the study of important questions such as what it means to be human. They can also help shed light on what it means to be a computer, such as what processes take place inside the “black box” of a neural network.
“If the new human-machine indistinguishability tests contribute to the development of machine ‘explainability’ – for example, the ‘reasonableness’ of algorithms that make their decisions understandable to the general public, as in financial algorithms for insurance, mortgages, loans, etc. – then this challenge will be an invaluable contribution to the development of intelligent machines,” says Shah.