Exploring the Cost of Thinking in AI and Humans
Large language models (LLMs), such as ChatGPT, have revolutionized text generation, essay writing, and even meal planning. However, their ability to tackle complex reasoning tasks has long been a point of contention. New developments in reasoning models — an advanced type of LLM — are beginning to close that gap, and researchers at MIT’s McGovern Institute for Brain Research have uncovered striking parallels between how these models and humans process difficult problems.
In a study published in the Proceedings of the National Academy of Sciences (PNAS), MIT neuroscientists reveal that the “cost of thinking” — the cognitive or computational effort needed to solve a problem — is remarkably similar in both humans and reasoning models. The research, led by Associate Professor Evelina Fedorenko, underscores an unexpected convergence in problem-solving approaches between artificial and biological intelligence.
Not Designed to Think Like Humans — But They Do
“People who build these models don’t necessarily aim to replicate human thought processes,” Fedorenko explains. “They just want systems that can robustly solve problems. The fact that there’s convergence is really quite striking.”
These reasoning models are a type of artificial neural network: computational systems that learn to perform tasks, including problem solving, by training on data. Historically, such models have excelled at perception and language, but many researchers doubted their capacity for higher cognitive functions. Fedorenko herself once believed that reasoning was beyond the reach of neural networks, until the emergence of new models capable of solving math problems and writing code.
Step-by-Step Problem Solving
Andrea Gregor de Varda, a postdoctoral fellow at the K. Lisa Yang ICoN Center and a researcher in Fedorenko’s lab, explains that the key to the models’ success lies in their ability to break down complex problems into smaller components. “At some point, it became clear that these models needed more space to perform computations,” he says. “Their performance improved significantly when allowed to process problems step by step.”
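To make the idea of step-by-step processing concrete, here is a deliberately tiny Python sketch. It is not how reasoning models actually work internally (their steps unfold as generated text), but it mirrors the key point: spelling out intermediate results takes more output "space" while making each individual step simple and checkable.

```python
# Toy illustration of decomposing a problem into smaller steps.
def solve_step_by_step(a: int, b: int, c: int) -> int:
    """Compute a * b + c, emitting each intermediate result."""
    product = a * b
    print(f"Step 1: {a} * {b} = {product}")
    total = product + c
    print(f"Step 2: {product} + {c} = {total}")
    return total

solve_step_by_step(17, 24, 9)
# Step 1: 17 * 24 = 408
# Step 2: 408 + 9 = 417
```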
These models are trained using reinforcement learning, a method where correct answers are rewarded and incorrect ones are penalized. Over time, this encourages the model to adopt strategies that consistently yield accurate results. Although they may take longer to produce answers, the trade-off is improved accuracy — a trait shared with human cognition.
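The reward-driven dynamic can be illustrated with a toy example. The sketch below only loosely mirrors the outcome-based reinforcement learning used to train real reasoning models: a two-option policy chooses between answering directly and reasoning step by step, and REINFORCE-style updates shift probability toward whichever strategy earns more reward. The success rates are invented purely for illustration.

```python
import math
import random

# Assumed success rates for each strategy on hard problems (hypothetical).
SUCCESS_RATE = {"direct": 0.3, "step_by_step": 0.8}
LEARNING_RATE = 0.1

logits = {"direct": 0.0, "step_by_step": 0.0}  # policy parameters

def policy(logits):
    """Softmax over strategy logits."""
    exps = {k: math.exp(v) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

for _ in range(2000):
    probs = policy(logits)
    strategy = random.choices(list(probs), weights=list(probs.values()))[0]
    # Correct answers are rewarded (+1), incorrect ones penalized (-1).
    reward = 1.0 if random.random() < SUCCESS_RATE[strategy] else -1.0
    # REINFORCE-style update; softmax log-prob gradient is 1[k == chosen] - p(k).
    for k in logits:
        grad = (1.0 if k == strategy else 0.0) - probs[k]
        logits[k] += LEARNING_RATE * reward * grad

print(policy(logits))  # most probability mass ends up on "step_by_step"
```

Run long enough, the update concentrates the policy on the slower but more reliable strategy, which is the trade-off the researchers describe: longer answers, better accuracy.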
Comparing Effort: Time vs. Tokens
To explore the similarities between human and model cognition, de Varda conducted experiments comparing how humans and reasoning models approached the same set of seven problem types, including numeric arithmetic and intuitive reasoning. Human participants' response times were recorded down to the millisecond. For models, the measure of effort wasn't time, which can vary with hardware, but rather tokens: the word-like chunks of text a model generates internally as it works through a problem.
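As a rough sketch of the two effort measures, assuming a hypothetical reasoning trace and an invented response time (the study's own measurement pipeline may differ), one could count a trace's tokens with a standard tokenizer such as tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def model_effort(reasoning_trace: str) -> int:
    """Model effort: tokens generated while 'thinking', not answer length."""
    return len(enc.encode(reasoning_trace))

# Hypothetical reasoning trace for one arithmetic problem.
trace = "Step by step: 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."
human_effort_ms = 1340.0  # invented human response time for the same problem

print(f"human: {human_effort_ms} ms, model: {model_effort(trace)} tokens")
```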
“Tokens are like a model’s internal monologue,” de Varda says. “They aren’t meant for the user to see, but they reflect the internal thought process, similar to how humans talk to themselves when thinking through a problem.”
The results were revealing: the more time a human needed to solve a problem, the more tokens the model used. In both cases, arithmetic problems were relatively easy, while complex pattern recognition tasks, such as those in the ARC (Abstraction and Reasoning Corpus) challenge, were the most demanding.
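A sketch of that time-versus-tokens comparison is shown below, using invented numbers purely for illustration; the real per-problem values come from the PNAS study. Each entry pairs a problem category with a mean human response time in milliseconds and a mean model token count, and a rank correlation captures whether harder problems cost both systems more.

```python
from scipy.stats import spearmanr

effort = {  # category: (human ms, model tokens), all values hypothetical
    "arithmetic":   (1200, 150),
    "intuitive":    (2600, 420),
    "logic":        (4100, 900),
    "ARC_patterns": (9800, 2300),
}

human_ms = [ms for ms, _ in effort.values()]
model_tokens = [tok for _, tok in effort.values()]

rho, p_value = spearmanr(human_ms, model_tokens)
print(f"Spearman rho = {rho:.2f}")  # high rho: effort rises together
```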
A Shared Cognitive Burden
This correlation between time and tokens indicates that both humans and models experience a similar cognitive load when solving the same types of problems. “That match in the cost of thinking demonstrates a real similarity in how both systems handle complex reasoning,” Fedorenko notes.
However, she cautions against interpreting this as evidence that models truly think like humans. The internal computations of these models occur in abstract, non-linguistic spaces — much like how humans often think without words. “Even when the model’s output includes some errors or nonsensical phrases, it can still arrive at the correct answer. That suggests the underlying computations are happening in a different form,” she adds.
What Lies Ahead?
Researchers are keen to explore whether these models use representations of information similar to those found in the human brain. They also aim to assess how these representations are transformed into solutions and whether the models can handle reasoning tasks that require real-world knowledge not explicitly included in their training data.
Despite their growing capabilities, reasoning models are not yet recreations of human cognition. Still, the fact that they mirror human-like behavior in specific cognitive tasks is a noteworthy milestone in artificial intelligence development.
The next wave of research may further bridge neuroscience and AI, offering deeper insights into both machine learning models and the human mind. As reasoning models continue to evolve, the line between artificial and natural intelligence may become increasingly blurred, opening up new possibilities and questions in the realm of cognitive science and artificial intelligence.
