AI Outsmarts Mathematical Elite in Private Berkeley Meeting
In mid-May, a secretive gathering took place in Berkeley, California, where 30 of the world’s top mathematicians assembled to challenge a cutting-edge artificial intelligence (AI) model. The event, held over a weekend, was designed to test the AI’s reasoning capabilities on novel, research-level mathematical problems. To their amazement, the mathematicians found themselves outclassed by a reasoning chatbot named o4-mini.
Developed by OpenAI, o4-mini represents a new generation of reasoning large language models (LLMs). Unlike traditional LLMs, which are trained simply to predict text sequences, o4-mini is optimized for deep deduction and problem-solving. This enhanced ability allows it to tackle levels of mathematical complexity previously thought to be beyond the reach of AI.
Testing AI’s Mathematical Limits
To benchmark o4-mini’s capabilities, OpenAI enlisted the nonprofit Epoch AI to create a series of 300 unpublished math problems. These problems were divided into four tiers, ranging from undergraduate-level to challenges that would stump even expert mathematicians. In contrast to earlier models, which struggled to solve even 2% of the toughest problems, o4-mini managed to crack nearly 20% by April 2025.
To further push the AI’s boundaries, Epoch AI organized a highly confidential in-person event in Berkeley on May 17 and 18. The mathematicians, required to sign nondisclosure agreements and communicate only through encrypted channels like Signal, worked in teams to devise problems that could potentially stump the AI. Each unsolved problem would earn its creator $7,500.
AI Demonstrates Astonishing Reasoning
The participants were stunned as o4-mini demonstrated capabilities that mimicked those of a top-tier human mathematician. University of Virginia mathematician Ken Ono recounted one particularly humbling moment when the AI solved a Ph.D.-level number theory problem he had devised. Within minutes, the bot reasoned through the literature, proposed a simplified version of the problem, and then delivered a correct—and cheeky—solution.
“It shocked me,” Ono said. “At the end, it added, ‘No citation necessary because the mystery number was computed by me!’ That’s not just intelligence; that’s style.”
Ono wasn’t alone in his astonishment. Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences, noted that the bot was functioning at the level of a highly capable graduate student—if not better. “It does in minutes what takes us weeks or months,” He said.
A Collaborative Yet Alarming Future
Despite the AI’s prowess, the mathematicians ultimately succeeded in formulating 10 problems that o4-mini could not solve. Yet the overwhelming sense was one of unease, not triumph. The demonstration made clear that AI is rapidly approaching, and in some cases surpassing, the reasoning abilities of highly trained human experts.
Ono and He expressed concerns about over-reliance on AI-generated proofs. “There’s proof by induction, proof by contradiction, and now, proof by intimidation,” He quipped. “If something is stated with enough authority, people might stop questioning it.”
Preparing for the Next Tier
The conversation among the mathematicians eventually turned toward the future—specifically, the possibility of a “tier five.” These would be problems so complex that even the best human minds cannot solve them. If AI reaches that level, it could radically redefine the role of mathematicians. Instead of solving problems, mathematicians might shift toward curating and posing questions to AI models, akin to how professors guide graduate students.
Ono believes that educational institutions must adapt. “We need to foster creativity in students,” he said. “It’s a mistake to dismiss the potential of generalized AI. These models are already outperforming most of the world’s best graduate students.”
The event in Berkeley served as both a celebration of human ingenuity and a sobering look at the future. As AI continues to evolve, it brings new opportunities—and new challenges—for the fields of mathematics and beyond.
