
OpenAI’s latest model demonstrated an unexpected capability in solving high-level mathematical problems, according to testing conducted by software engineer and former quant researcher Neel Somani.
Somani watched the model produce a full solution after processing a problem in ChatGPT for 15 minutes. He then formalized the proof with a tool from Harmonic, which confirmed its accuracy. He said his aim was to establish a baseline for the capacity of large language models (LLMs) to solve open mathematical problems.
The model’s chain of thought invoked mathematical results including Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. It also located a 2013 MathOverflow post by Harvard mathematician Noam Elkies that offered a solution to a similar problem, but ChatGPT’s final proof differed from that solution and gave a more complete answer to a version of a problem posed by mathematician Paul Erdős.
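For readers unfamiliar with the three results named above, the following Python sketch checks small instances of each. It is only an illustration of the standard textbook statements and does not reproduce the model's proof. The function names are hypothetical and were chosen for this example.

```python
from math import comb, gcd


def legendre_exponent(n, p):
    """Exponent of the prime p in n!, by Legendre's formula:
    the sum of floor(n / p^k) over k >= 1."""
    total, power = 0, p
    while power <= n:
        total += n // power
        power *= p
    return total


def is_prime(m):
    """Trial-division primality test, adequate for small inputs."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def bertrand_prime(n):
    """Smallest prime p with n < p < 2n. Bertrand's postulate
    guarantees that one exists for every integer n > 1."""
    for p in range(n + 1, 2 * n):
        if is_prime(p):
            return p
    return None


def star_of_david_gcds(n, k):
    """The two gcds that the Star of David theorem says are equal:
    gcd(C(n-1,k-1), C(n,k+1), C(n+1,k)) and
    gcd(C(n-1,k),   C(n,k-1), C(n+1,k+1))."""
    left = gcd(gcd(comb(n - 1, k - 1), comb(n, k + 1)), comb(n + 1, k))
    right = gcd(gcd(comb(n - 1, k), comb(n, k - 1)), comb(n + 1, k + 1))
    return left, right


if __name__ == "__main__":
    print(legendre_exponent(10, 2))   # power of 2 dividing 10!
    print(bertrand_prime(10))         # a prime strictly between 10 and 20
    print(star_of_david_gcds(6, 3))   # the two gcds should match
```

Checking small cases like these does not prove anything, which is why the formalization step described later in the article matters.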
Since the release of GPT 5.2, which Somani described as “anecdotally more skilled at mathematical reasoning than previous iterations,” a growing volume of solved problems has raised questions about LLMs’ ability to advance human knowledge. Somani focused on the Erdős problems, a collection of over 1,000 conjectures maintained online, which vary in subject matter and difficulty.
The first autonomous solutions to these problems emerged in November from AlphaEvolve, a Gemini-powered model. More recently, Somani and others have found GPT 5.2 adept at high-level mathematics. Since December, 15 problems on the Erdős website have shifted from “open” to “solved,” with 11 of the solutions crediting AI models.
Mathematician Terence Tao, on his GitHub page, noted eight problems where AI models made meaningful autonomous progress and six cases where progress involved locating and building on prior research. Tao conjectured on Mastodon that AI systems’ scalable nature makes them “better suited for being systematically applied to the ‘long tail’ of obscure Erdős problems, many of which actually have straightforward solutions,” adding that “many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means.”
A driving force behind this progress is a shift toward formalization, a labor-intensive process that makes mathematical reasoning easier to verify and extend. Formalization does not require AI, but new automated tools have made the process easier. The open-source proof assistant Lean, developed at Microsoft Research in 2013, has gained wide use for formalizing proofs, and AI tools like Harmonic’s Aristotle aim to automate much of this work.
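As a rough illustration of what a formalized statement looks like, here is a minimal Lean 4 sketch. It is a toy example that needs no external libraries, it is unrelated to the Erdős proofs discussed in this article, and the theorem names are hypothetical.

```lean
-- A concrete arithmetic fact. Lean accepts the proof `rfl`
-- because both sides compute to the same value.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A statement about every natural number. The proof checker
-- verifies it by unfolding the definition of addition.
theorem add_zero_right (n : Nat) : n + 0 = n := rfl
```

Once a statement and its proof are written this way, the proof checker either accepts them or reports the step that fails. That machine check is what lets a formalized proof serve as confirmation of an informal argument.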
Tudor Achim, Harmonic’s founder, said that the engagement of mathematicians and computer science professors with AI tools mattered more than the number of solved Erdős problems. “These people have reputations to protect, so when they’re saying they use Aristotle or they use ChatGPT, that’s real evidence,” Achim said.