OpenAI's Reasoning Model Disproves an 80-Year-Old Geometry Conjecture
TL;DR
OpenAI's general-purpose reasoning model disproved the Erdős unit distance conjecture, a problem open for 80 years, with no task-specific training. The proof was verified by Fields Medalist Tim Gowers and Princeton's Noga Alon.
In 1946, Hungarian mathematician Paul Erdős posed a deceptively simple question: among n points placed in a plane, what is the maximum number of pairs at exactly distance 1 from each other?
Mathematicians called it the planar unit distance problem. For nearly eight decades, the consensus was that square grid arrangements were optimal. No one could construct anything better.
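To make the quantity concrete, here is a minimal Python sketch that counts pairs of points at a chosen distance by brute force. The function names and the 6 x 6 grid are illustrative assumptions, not anything taken from OpenAI's work or the companion paper. Integer coordinates and squared distances keep the comparison exact. Counting pairs at squared distance 5 is the same as rescaling the grid so that a distance of √5 becomes the unit.

```python
from itertools import combinations


def count_pairs_at_squared_distance(points, d2=1):
    """Count unordered pairs of points whose squared distance equals d2."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


def square_grid(k):
    """Return the k x k grid of integer points (hypothetical helper)."""
    return [(x, y) for x in range(k) for y in range(k)]


grid = square_grid(6)  # 36 points

# Pairs at distance exactly 1 in the plain grid.
unit_pairs = count_pairs_at_squared_distance(grid, 1)

# Pairs at squared distance 5; dividing all coordinates by sqrt(5)
# would turn these into unit-distance pairs.
rescaled_pairs = count_pairs_at_squared_distance(grid, 5)

print(unit_pairs, rescaled_pairs)  # 60 80
```

Even on this toy example, the same 36 points yield more unit-distance pairs after rescaling (80 versus 60). That is why the choice of scale matters in grid constructions. The brute-force check takes quadratic time in the number of points and is only meant for small illustrations.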
On May 20, 2026, OpenAI announced that one of its general-purpose reasoning models had disproved that assumption. The model identified an infinite family of planar configurations that produce more unit-distance pairs than any square grid — a polynomial improvement over the previous best known bound.
The result was verified by three prominent external mathematicians: Fields Medalist Tim Gowers of Cambridge, and Will Sawin and Noga Alon of Princeton, who co-authored a companion paper. Thomas Bloom, who maintains the Erdős Problems website and had previously criticized OpenAI for a false claim about this very problem family, also confirmed the result.
The Model Was Not Trained to Do This
That distinction matters. OpenAI used a general-purpose reasoning model. No custom search tools were built, and no task-specific fine-tuning was applied. Engineers gave the model a set of Erdős problems and let it run.
The contrast with seven months ago is sharp. Former OpenAI VP Kevin Weil publicly claimed on X that GPT-5 had solved ten Erdős problems. The math community quickly showed that the supposed solutions were either pre-existing, incomplete, or simply wrong. This time, OpenAI had external mathematicians check the proof before any public announcement.
What This Actually Demonstrates
The Erdős unit distance problem has no direct industrial applications. It is a central question in discrete geometry and serves as a benchmark for progress in geometric combinatorics. The mathematical community is interpreting this result carefully, not breathlessly.
The more significant claim is about general reasoning capability: if a model can sustain long reasoning chains, draw connections across mathematical fields, and produce correct results that survive scrutiny from top mathematicians — all without task-specific preparation — then the ceiling on what these models can do in science is genuinely unclear.
OpenAI cited potential implications for biology, physics, engineering, and medicine. That framing is easy to make. The Erdős result gives it a concrete anchor.
Erdős Died in 1996, Still Waiting
Erdős was famous for offering cash prizes to motivate solutions, ranging from tens to thousands of dollars depending on difficulty. He posed more than a thousand open problems before his death in 1996. The unit distance problem remained open.
A general-purpose model that was simply asked to try has now closed it.
That does not mean AI is smarter than humans at mathematics. What it does suggest is that the old division of labor — machines search, humans reason — may no longer hold in certain domains. The question is how quickly that line will keep moving.