
GPT-5.5-Cyber Is Live: OpenAI Used AI to Find 24 Linux Kernel Exploits

Nils Liu

TL;DR

OpenAI launched GPT-5.5-Cyber on June 22. Its Daybreak research program has already found 24 Linux kernel privilege-escalation exploits, 5 exploitable Chrome V8 vulnerabilities, and 10 exploitable Safari flaws. The CyberGym score of 85.6% is the headline; the ExploitGym score of 39.5% is why access is restricted to vetted defenders only.


This article covers the defensive framing almost entirely, because that is what OpenAI published. The offensive implication of a 39.5% ExploitGym score sits underneath every paragraph. If you work in threat intelligence and have seen early signals of AI-assisted exploit development by nation-state groups, I want to know how the timeline compares with what you were seeing in 2024. The five-agency warning discussed below is a policy document; real-world attacker timelines are the ground truth.


On June 22, OpenAI shipped the full version of GPT-5.5-Cyber, its most capable defensive cybersecurity model to date, alongside a disclosure of what the Daybreak research program has been doing for the past six months.

The results list is specific: 24 Linux kernel local privilege escalation exploits, 8 kernel pointer information leak proofs-of-concept, 34 FreeBSD vulnerabilities, 6 dnsmasq flaws, 5 exploitable Chrome V8 vulnerabilities, 10 exploitable Apple Safari flaws. All were reported to the respective maintainers through responsible disclosure before the announcement.

Alongside the model release, OpenAI launched Patch the Planet, a collaboration with Trail of Bits and HackerOne to bring AI-assisted vulnerability scanning and patching to major open source projects: cURL, Go, Python, Sigstore, aiohttp, and others.

What Daybreak Actually Does

GPT-5.5-Cyber is built for automated vulnerability detection, attack path tracing, and patch generation within integrated security workflows. The updated Codex Security plugin supports end-to-end workflows from SARIF exports to CodeQL integration, with codebase-specific patch generation and severity-rated reporting.
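The plugin's own interface is not described in the announcement, so the sketch below stays on the standard side of that workflow: it reads a SARIF 2.1.0 log (the format CodeQL and many other scanners export) and tallies findings by severity level, pulling out the location of each `error`-level result. The sample log, its rule IDs and file paths, and the `triage_sarif` helper are hypothetical illustrations, not output from or an API of Codex Security.

```python
import json
from collections import Counter

# Hypothetical SARIF 2.1.0 log; rule IDs and paths are illustrative only.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-scanner"}},
        "results": [
            {"ruleId": "example/sql-injection", "level": "error",
             "message": {"text": "Unsanitized input reaches a SQL query."},
             "locations": [{"physicalLocation": {
                 "artifactLocation": {"uri": "app/db.py"},
                 "region": {"startLine": 42}}}]},
            {"ruleId": "example/weak-hash",
             "message": {"text": "MD5 used for password hashing."}},
        ],
    }],
})


def triage_sarif(sarif_text: str) -> tuple[Counter, list[tuple[str, str, int]]]:
    """Count results by level and list (ruleId, file, line) for error-level results."""
    log = json.loads(sarif_text)
    by_level: Counter = Counter()
    errors: list[tuple[str, str, int]] = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # SARIF 2.1.0 falls back to "warning" when "level" is absent and
            # neither the result's "kind" nor the rule's defaultConfiguration
            # says otherwise; this sketch applies that fallback unconditionally.
            level = result.get("level", "warning")
            by_level[level] += 1
            if level == "error":
                locs = result.get("locations") or [{}]
                phys = locs[0].get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "?")
                line = phys.get("region", {}).get("startLine", 0)
                errors.append((result.get("ruleId", "?"), uri, line))
    return by_level, errors


counts, error_locations = triage_sarif(SAMPLE_SARIF)
print(dict(counts))       # {'error': 1, 'warning': 1}
print(error_locations)    # [('example/sql-injection', 'app/db.py', 42)]
```

Severity-rated reporting of the kind the plugin advertises would sit on top of a tally like this; the sketch only shows where the level and location data live in a SARIF file.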

Access is tightly restricted. GPT-5.5-Cyber is available only to organizations that have cleared OpenAI’s Daybreak vetting process. Confirmed government partners include Australia, Canada, France, Germany, Japan, and South Korea, plus ENISA, the European Union Agency for Cybersecurity. Commercial access runs through a network of 25-plus certified Daybreak partners.

Since the March preview launch, Codex Security has scanned more than 30 million commits across 30,000-plus codebases. OpenAI’s Patch the Planet announcement lists two output figures: 500,000 automatically resolved security findings, and 70,000 manually verified fixes.

What the Numbers Actually Mean

CyberGym is OpenAI’s own benchmark, not an independently verified third-party evaluation, and that context matters when reading the headline score. GPT-5.5-Cyber at 85.6% versus Anthropic Mythos 5 at 83.8% is a 1.8-percentage-point gap. Until independent evaluators replicate that gap, the right reading is “roughly equivalent capability” rather than “clear lead.”
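How much a 1.8-point gap means depends on how many tasks the benchmark contains, which the announcement does not state. The sketch below computes a normal-approximation 95% interval for the difference of two proportions at hypothetical task counts. It treats the two scores as independent samples, which is a simplification: both models run the same tasks, so a paired test (for example McNemar's) on per-task results would be the more careful choice.

```python
import math


def gap_confidence_interval(p1: float, p2: float, n_tasks: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% interval for p1 - p2 (normal approximation, unpaired samples)."""
    se = math.sqrt(p1 * (1 - p1) / n_tasks + p2 * (1 - p2) / n_tasks)
    gap = p1 - p2
    return gap - z * se, gap + z * se


CYBERGYM_GPT55_CYBER = 0.856
CYBERGYM_MYTHOS_5 = 0.838

# Hypothetical benchmark sizes; the real CyberGym task count is not given here.
for n in (1_000, 10_000):
    low, high = gap_confidence_interval(CYBERGYM_GPT55_CYBER, CYBERGYM_MYTHOS_5, n)
    print(f"n={n:>6}: gap 95% CI ({low:+.3f}, {high:+.3f})")
```

With 1,000 tasks the interval spans zero, so the two scores are statistically indistinguishable; with 10,000 tasks the same gap would clear zero. That sensitivity to benchmark size is why independent replication matters more than the point estimate.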

The ExploitGym number deserves more attention than the CyberGym number. ExploitGym measures how often the model generates a working exploit against a known-vulnerable target. GPT-5.5-Cyber scores 39.5% versus 25.95% for base GPT-5.5, a 13.55-point jump: nearly 4 out of 10 attempts against a verified-vulnerable target produce functional attack code. That is the number behind the joint warning from five intelligence agencies that frontier AI models are “significantly compressing the timeline from vulnerability disclosure to exploitation.” The restricted-access policy is a direct response to the 39.5% ExploitGym score, not just defensive optics.
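The arithmetic behind the ExploitGym comparison is small enough to spell out. The sketch below computes the absolute and relative lift over the base model, then the chance of at least one working exploit across several attempts. That last step assumes each attempt succeeds independently at the same rate, which is a simplifying assumption not stated in the announcement; real retries against one target are likely correlated.

```python
CYBER_RATE = 0.395   # GPT-5.5-Cyber ExploitGym score
BASE_RATE = 0.2595   # base GPT-5.5 ExploitGym score


def exploit_lift(cyber_rate: float, base_rate: float) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift as a ratio)."""
    return (cyber_rate - base_rate) * 100, cyber_rate / base_rate


def at_least_one_success(rate: float, attempts: int) -> float:
    """P(>= 1 working exploit in `attempts` tries), ASSUMING independent attempts."""
    return 1 - (1 - rate) ** attempts


abs_pp, rel = exploit_lift(CYBER_RATE, BASE_RATE)
print(f"+{abs_pp:.2f} pp, {rel:.2f}x the base rate")
for k in (1, 3, 5):
    print(k, round(at_least_one_success(CYBER_RATE, k), 3))
```

Under that independence assumption, three attempts already give roughly a 78% chance of at least one functional exploit, which is the intuition behind "compressing the timeline": a per-attempt rate under 50% still adds up quickly when retries are cheap.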

If the 70,000 manually verified fixes are the subset of the 500,000 auto-resolved findings that passed human review, the ratio implies a verification rate of roughly 14%. Read that way, AI dramatically accelerates vulnerability discovery and triage, but about 86% of automatically generated patch suggestions have not been confirmed by a human reviewer; the announcement does not say whether they were rejected or simply never reviewed. Either way, the last mile of patching still requires human judgment at scale.
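The 14% figure is a single division, but it rests on an assumption worth making explicit in code: that the 70,000 verified fixes are nested inside the 500,000 auto-resolved findings. The `verified_share` helper below is a hypothetical illustration that encodes that assumption as a guard.

```python
AUTO_RESOLVED = 500_000
MANUALLY_VERIFIED = 70_000


def verified_share(auto_resolved: int, manually_verified: int) -> float:
    """Share of auto-resolved findings that ended up as manually verified fixes.

    ASSUMES the verified fixes are a subset of the auto-resolved findings;
    the announcement's two figures may not be strictly nested.
    """
    if manually_verified > auto_resolved:
        raise ValueError("verified fixes cannot exceed auto-resolved findings under this assumption")
    return manually_verified / auto_resolved


share = verified_share(AUTO_RESOLVED, MANUALLY_VERIFIED)
unverified = AUTO_RESOLVED - MANUALLY_VERIFIED
print(f"{share:.0%} verified, {unverified:,} not (yet) human-verified")
```

The remaining 430,000 are "not verified", which is a weaker statement than "rejected"; the code labels them accordingly.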

Scanning cost math: 30 million commits at an assumed average of 1,000 tokens per diff comes to 30 billion input tokens. At GPT-5.5 inference pricing, the raw input-token cost runs between $100,000 and $200,000, before any output tokens for patches or reports. That aggregate cost is one reason Patch the Planet operates as a centralized OpenAI-run program rather than a toolkit any engineering team can run themselves.
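The cost estimate can be reproduced from the figures above. The sketch below takes the 1,000-tokens-per-diff average and the $100,000 to $200,000 range as given, back-solves the per-million-input-token price that range implies (not a published OpenAI price), and averages the cost over the 30,000-plus codebases. Using exactly 30,000 codebases makes the per-codebase figures an upper bound.

```python
COMMITS = 30_000_000
TOKENS_PER_DIFF = 1_000               # the article's assumed average, not a measured figure
CODEBASES = 30_000                    # "30,000-plus", so per-codebase costs are upper bounds
COST_RANGE_USD = (100_000, 200_000)   # the article's estimate for input tokens only

total_tokens = COMMITS * TOKENS_PER_DIFF
million_tokens = total_tokens / 1_000_000

# Per-million-input-token price implied by the cost range (back-solved, not quoted).
implied_rate = tuple(cost / million_tokens for cost in COST_RANGE_USD)
# Average cost per codebase and average commits scanned per codebase.
per_codebase = tuple(cost / CODEBASES for cost in COST_RANGE_USD)
commits_per_codebase = COMMITS / CODEBASES

print(f"{total_tokens:,} input tokens")
print("implied $/M input tokens:", [round(r, 2) for r in implied_rate])
print("avg $ per codebase:", [round(c, 2) for c in per_codebase])
print("avg commits per codebase:", commits_per_codebase)
```

The range implies roughly $3.33 to $6.67 per million input tokens. Averaged per codebase the bill is small; the large number is the aggregate across the whole program.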

The traditional bug bounty market pays between $30,000 and $150,000 for an individual critical vulnerability. Five exploitable Chrome V8 vulnerabilities and ten exploitable Safari flaws come to $450,000 even at the $30,000 floor, and an average payout of just over $33,000 per finding pushes the equivalent bounty value past $500,000, all found by AI in a systematic sweep. What that does to the economics of vulnerability research programs at Google and Apple is an open question for the second half of 2026.
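The bounty-equivalent value depends on where the average payout falls in the quoted range. The sketch below assumes every one of the fifteen findings would pay within the $30,000 to $150,000 critical band, which is a simplification: real payouts vary by severity, exploit quality, and program.

```python
CHROME_V8_FINDINGS = 5
SAFARI_FINDINGS = 10
PAYOUT_RANGE_USD = (30_000, 150_000)   # per critical vulnerability, as quoted above
TARGET_USD = 500_000

findings = CHROME_V8_FINDINGS + SAFARI_FINDINGS

low_total = findings * PAYOUT_RANGE_USD[0]
high_total = findings * PAYOUT_RANGE_USD[1]
# Average payout per finding needed for the total to reach the target.
breakeven_avg = TARGET_USD / findings

print(f"{findings} findings: ${low_total:,} to ${high_total:,}")
print(f"average needed for ${TARGET_USD:,}: ${breakeven_avg:,.0f}")
```

At the floor the total is $450,000 and at the ceiling $2.25 million; crossing $500,000 needs an average of about $33,333 per finding.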

What to Watch Over the Next Three to Six Months

Patch the Planet covers cURL, Go, and Python, three projects with combined daily downloads exceeding one billion. If none of those projects has published a CVE advisory explicitly crediting GPT-5.5-Cyber or Daybreak as the discovery mechanism by the end of Q3 2026, either the program is moving more cautiously than the launch announcement suggested or the attribution language is still under legal negotiation.

Beyond Australia, Japan, and South Korea, the Daybreak government partner list currently includes no Asia-Pacific nations, and it includes no Middle Eastern security agencies. The first country outside that group to join, and how quickly the list expands over 2026, are concrete indicators of how OpenAI is positioning itself in the geopolitical security landscape.

Google’s and Apple’s patch turnaround times for the Daybreak-disclosed vulnerabilities are the third trackable data point. Both companies publish CVE advisories with acknowledgment language. If their response timelines are longer than their historical averages for similar severity findings, that suggests AI-discovered vulnerability rates are outpacing the absorption capacity of traditional patching pipelines.
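Tracking this is mostly date arithmetic once the report and advisory dates are known. The sketch below computes per-finding turnaround in days and compares the median against a historical baseline for similar-severity advisories. The function names are hypothetical, the inputs would have to be collected from the vendors' advisories and disclosure timelines, and using the median rather than the mean is a substitution chosen here because a few long-tail fixes can skew an average.

```python
from datetime import date
from statistics import median


def turnaround_days(disclosures: list[tuple[date, date]]) -> list[int]:
    """Days from report to published advisory for each (reported, published) pair."""
    return [(published - reported).days for reported, published in disclosures]


def slower_than_baseline(disclosures: list[tuple[date, date]], baseline_median_days: float) -> bool:
    """True if the median turnaround exceeds the historical median for similar-severity findings."""
    return median(turnaround_days(disclosures)) > baseline_median_days
```

The baseline itself has to come from each vendor's past advisories at comparable severity; the announcement provides neither the Daybreak report dates nor those historical figures.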


