GLM-5.2 Beats GPT-5.5: China Open-Weight Model Claims SWE-bench Lead at 1/6th Cost

Z.ai (formerly Zhipu AI) went public with the full weights and API for GLM-5.2 on June 17. The 753B-parameter open-weight model scores 62.1 on SWE-bench Pro, edging past GPT-5.5’s 58.6. The pricing spread is the bigger story: $4.40 per million output tokens versus GPT-5.5’s $30, roughly a sixfold difference. MIT licensing means any enterprise can download, fine-tune, and deploy commercially without signing a vendor agreement.

Why the Timing Matters

GLM-5.2 didn’t arrive in a vacuum. Anthropic’s Fable 5 and Mythos 5 have been offline since June 12 under a US Department of Commerce emergency directive, now entering day 10. The ban traces back to SK Telecom’s $100M Anthropic investment and an Amazon research team’s vulnerability disclosure.

Into that supply gap, GLM-5.2 lands with an Anthropic-compatible API endpoint. Developers currently using Claude Code or Cursor can theoretically swap a base URL and keep working. Weights are available on Hugging Face (zai-org/GLM-5.2) for teams with on-premise GPU capacity.

VentureBeat noted this marks the first time a Chinese open-weight model has taken a confirmed lead on long-horizon coding benchmarks. Six months ago that sentence wouldn’t have been written seriously.

What the Numbers Actually Say

The headline scores deserve a closer read.

SWE-bench Pro 62.1 vs 58.6 is a 3.5-point gap, about 6% relative improvement. FrontierSWE 74.4% vs 72.6% is a smaller margin, and Claude Opus 4.8 still sits at 75.1% on that same benchmark. On Terminal-Bench 2.1, GPT-5.5 actually wins: 84.0 vs GLM-5.2’s 81.0. This is a category-specific lead on long-horizon coding tasks, not a general sweep.

All benchmark numbers come from Z.ai’s own reporting. No independent third-party verification exists yet. The standard caveat applies: treat self-reported numbers as marketing until replicated.

The architecture has one genuinely interesting innovation: IndexShare. The mechanism reuses sparse attention indexers across transformer layers, cutting floating-point operations by roughly 2.9x at 1M-token context length. With 753B total parameters but only ~40B active per inference (MoE), the cost advantage has a concrete engineering explanation. It’s not scale magic.

A quick cost estimate on real workloads. A typical SWE-bench-style task burns 80K to 120K output tokens. At GPT-5.5 pricing that’s $2.40 to $3.60 per task. At GLM-5.2 API pricing, $0.35 to $0.53. At 1,000 agentic coding tasks per day, the monthly delta is roughly $57K to $93K. For teams running large-scale CI/CD agentic pipelines, that’s not noise.

Self-hosting carries a higher bar. Z.ai recommends a minimum of eight H100 GPUs. Cloud spot pricing runs $25 to $35/hour, roughly $220K annually just in compute before engineering overhead. The MIT license gives smaller teams the theoretical right to run it; the hardware cost is what actually gates access.

Signals Worth Watching

Three concrete data points will determine how this plays out.

First, when Fable 5 comes back. Anthropic’s Chris Ciauri said “coming days.” If the export restriction lifts before June 30, GLM-5.2’s substitution window is roughly two weeks. An extension into July would force enterprise procurement teams to make longer-term decisions about API diversification.

Second, whether the OpenRouter Fusion DRACO numbers hold under independent testing. Reports claim that a Gemini + Kimi + DeepSeek combination reaches 64.7% DRACO scores, approaching Fable 5 performance. If that replicates, the “single best model” moat is being eroded by multi-model synthesis. That’s structurally bad news for every closed-source lab.

Third, GLM-5.2 download velocity on Hugging Face through August. DeepSeek-V3 hit 1M downloads in its first week. GLM-5.2 reaching that scale would validate Z.ai’s pricing strategy in concrete adoption numbers, not just benchmark charts.

Have you built anything on an Anthropic-compatible endpoint that wasn’t actually Anthropic? What broke, if anything?

If this was useful, subscribe to the newsletter for weekly AI PM insights and GenAI case studies.

Related Reading

GLM-5.2 Beats GPT-5.5: China Open-Weight Model Claims SWE-bench Lead at 1/6th Cost

Why the Timing Matters

What the Numbers Actually Say

Signals Worth Watching

Related Articles

US AI Models Crash Below 30% on OpenRouter: China Dominates Developer Traffic as OpenAI Faces IPO Pricing Dilemma

GPT-5.6 Sol Launches Under Government Lock: Washington's New Frontier AI Gate

Why the Timing Matters

What the Numbers Actually Say

Signals Worth Watching

Related Articles

US AI Models Crash Below 30% on OpenRouter: China Dominates Developer Traffic as OpenAI Faces IPO Pricing Dilemma

GPT-5.6 Sol Launches Under Government Lock: Washington's New Frontier AI Gate

Get the latest insights