← Back to Insights

OpenAI's First Custom Chip Jalapeño Revealed: Built in Nine Months to Cut Inference Costs

Nils Liu
OpenAI Jalapeño Broadcom AI Chip Inference Custom Silicon NVIDIA

TL;DR

OpenAI and Broadcom unveiled Jalapeño, OpenAI's first custom inference ASIC, designed to tape-out in nine months with AI-assisted development. Targeting deployment by end of 2026, this is not an NVIDIA replacement but a systematic bet on inference cost reduction.

OpenAI's First Custom Chip Jalapeño Revealed: Built in Nine Months to Cut Inference Costs

Real question for anyone running inference at scale: what does your cost breakdown actually look like between training and inference? OpenAI’s move into custom silicon only makes economic sense if their inference bill exceeds 40% of total compute spend. If your numbers suggest a different tipping point, that changes the calculus here. Drop the figure in the comments.


On June 24, OpenAI and Broadcom jointly announced Jalapeño, OpenAI’s first custom-designed AI chip. It is an inference accelerator, not a training GPU replacement.

Engineering samples were physically delivered to OpenAI CEO Sam Altman and President Greg Brockman by Broadcom CEO Hock Tan and President Charlie Kawwas. The samples are already running GPT-5.3-Codex-Spark inference workloads at target frequency and power. Production deployment is planned for end of 2026, with this chip positioned as the first step in a multi-generation compute platform.

What Jalapeño Actually Does

According to the joint announcement on OpenAI’s blog and the Broadcom investor relations page, Jalapeño is a purpose-built ASIC for LLM inference. The architecture targets specific bottlenecks that matter at OpenAI’s scale: costly data movement, compute-to-memory bandwidth balance, and networking efficiency.

Greg Brockman said: “We have a deep understanding of the workload. We’ve really been looking for specific workloads that are underserved.”

The collaboration divided along capability lines: OpenAI handled architecture design and AI-assisted optimization, Broadcom took silicon implementation and networking interconnects, Celestica covers board, rack, and system integration.

The Numbers Behind the Headlines

Nine-Month Development: More Complicated Than It Looks

A nine-month timeline from design to tape-out is genuinely unusual for high-performance ASICs; standard timelines run 24 to 36 months. Three factors likely converged: Broadcom reused substantial existing IP from prior custom silicon programs; OpenAI’s own models accelerated EDA optimization flows; and the architecture was deliberately kept simpler than a general-purpose accelerator to hit the timeline.

All three can be simultaneously true. Most coverage fixated on “AI designed its own silicon,” a cleaner narrative than “Broadcom had the right IP library ready.”

Performance Claims Still Lack Independent Verification

“Significantly better performance-per-watt than current state-of-the-art alternatives” comes from OpenAI’s own testing. No independent benchmark has verified this. The chip is not in production yet. The comparison baseline, whether H100, H200, or B200, is not specified in any public release.

A useful Fermi frame: if OpenAI’s inference-specific compute spend sits around $3-4B annually (conservative given ChatGPT’s scale), a 15-20% efficiency improvement translates to $450M-800M in annual cost reduction at full deployment. That is a large enough number to justify multi-year custom silicon investment.

Broadcom’s own forecast puts Q3 AI chip revenue at $16B, with a $100B long-range target. A dedicated OpenAI chip as anchor customer improves revenue visibility substantially.

Why Inference and Not Training

OpenAI was explicit: pre-training and other intensive workloads continue on NVIDIA hardware. That is rational specialization, not retreat. LLM training requires architectural programmability that changes with every model generation. Inference at ChatGPT’s scale is a more stable, predictable workload that rewards custom optimization.

Indicators Worth Watching Over the Next Six Months

Three verifiable outcomes to track:

ChatGPT API pricing. If Jalapeño deploys on schedule by end of 2026, OpenAI’s API cost per million tokens should show measurable movement downward in early 2027. OpenAI has historically cut prices within a few quarters of infrastructure cost reductions. This serves as an external validation of efficiency claims.

Broadcom’s AI revenue concentration. If Q3 and Q4 earnings calls show increased visibility from a strategic large customer, the Jalapeño deployment timeline has external corroboration.

Architecture drift risk. OpenAI ships new model architectures every six months. Jalapeño’s design assumptions are frozen at today’s inference patterns. If 2027’s inference architecture diverges significantly from today’s (more speculative decoding, different attention mechanisms, different KV cache strategies), that is the real stress test for an Apple-style full-stack strategy applied to AI.


If this was useful, subscribe to the newsletter for weekly AI PM insights and GenAI case studies.


Further reading:

Get the latest insights

Join the newsletter to receive my latest articles on GenAI, AI Agents, and architecture.

No spam. Unsubscribe anytime.