OpenAI and Broadcom Unveil Jalapeño, an AI Chip for Inference
📋 Today’s 3-Line Summary
- OpenAI has built a dedicated chip with Broadcom to handle ChatGPT requests.
- The goal is to reduce reliance on Nvidia GPUs and process more responses with the same amount of power.
- Builders should start breaking down model cost and latency at the product-feature level.
Hello, today we’re looking behind the “model race” at the AI chip race that actually shapes product speed and cost.
📌 Today’s Deep Dive — What OpenAI’s First Inference Chip Tells Us About the AI Cost War
What happened
OpenAI and Broadcom unveiled a new AI chip called Jalapeño. OpenAI describes it as an “inference chip optimized for LLMs.” In plain terms, it is a server chip that handles the moment when ChatGPT or Codex receives a user’s question and generates an answer. It is not a chip for training a model from scratch. It is focused on continuously running already-built models for large numbers of users.
Jalapeño is an ASIC, or Application-Specific Integrated Circuit. That means it is not a general-purpose chip like a GPU that is good at many tasks, but a custom chip designed for a specific purpose. The Verge summarized the chip as being aimed at processing requests for services such as ChatGPT and Codex. OpenAI calls it “the first step in a multi-generation compute platform” and said it plans to deploy the chip in data centers by the end of 2026.

Why now?
The core of this news is not that “OpenAI wants to become a chip company.” More precisely, it is a signal that the bottleneck in AI services is moving from model performance to operating cost. For products with massive user bases like ChatGPT, inference costs pile up every day, more so than training costs. Every time a user asks a question, tokens are generated, and that consumes servers and power. So from OpenAI’s perspective, simply buying more Nvidia GPUs is not enough. GPU supply is constrained, and data center power is tight. Ars Technica interpreted OpenAI’s custom chip effort as a move to reduce dependence on external suppliers like Nvidia and optimize the model, product, and hardware as one integrated system.
Other big tech companies are already taking the same path. Google has TPUs, Amazon has Trainium and Inferentia, and Microsoft and Meta are also pushing their own AI chips. What makes OpenAI different is that it was a model and product company, not a cloud infrastructure company. Now the game is no longer just about “making a good model.” It is becoming a game where the company that can run that model cheaply and reliably wins.
The details: 9 months, power efficiency, and the numbers still missing
Among the numbers that have been disclosed, the most eye-catching is the development timeline. OpenAI and Broadcom said Jalapeño’s design and production preparation were completed in nine months. Broadcom said the chip was designed from the ground up for LLM inference, based on discussions with OpenAI researchers and information from OpenAI’s future model and product roadmap.
The performance figures should still be treated carefully. OpenAI said that in early testing, performance per watt was “significantly better” than the current state of the art, but it did not disclose specific benchmarks or comparison targets. The Decoder pointed this out as well. We still do not know which model, batch size, or latency conditions were used for comparison. A technical report is expected later.
Broadcom’s comments are also interesting. The Verge reported that Broadcom CEO Hock Tan said in a Reuters interview that Jalapeño delivers performance on par with Nvidia Blackwell and Google TPU. But this, too, is not an independently verified public benchmark. For now, it is best to treat it as a possibility.
The Decoder also reported the broader deployment picture. The structure is described as OpenAI handling chip design, Broadcom providing semiconductor and networking technology, and Celestica handling board, rack, and system integration. The report also said the first deployment is planned at gigawatt scale, including Microsoft and other partners, and that Microsoft is expected to guarantee the purchase of 40% of the initial supply. This part should be viewed as a report-based outlook rather than a confirmed announcement.
Why it matters
From a builder’s perspective, this may look like distant semiconductor news. But in practice, it is likely to come back as API pricing, response speed, and usage limits. The cost of an AI app usually comes down to three things: input tokens, output tokens, and waiting time. As usage grows, the key question becomes less “how smart is the model?” and more “how cheaply can this request be handled, and within how many seconds?” That is also why OpenAI wants direct control over inference-dedicated chips. If it can handle more requests with the same amount of power, the company can preserve margins while making room for larger models or longer context.
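To make that three-part breakdown concrete, here is a minimal sketch of a per-feature cost table built from logged model calls. The record fields, the feature name, and both prices are hypothetical placeholders, not any provider's actual price sheet; swap in your own numbers.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    feature: str        # product feature that triggered the call (hypothetical label)
    input_tokens: int
    output_tokens: int
    latency_s: float    # wall-clock seconds the user waited


# Hypothetical prices in USD per 1M tokens; replace with your provider's price sheet.
PRICE_PER_M_INPUT = 2.00
PRICE_PER_M_OUTPUT = 8.00


def request_cost(rec: CallRecord) -> float:
    """Token cost of a single call in USD."""
    return (
        rec.input_tokens * PRICE_PER_M_INPUT
        + rec.output_tokens * PRICE_PER_M_OUTPUT
    ) / 1_000_000


def summarize(records):
    """Return {feature: {"calls", "avg_cost_usd", "avg_latency_s"}}."""
    totals = {}
    for rec in records:
        t = totals.setdefault(rec.feature, {"calls": 0, "cost": 0.0, "latency": 0.0})
        t["calls"] += 1
        t["cost"] += request_cost(rec)
        t["latency"] += rec.latency_s
    return {
        feature: {
            "calls": t["calls"],
            "avg_cost_usd": t["cost"] / t["calls"],
            "avg_latency_s": t["latency"] / t["calls"],
        }
        for feature, t in totals.items()
    }
```

Grouping by feature rather than by model is the point of the sketch: it answers “what does one support summary cost, and how long does it take?” instead of “how much did we spend on model X?”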
The competitive landscape changes here too. Nvidia is still central to AI training and inference. But if hyperscalers and model companies build their own chips, GPUs become one option among several, not the only path. Custom chip suppliers like Broadcom become more important in that shift. Ars Technica also noted that Broadcom is expanding its custom chip business for large customers amid the AI boom.
In short: frontier model competition is no longer only a competition of papers and demos. It is a vertical-integration race that extends to chips, power, networks, data centers, and API price sheets.
Startups do not need to build chips. But product design needs to reflect this reality. Every model call is part of cost of revenue.
What to watch next
There are three things to watch from here.
First is the technical report OpenAI will release. Saying performance per watt is good is not enough. We need to see which model was used, under what latency target, and what token throughput was achieved.
Second is the actual deployment schedule. OpenAI has said it will deploy the chips in data centers by the end of 2026, but large-scale chips require production, packaging, rack integration, and power availability to all line up. If even one step slips, the point at which this shows up in product pricing or usage policies may be delayed.
Third is changes to API pricing and limits. If Jalapeño succeeds, OpenAI could lower inference costs for certain models or offer longer tasks more cheaply. Conversely, it may first be used to stabilize internal services. So rather than expecting immediate price cuts, it is better to focus on the fact that the era in which model providers also control the infrastructure stack is arriving.
⚡ Quick News
- Google DeepMind added computer-use capabilities to Gemini 3.5 Flash: The model is expanding toward seeing screens and performing actions such as clicking and typing.
- OpenAI announced its participation in advanced AI standards work: Through the Appia Foundation, it aims to create shared standards for evaluation frameworks and safety practices.
- OpenAI unveiled Daybreak, a security product suite: With Codex Security and GPT-5.5-Cyber, the direction is to help discover, verify, and patch vulnerabilities.
- Companies are moving to control AI budgets: As token costs pile up even for small tasks, team-level usage limits and cost management are becoming important, according to a report.
- New data challenges the belief that AI is reducing developer jobs: According to SignalFire data, engineers are actually taking a larger share of new hires.
- Kakao and Samsung SSAFY held an AI hackathon: Kakao Tech Bootcamp and Samsung Software & AI Academy for Youth collaborated around the theme of solving social problems.
❓ FAQ
Is OpenAI Jalapeño a chip that regular users can buy?
No. Based on what has been disclosed so far, Jalapeño is not a PC component for general sale. It is a data center AI inference chip. It is intended to be deployed in infrastructure used by OpenAI and its partners to operate services like ChatGPT and Codex at scale.
Will OpenAI API prices drop immediately when Jalapeño arrives?
No pricing changes have been confirmed yet. OpenAI has talked about data center deployment by the end of 2026, but it has not announced API price cuts or higher limits for specific models. We need to see the technical report and the price sheet after actual deployment.
🇰🇷 So What Should You Do Now?
- Rebuild your cost table by model this week.
For each product feature, extract input tokens, output tokens, average response time, and retry rate. Don’t look at it as “we use GPT-5.” Look at it as “one customer support summary costs this much.” When the impact of inference chips like Jalapeño shows up in price sheets, you will be able to quickly decide which features to switch and which to leave as they are.
- Separate your model-call layer.
Do not call OpenAI, Gemini, and Claude directly from scattered places in your codebase. Put a router in between, and move the model name, max tokens, cache setting, and fallback model on failure into configuration. In the future, if inference costs fall for a specific provider, you should be able to reduce costs just by changing settings, without a deployment.
- For long tasks, try splitting them into “several cheap-model steps” instead of “one advanced-model call.”
The inference-chip race is likely to move toward lowering the unit cost of long-running tasks. It is worth separating classification, retrieval, drafting, and verification now. Later, when cheaper inference models arrive, changing only some stages will still lower your costs.
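As a rough illustration of the model-call layer advice above, the sketch below keeps the model name, max tokens, and fallback model in a routing table and tries the fallback when the primary call fails. Everything here is an assumption for illustration: the model names are placeholders, the `clients` mapping stands in for whatever provider SDK wrappers you already have, and cache settings are left out for brevity.

```python
# Hypothetical routing table. In practice this might be loaded from a config
# file or a remote config service so it can change without a deployment.
ROUTES = {
    "support_summary": {
        "model": "primary-model",
        "max_tokens": 300,
        "fallback": "backup-model",
    },
    "classification": {
        "model": "small-model",
        "max_tokens": 16,
        "fallback": None,
    },
}


def route_call(feature, prompt, clients, routes=ROUTES):
    """Call the configured model for a feature, then its fallback on failure.

    `clients` maps a model name to a callable (prompt, max_tokens) -> str.
    Returns (model_name_used, response_text).
    """
    cfg = routes[feature]
    candidates = [cfg["model"]]
    if cfg.get("fallback"):
        candidates.append(cfg["fallback"])

    last_error = None
    for name in candidates:
        try:
            return name, clients[name](prompt, cfg["max_tokens"])
        except Exception as exc:  # broad on purpose: any provider failure triggers fallback
            last_error = exc
    raise RuntimeError(f"all models failed for feature {feature!r}") from last_error
```

With a layout like this, each stage of a long task (classification, retrieval, drafting, verification) could get its own route entry, so moving one stage to a cheaper model is a one-line config change rather than a code change.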
Today’s one-liner: The battle in AI products is moving beyond model performance to how cheaply each answer can be produced.
—
AI Daily · A morning brief for people building with AI