OpenAI revealed its first custom-built inference processor, named Jalapeño, which was developed in collaboration with Broadcom. The processor is designed specifically for the requirements of OpenAI’s inference systems, and the company says its own AI models contributed to the chip’s development.
The chip is still undergoing testing, but early results indicate a significant improvement in performance-per-watt relative to current leading alternatives. OpenAI’s partnership with Broadcom was officially announced in October, and the creation of custom chips has been viewed as a strategy to reduce reliance on Nvidia’s graphics processing units.
Google and Amazon have built comparable custom chips, known as “AI accelerators,” to speed up machine learning workloads. On an in-house podcast, OpenAI president Greg Brockman discussed the company’s chip development strategy after announcing the partnership with Broadcom. “We have a deep understanding of the workload,” Brockman said. “We’ve really been looking for specific workloads that are underserved, [and asking] how can we build something that will be able to accelerate what’s possible?”
Jalapeño is tailored for inference, which means running already-trained AI models in response to user requests. According to OpenAI, the chip offers low operating costs when serving real-time coding models. However, compute-intensive processes such as pre-training may still require Nvidia hardware. Even minor reductions in inference costs could significantly improve OpenAI’s profitability.
Optimizing inference is critical to the future economics of AI, and the company is expanding its capabilities across the entire technology stack. OpenAI is simultaneously developing products like Codex, building the models that support them, and establishing data centers for model deployment. The shift to custom silicon is expected to make these operations more efficient.
OpenAI said its strategy covers the design of infrastructure components, including chip architecture, kernels, memory systems, networking, scheduling, and deployment systems. This full-stack approach allows for optimization across every layer, with the aim of delivering faster, more reliable, and more cost-effective models to users.





