Chipmakers Nvidia and Groq entered a non-exclusive technology licensing agreement last week to accelerate and reduce the cost of running pre-trained large language models using Groq’s language processing unit chips.
Groq’s language processing unit (LPU) chips power real-time chatbot queries during the inference stage of AI operations, when a trained model generates responses, as distinct from the model training process.
Nvidia’s chips currently handle much of the AI training phase across the industry. Inference represents a bottleneck that Nvidia does not fully control. Groq’s chips target this inference stage specifically, where AI models apply knowledge gained from training to produce results on new data.
Groq designs its chips specifically for inference, the phase that moves AI models from laboratory experimentation into practical deployment: after training, models process unseen inputs and deliver outputs in real-world scenarios.
Investors direct funds toward inference startups to connect AI research with large-scale everyday applications. Axios reporter Chris Metinko covered this investment trend earlier this year.
Cheaper, faster inference lets companies pursue more enterprise AI projects at larger scale. Those projects in turn drive demand for model training, which elevates the need for Nvidia’s training chips.
AI models function through two phases: training and inference. During training, models process extensive datasets including text, images, and video to construct internal representations of knowledge.
In the inference phase, models identify patterns within previously unseen data and produce responses to specific prompts based on those patterns. This process resembles a student who studies material for an examination and then applies that knowledge during the test.
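The two-phase split can be made concrete with a toy sketch. The bigram model, function names, and training text below are invented for illustration; real LLMs use neural networks trained on vastly larger datasets, but the division of labor is the same: training builds an internal representation from data, and inference applies it to new input.

```python
# Toy illustration of the two AI phases using a character-level bigram model.
# All names and the training corpus are hypothetical examples.
from collections import defaultdict

def train(corpus: str) -> dict:
    """Training phase: learn which character most often follows each character."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(corpus, corpus[1:]):
        counts[a][b] += 1
    # Keep only the most likely successor for each character.
    return {a: max(nxt, key=nxt.get) for a, nxt in counts.items()}

def infer(model: dict, prompt: str, length: int = 5) -> str:
    """Inference phase: apply the learned table to extend an unseen prompt."""
    out = prompt
    for _ in range(length):
        nxt = model.get(out[-1])
        if nxt is None:
            break
        out += nxt
    return out

model = train("to be or not to be")   # training: build knowledge from data
print(infer(model, "to", 3))          # inference: respond to a prompt → "to be"
```

Like the student in the analogy, `infer` never looks at the corpus again; it relies entirely on what `train` already distilled, which is why the two phases can run on different hardware.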
Groq was founded in 2016 by Jonathan Ross. The company bears no relation to Grok, Elon Musk’s xAI chatbot.
Jonathan Ross, Groq president Sunny Madra, and select other employees plan to join Nvidia, as stated on Groq’s website. Groq intends to maintain independent operations following these transitions.
The agreement constitutes a “non-exclusive inference technology licensing agreement,” a structure that in practice resembles an acquisition or acqui-hire. Analyst Stacy Rasgon described the arrangement in a note to clients as maintaining the fiction of competition, according to CNBC.
Companies employ such deal structures to navigate antitrust reviews while securing specialized AI personnel.
- Microsoft example: Recruited Mustafa Suleyman, co-founder of DeepMind.
- Google example: Re-engaged Noam Shazeer, co-inventor of the Transformer architecture central to GPT models.
Jonathan Ross, now moving to Nvidia, previously led development of Google’s Tensor Processing Unit (TPU). Inference deployment costs ultimately determine how widely companies can use the models their prior training investments produced.