Problem: AI inference demand is rising, and prices and energy consumption are rising with it. AI inference has been built on top of GPUs (graphics processing units), chips originally designed to render high-quality computer graphics. This is how Nvidia got its start, building GPUs primarily for gamers. Because these chips were never optimized for AI inference from the ground up, there is room to build hardware that is far more efficient for training and running AI models.
Solution: Groq is building AI inference chips to replace GPUs. Its chips are called LPUs, or language processing units, and they differ in a few key ways. First, they have an "assembly line" structure that processes data sequentially rather than in parallel like a GPU, and this pipelined, sequential design is much better suited to AI inference. The LPUs are also built on a software-first principle, which lets large clusters of LPUs work together with far fewer pain points. On-chip memory removes further friction by simplifying data flow and transfer, and deterministic assembly-line timing makes the chip's processing more predictable and efficient. Overall, Groq is building these chips to handle the specific computations that dominate AI inference, which are primarily matrix multiplications, and designing them with today's massive data clusters in mind, stripping away much of the friction of making huge numbers of chips work together.
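To make "primarily matrix multiplications" concrete, here is a minimal NumPy sketch of a transformer feed-forward block. This is a scaled-down illustration of my own, not Groq's software; the layer sizes are arbitrary. Even in this toy, essentially all of the floating-point work sits in the two matrix multiplications.

```python
# Illustrative only: a single transformer feed-forward block is two large
# matmuls wrapped around a cheap elementwise activation, so matmul hardware
# dominates inference cost. Sizes are scaled down for a quick run.
import numpy as np

d_model, d_ff = 1024, 4096            # illustrative (not real-model) sizes
x = np.random.randn(1, d_model)       # one token's activation vector

W1 = np.random.randn(d_model, d_ff)   # up-projection weights
W2 = np.random.randn(d_ff, d_model)   # down-projection weights

h = np.maximum(x @ W1, 0.0)           # matmul + ReLU
y = h @ W2                            # matmul

# FLOP accounting: a (1 x m) @ (m x n) matmul costs about 2*m*n operations;
# the ReLU costs only ~d_ff operations.
matmul_flops = 2 * d_model * d_ff + 2 * d_ff * d_model
other_flops = d_ff
print(f"matmul share of FLOPs: {matmul_flops / (matmul_flops + other_flops):.4%}")
```

Running this prints a matmul share above 99.9%, which is the intuition behind building a chip around a fixed, predictable matrix-multiplication pipeline rather than a general-purpose graphics processor.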
Founders: Jonathan Ross founded Groq and is its active CEO. Before founding Groq, Ross started Google's Tensor Processing Unit (TPU) team as a 20% project. Groq last raised $640 million at a $2.8 billion valuation in its Series D round in August 2024; in total, the company has raised over $1 billion.
Implications: AI companies are looking for ways to lower inference cost and complexity. xAI's LLM Grok 4 (no relation to Groq) is currently the leading model, and it was trained on the Colossus data center, which consists of 200,000 GPUs. The reason xAI is ahead of the competition and the envy of many rivals is that it figured out how to connect all of those GPUs. The larger a cluster gets, the more complicated it becomes, and GPUs are not well suited to these massive data centers. LPUs can address this problem with their assembly-line architecture and simplified software, which means AI companies might be able to scale their data centers better with this new type of chip. One thing is certain: if these chips are truly better, AI companies will pay whatever they cost. Leverage and spending are the last items on the checklist for companies chasing the leading LLM, as Nvidia's rise to the most valuable company in the world proves. If LPUs turn out to be more efficient than GPUs, the biggest question is what happens to all the GPUs and the capital invested in them. To keep up, companies would have to leverage themselves even further, and the old GPUs would most likely be sold off to other countries. That would be beneficial for two reasons. First, American AI companies would recover some of the capital they initially invested in GPUs, raising their risk tolerance for new technology. Second, it would create a dependency on the older GPU technology abroad: foreign AI companies would run on Nvidia's leftover GPUs, still the best on the open market but a step behind the new LPUs, in effect starving foreign hardware companies while keeping foreign AI companies a step behind.
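To illustrate why bigger clusters get harder to coordinate, here is a toy latency model of my own construction, with made-up numbers, not a Groq or xAI benchmark. If every synchronized step must wait for the slowest of N devices, even small random per-device jitter stretches step times as N grows, while perfectly deterministic chips would hold the base latency at any scale.

```python
# Toy model (assumed numbers, not a real benchmark): a lockstep cluster
# step finishes only when the slowest of N devices finishes, so random
# jitter inflates step time with scale; deterministic timing would not.
import random

def step_time(n_devices: int, base_ms: float, jitter_sigma_ms: float) -> float:
    """One synchronized step waits for the slowest device."""
    return max(base_ms + abs(random.gauss(0.0, jitter_sigma_ms))
               for _ in range(n_devices))

random.seed(0)
for n in (1_000, 10_000, 100_000, 200_000):
    avg = sum(step_time(n, 10.0, 1.0) for _ in range(10)) / 10
    print(f"{n:>7} devices: avg synchronized step ~{avg:.2f} ms "
          f"(deterministic chips: 10.00 ms at any scale)")
```

The step times creep upward as the device count grows, a tail-latency effect that is one concrete reason deterministic timing becomes more valuable the larger the cluster gets.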
Conclusion: Groq is building very exciting new technology. Hardware is the root of many AI bottlenecks, like energy and cluster size, so it makes sense that there is room for improvement in this hardware, since GPUs were not created with the specific AI inference use case in mind. New technologies like LPUs, and rival approaches, are something to watch and potentially great investment opportunities.