Google's New Supercomputers for AI Training

Google has revealed details about the supercomputers it uses to train its artificial intelligence (AI) models. The company claims that systems built on its custom-designed Tensor Processing Unit (TPU) chips are faster and more power-efficient than comparable systems from rival Nvidia.

According to Google, it uses TPUs for more than 90% of its AI training work. Training is the process of feeding data through models so that they become useful at tasks such as generating images or responding to queries with human-like text. The TPU is now in its fourth generation, and Google links more than 4,000 of these chips into a single supercomputer using its own custom-developed optical switches.

Large language models, such as those that power Google's Bard and OpenAI's ChatGPT, have exploded in size, so companies building AI supercomputers are competing fiercely to improve the interconnects between chips. These models are far too large to fit on a single chip: they must be split across thousands of chips, which then have to work together for weeks or longer to train the model.
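To make "split across thousands of chips" concrete, here is a minimal sketch of one common approach, a row-wise split of a single layer's weight matrix: each simulated "chip" holds some of the layer's output rows, computes its share of the result, and the partial results are gathered back together. The function names (`matvec`, `split_rows`, `sharded_matvec`) and the choice of a row-wise split are illustrative assumptions, not a description of how Google partitioned its models; real systems use frameworks such as JAX and spread the work across real devices.

```python
# Illustrative sketch only: "chips" are simulated with plain Python lists.

def matvec(weights, x):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def split_rows(weights, num_chips):
    """Divide the matrix's rows (the layer's outputs) as evenly as possible among chips."""
    size, rem = divmod(len(weights), num_chips)
    shards, start = [], 0
    for i in range(num_chips):
        end = start + size + (1 if i < rem else 0)
        shards.append(weights[start:end])
        start = end
    return shards

def sharded_matvec(weights, x, num_chips):
    """Each chip computes its slice of the output; the slices are then gathered."""
    partial_outputs = [matvec(shard, x) for shard in split_rows(weights, num_chips)]
    # Gathering partial outputs from every chip is the step that depends on
    # a fast interconnect when the chips are physically separate.
    return [y for part in partial_outputs for y in part]

W = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
x = [2, 1]
print(sharded_matvec(W, x, num_chips=2))  # same result as matvec(W, x)
```

The sharded result matches the single-chip result; the cost of the split is the communication needed to gather the pieces, which is why interconnect speed matters so much at the scale of thousands of chips.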

Google's largest publicly disclosed language model, PaLM, was trained by splitting it across two of the 4,000-chip supercomputers over 50 days. Google says its supercomputers make it easy to reconfigure the connections between chips on the fly, which helps route around problems and tune for performance gains.
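The toy model below sketches why switch-based connections help: if any healthy group of chips can be wired into a job, a failed group can simply be skipped instead of blocking the whole machine. The 64-chip block size, the first-fit policy, and the names `allocate_slice` and `BLOCK_SIZE` are illustrative assumptions, not Google's actual scheduler or switch software.

```python
# Toy model of a reconfigurable interconnect: chips are grouped into blocks,
# and a switch can connect any healthy blocks into one job "slice".
# BLOCK_SIZE, allocate_slice, and the first-fit policy are hypothetical.

BLOCK_SIZE = 64  # chips per block (illustrative assumption)

def allocate_slice(num_blocks, failed, chips_needed, block_size=BLOCK_SIZE):
    """Return the block IDs to wire together for a job, skipping failed blocks.

    Because connections are set by a switch rather than fixed cabling, any
    healthy blocks can be combined; they need not be physically adjacent.
    Returns None if there are not enough healthy blocks.
    """
    blocks_needed = -(-chips_needed // block_size)  # ceiling division
    available = [b for b in range(num_blocks) if b not in failed]
    if len(available) < blocks_needed:
        return None
    return available[:blocks_needed]  # first-fit: take the lowest-numbered healthy blocks

# 64 blocks of 64 chips (4,096 chips); blocks 3 and 10 have failed.
job = allocate_slice(num_blocks=64, failed={3, 10}, chips_needed=1024)
print(job)
```

A job needing 1,024 chips still gets 16 full blocks even though two blocks are down; with fixed wiring, a failure inside a required group could instead force the job to wait or shrink.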

The company claims that, for comparably sized systems, its chips are up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's A100 chip, which was on the market at the same time as the fourth-generation TPU. Google has hinted that it is working on a new TPU to compete with Nvidia's newer flagship H100 chip, but it has provided no details beyond saying that it has a healthy pipeline of future chips.
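A back-of-the-envelope reading of those two ratios for a single fixed training job is sketched below. It assumes "faster" means throughput on the same workload and "power-efficient" means performance per watt (work done per unit of energy). Both figures are "up to" values and may not hold simultaneously, so the results are best-case illustrations only. The function name `relative_costs` is hypothetical.

```python
# Best-case illustration, assuming:
#   speedup            = baseline time / new time         (same job)
#   perf_per_watt_gain = new work-per-joule / baseline work-per-joule

def relative_costs(speedup, perf_per_watt_gain):
    time_ratio = 1 / speedup                  # wall-clock time vs. the baseline
    energy_ratio = 1 / perf_per_watt_gain     # energy per job vs. the baseline
    power_ratio = energy_ratio / time_ratio   # average power draw vs. the baseline
    return time_ratio, energy_ratio, power_ratio

time_ratio, energy_ratio, power_ratio = relative_costs(1.7, 1.9)
print(f"time: {time_ratio:.2f}x, energy: {energy_ratio:.2f}x, power: {power_ratio:.2f}x")
# prints: time: 0.59x, energy: 0.53x, power: 0.89x
```

Under these assumptions, a job would finish in about 59% of the time while using about 53% of the energy, implying an average power draw about 11% lower than the baseline system. If "1.9 times more power-efficient" instead refers to lower power draw, the energy saving would be larger, so the assumption matters.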

Google's TPU supercomputers show how custom chips and reconfigurable optical interconnects can make large-scale AI training faster and more power-efficient. If its performance claims hold up, Google's custom-designed TPUs leave it well positioned to remain a leader in AI research and development.