Google's new research paper on TPU v4 and its performance w.r.t. Nvidia's H100 & A100
Thursday, 6th April - 4:00 AM (Indian Standard Time)
I have written a lot in previous posts about the book Chip War and how it relates to the present competition between ChatGPT, Bard, and Baidu's Ernie, and about how Nvidia is winning the game by providing the crucial hardware to several AI players. According to a paper published by Google yesterday (see https://arxiv.org/abs/2304.01433), their TPU v4 is superior to Nvidia's A100 for LLMs and DNNs in general. In particular -
TPU v4 outperforms TPU v3 by 2.1x and improves performance/Watt by 2.7x. The TPU v4 supercomputer is 4x larger at 4096 chips and thus ~10x faster overall, which, along with OCS flexibility, helps large language models. For similar-sized systems, it is ~4.3x–4.5x faster than the Graphcore IPU Bow, and is 1.2x–1.7x faster and uses 1.3x–1.9x less power than the Nvidia A100.
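As a back-of-the-envelope check (my own arithmetic, not from the paper), the headline ~10x system speedup roughly follows from the per-chip and pod-size numbers; the remainder is credited to interconnect and scaling improvements:

```python
# Rough arithmetic behind the headline claims; all inputs are figures
# quoted in the paper, the combination is my own naive estimate.
per_chip_speedup = 2.1       # TPU v4 vs TPU v3, per chip
scale_factor = 4096 / 1024   # v4 pod is 4x larger than the v3 pod

# Naively multiplying per-chip speedup by pod size gives a lower bound;
# the paper's ~10x also reflects interconnect and scaling gains.
naive_system_speedup = per_chip_speedup * scale_factor
print(naive_system_speedup)  # prints 8.4
```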
Optical circuit switching (OCS), implemented in TPU v4 by Google's Palomar OCS, is based on 3D Micro-Electro-Mechanical Systems (MEMS) mirrors that switch in milliseconds.
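To make the topology concrete: the 4096-chip pod can be viewed as a 16x16x16 3D torus, and the wraparound links are what the OCS can reconfigure. The sketch below is my own illustration of torus neighbor addressing, not Google's code:

```python
# Hedged illustration: each chip in a 3D torus has six neighbours,
# with coordinates wrapping around at the edges (torus links).
def torus_neighbors(x, y, z, dim=16):
    """Return the six neighbour coordinates of chip (x, y, z), with wraparound.

    dim=16 is an assumption matching 16**3 = 4096 chips per pod.
    """
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [((x + dx) % dim, (y + dy) % dim, (z + dz) % dim)
            for dx, dy, dz in offsets]

# Chip (0, 0, 0) reaches (15, 0, 0) etc. via wraparound links,
# which an optical circuit switch can rewire to change the topology.
print(torus_neighbors(0, 0, 0))
```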
The paper mainly answers the following questions -
Do peak FLOPS/second predict real performance?
How does OCS differ from NVLink and NVSwitch?
What if TPU v4 used IB versus OCS?
Nvidia announced the H100, the successor to A100, in 2022. Why not compare TPU v4 to it?
Why 30%–90% more power for Nvidia's A100?
What is the CO2e from TPU v4 vs other DSAs?
How fast do ML workloads change?
Do individual DNN models also change?
Is MLPerf’s DLRM benchmark realistic?
How can DSAs avoid overspecialization?
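On the first question above (whether peak FLOPS/second predicts real performance), the standard way to see why it does not is a roofline-style bound: a memory-bound workload is limited by bandwidth, not by peak compute. The numbers below are hypothetical, purely for illustration:

```python
# Minimal roofline sketch (my illustration, not from the paper):
# attainable FLOP/s = min(compute peak, memory bandwidth * FLOPs-per-byte).
def roofline(peak_flops, mem_bw, arithmetic_intensity):
    """Attainable FLOP/s for a kernel with the given arithmetic intensity."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Hypothetical chips: A has a higher peak but lower memory bandwidth.
# On a workload with 50 FLOPs/byte, B attains more despite the lower peak.
chip_a = roofline(peak_flops=300e12, mem_bw=1.2e12, arithmetic_intensity=50)
chip_b = roofline(peak_flops=275e12, mem_bw=2.0e12, arithmetic_intensity=50)
print(chip_a, chip_b)  # prints 60000000000000.0 100000000000000.0
```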
For a summarized version, you can check out CNBC's coverage of it by Deirdre Bosa -