Google I/O 2026: TPU 8i and 8t Signal a New Era for AI Infrastructure

May 22, 2026
Akash Sharma

Google confirmed at I/O 2026 on May 19 that it expects capital expenditure of approximately $180 to $190 billion this year, roughly six times the $31 billion it spent in 2022. The centrepiece of that investment is an eighth-generation TPU architecture that Google described as its most significant infrastructure milestone since its first commercial TPU in 2016. For the first time, Google has split the TPU line into two purpose-built chips: TPU 8t for large-scale model training and TPU 8i for inference workloads. The training chip scales across more than one million TPUs globally through JAX and Pathways, forming what Google calls the largest training cluster in the world.

The inference chip triples on-chip SRAM to 384 MB, increases high-bandwidth memory to 288 GB, doubles inter-chip interconnect (ICI) bandwidth to 19.2 terabits per second, and delivers 1,500 tokens per second on production inference workloads. Google puts that at 80% better performance per dollar than the prior generation.
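
The multipliers quoted so far can be sanity-checked with a few lines of arithmetic. The sketch below uses only figures stated in this article. The implied prior-generation values are back-calculated from the words "triples" and "doubles" and are not confirmed specifications, and all names in the code are illustrative.

```python
# Back-of-envelope check of the figures quoted in the article.
# Inputs come from the article; "implied" values are derived, not confirmed specs.

CAPEX_2026_LOW_BN = 180.0   # USD billions, low end of the stated 2026 range
CAPEX_2026_HIGH_BN = 190.0  # USD billions, high end of the stated 2026 range
CAPEX_2022_BN = 31.0        # USD billions, stated 2022 spend

SRAM_8I_MB = 384.0          # stated on-chip SRAM, described as tripled
ICI_8I_TBPS = 19.2          # stated ICI bandwidth, described as doubled


def multiple(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old


def implied_baseline(new: float, factor: float) -> float:
    """Return the earlier value implied by `new` being `factor` times larger."""
    return new / factor


capex_multiple_low = multiple(CAPEX_2026_LOW_BN, CAPEX_2022_BN)
capex_multiple_high = multiple(CAPEX_2026_HIGH_BN, CAPEX_2022_BN)
implied_prior_sram_mb = implied_baseline(SRAM_8I_MB, 3.0)
implied_prior_ici_tbps = implied_baseline(ICI_8I_TBPS, 2.0)

if __name__ == "__main__":
    print(f"Capex multiple vs 2022: {capex_multiple_low:.1f}x to {capex_multiple_high:.1f}x")
    print(f"Implied prior-generation SRAM: {implied_prior_sram_mb:.0f} MB")
    print(f"Implied prior-generation ICI: {implied_prior_ici_tbps:.1f} Tb/s")
```

Run as a script, this prints a capex multiple of about 5.8x to 6.1x, which is consistent with "roughly six times", along with implied prior-generation figures of 128 MB of SRAM and 9.6 Tb/s of ICI bandwidth.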

What the Dual-Chip Architecture Signals

The decision to split training and inference into separate silicon is the most consequential architectural decision Google has made since it first built TPUs to replace GPUs for internal workloads. Training and inference have fundamentally different requirements. Training requires raw compute throughput and high inter-chip bandwidth across massive synchronised clusters. Inference requires low latency, high memory bandwidth for serving the KV cache, and cost-efficient token throughput at scale. A single chip designed for both compromises on each. Two purpose-built chips can each be tuned for their own workload, at the cost of managing two hardware ecosystems. Google’s willingness to absorb that operational complexity signals its confidence that the inference workload is large enough, permanent enough, and economically distinct enough from training to justify dedicated silicon.
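
To make the memory pressure on inference concrete, here is a rough sizing sketch for a KV cache. It uses the common per-token estimate for a transformer with grouped-query attention: keys plus values, for every layer and every KV head. The model shape and weight footprint are hypothetical placeholders, not the specifications of any Google model, and the 288 GB figure is the HBM capacity quoted earlier, treated here as decimal gigabytes.

```python
# Rough KV cache sizing for an inference accelerator.
# The model shape below is hypothetical; only HBM_BYTES comes from the article.

from dataclasses import dataclass

HBM_BYTES = 288 * 10**9  # 288 GB of HBM as stated for TPU 8i (decimal GB assumed)


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical transformer dimensions used only for illustration."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_value: int  # 2 for bf16/fp16
    weight_bytes: int     # memory occupied by the model weights


def kv_bytes_per_token(shape: ModelShape) -> int:
    """Bytes of KV cache one token occupies: keys and values for every layer."""
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * shape.bytes_per_value


def max_cached_tokens(shape: ModelShape, hbm_bytes: int = HBM_BYTES) -> int:
    """Tokens of KV cache that fit in HBM after the weights are loaded."""
    free_bytes = hbm_bytes - shape.weight_bytes
    if free_bytes <= 0:
        return 0
    return free_bytes // kv_bytes_per_token(shape)


# Hypothetical 70B-parameter model stored in bf16 (70e9 params * 2 bytes).
EXAMPLE = ModelShape(
    num_layers=80,
    num_kv_heads=8,
    head_dim=128,
    bytes_per_value=2,
    weight_bytes=140 * 10**9,
)

if __name__ == "__main__":
    print(kv_bytes_per_token(EXAMPLE))  # 327680 bytes, i.e. 320 KiB per token
    print(max_cached_tokens(EXAMPLE))   # 451660 tokens shared by all requests
```

Under these assumptions the cache budget is shared by every concurrent request, so longer contexts directly reduce how many requests one chip can serve at once. The estimate ignores activations and runtime overhead, so real capacity would be somewhat lower.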

The $100 billion Anthropic-AWS compute deal showed that the frontier AI lab compute market is consolidating around decade-long infrastructure commitments. The TPU 8i is Google’s answer to the question of what hardware those commitments will run on.

What This Means for Nvidia and the Infrastructure Market

The TPU 8i directly targets the inference market, where Google-Blackstone’s new TPU cloud venture competes with Nvidia-backed neoclouds. A claim of 80% better performance per dollar for inference is commercially significant at a moment when inference accounts for 85% of enterprise AI budgets. If the TPU 8i delivers on that claim at production scale, it validates the commercial case for the Google-Blackstone joint venture and gives it the infrastructure cost advantage it needs to be viable against Nvidia’s GPU ecosystem. The infrastructure market may now have a second credible roadmap to evaluate alongside Nvidia’s, with $180 to $190 billion in annual Google capex ensuring that roadmap will be executed regardless of supply chain constraints or competitive pressure.
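
A quick worked example shows why the performance-per-dollar claim matters at the budget level. The sketch assumes a fixed workload, that "80% better performance per dollar" means 1.8 times the work per dollar, and that all inference spend moves to the new hardware. These are simplifying assumptions for illustration, not figures from Google, and the function name is hypothetical.

```python
# Effect of a performance-per-dollar gain on a budget that is mostly inference.
# Inputs are the article's percentages; the model itself is a simplification.


def relative_budget(inference_share: float, perf_per_dollar_gain: float) -> float:
    """Return the new total budget as a fraction of the old one.

    inference_share: fraction of the old budget spent on inference (0 to 1).
    perf_per_dollar_gain: fractional improvement, e.g. 0.80 for 80% better.
    Assumes the workload is unchanged and non-inference spend stays constant.
    """
    if not 0.0 <= inference_share <= 1.0:
        raise ValueError("inference_share must be between 0 and 1")
    if perf_per_dollar_gain <= -1.0:
        raise ValueError("perf_per_dollar_gain must be greater than -1")
    cost_factor = 1.0 / (1.0 + perf_per_dollar_gain)  # cost per unit of work
    other_share = 1.0 - inference_share
    return other_share + inference_share * cost_factor


cost_per_token_factor = 1.0 / 1.8
new_budget = relative_budget(inference_share=0.85, perf_per_dollar_gain=0.80)

if __name__ == "__main__":
    print(f"Cost per token vs prior generation: {cost_per_token_factor:.3f}")
    print(f"Total budget vs before: {new_budget:.3f}")
```

Under these assumptions, cost per token falls to about 56% of its previous level and the total budget for the same workload falls to about 62%, a saving of roughly 38%. Real deployments would rarely move every workload at once, so this is best read as an upper bound on the saving under the stated assumptions.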