The ~$700 Billion AI Infrastructure Bet Has a 5% Utilisation Problem Nobody Wants to Talk About

May 21, 2026
AI & Machine Learning
World
Akash Sharma

The numbers that dominate the AI infrastructure conversation are the big ones. Amazon’s $200 billion annual capex plan. Microsoft’s $190 billion. Alphabet’s $175 to $190 billion. Meta’s $115 to $145 billion. The combined ~$700 billion that four hyperscalers have committed to spending on AI infrastructure in 2026 — nearly double what they spent in 2025. Those numbers are real and extraordinary and have generated a secondary industry of analysis about what they mean for the AI buildout, the GPU supply chain, and the long-term trajectory of the AI economy. There is a different number that gets far less coverage, partly because it comes from the other end of the demand signal and partly because it makes an uncomfortable argument about the gap between AI infrastructure investment and AI infrastructure utilisation. That number is 5%.

Cast AI’s 2026 State of Kubernetes Optimization Report, which analysed approximately 23,000 production clusters running on AWS, Azure, and Google Cloud, found average GPU utilisation of just 5% across enterprise Kubernetes environments. CPU utilisation averaged 8%. Memory utilisation averaged 20%. In any other capital-intensive industry, a 5% asset utilisation rate would trigger an immediate executive crisis. In enterprise AI infrastructure in 2026, it is the normal operational state that the industry has learned to tolerate, because the narrative of GPU scarcity made low utilisation seem like a reasonable precaution rather than a management failure. VentureBeat’s Q1 2026 AI Infrastructure and Compute Market Tracker confirmed that the panic phase has officially broken, and described what it left behind: enterprises had secured capacity reservations that then sat idle while internal teams struggled with data readiness, governance, and architectural immaturity.

The FOMO Loop That Created the Problem

The mechanism that produced 5% GPU utilisation across enterprise environments is not mysterious. It is the product of a specific procurement dynamic that VentureBeat researchers have called the FOMO loop. During the 2023-2024 GPU shortage, enterprises that wanted GPU capacity faced genuine scarcity. Reserved capacity was difficult to secure, and analysts warned that companies unable to access compute would fall behind their competitors. Enterprises responded rationally by securing every GPU reservation they could access, on the assumption that scarcity would persist and that having more capacity than needed was safer than having less.

The problem is that the workloads never caught up with the reservations. Organisations were activity-rich in buying chips but output-poor in generating useful tokens. At 5% utilisation, 95 cents of every dollar spent on silicon is effectively a donation to a cloud provider’s bottom line. In any other department, a 95% waste metric would be a firing offense. The enterprises still running 2024 GPU reservations at 5% utilisation in 2026 are not getting value proportional to what those investments cost. They are paying reservation fees against AI programmes still in pilot phase, carrying the cost of idle capacity that has never seen a production workload.
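The 95-cents-per-dollar figure follows directly from the utilisation rate, and the same arithmetic gives the effective price of each hour a GPU actually spends working. The sketch below is illustrative only: it assumes cost accrues at a flat reserved rate whether or not the GPU is busy, and the hourly rate is a hypothetical placeholder, not a figure from the reports cited above.

```python
# Hypothetical reserved price per GPU-hour, for illustration only.
# It is not taken from the article or from any provider's price list.
HYPOTHETICAL_RATE_USD_PER_GPU_HOUR = 4.00


def _check_utilisation(utilisation: float) -> None:
    """Utilisation is a fraction in (0, 1]; zero would make the cost unbounded."""
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in the range (0, 1]")


def wasted_share(utilisation: float) -> float:
    """Fraction of spend that pays for idle time, assuming a flat reserved rate."""
    _check_utilisation(utilisation)
    return 1.0 - utilisation


def effective_cost_per_busy_hour(reserved_rate: float, utilisation: float) -> float:
    """Reserved rate divided by utilisation: what each busy GPU-hour really costs."""
    _check_utilisation(utilisation)
    return reserved_rate / utilisation


if __name__ == "__main__":
    utilisation = 0.05  # the 5% average reported by Cast AI
    waste = wasted_share(utilisation)
    cost = effective_cost_per_busy_hour(HYPOTHETICAL_RATE_USD_PER_GPU_HOUR, utilisation)
    print(f"Share of spend on idle time: {waste:.0%}")
    print(f"Effective cost per busy GPU-hour: ${cost:.2f}")
```

Under these assumptions, a GPU reserved at the placeholder rate costs twenty times that rate for every hour of useful work, which is the same 95% waste expressed as a unit price.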

The Orchestration Maturity Gap That Keeps Utilisation Low

The root cause of 5% GPU utilisation is not insufficient AI use cases or weak model capability. IDC’s Dave McCarthy described it as a systems problem, not a hardware problem — driven by overprovisioning, weak scheduling, and data pipeline bottlenecks that keep GPUs idle even when workload demand exists. The GPU must be allocated for the entire lifecycle of a job even if compute is only needed for a fraction of it. A training pipeline where CPU-heavy preprocessing runs on GPU-allocated instances leaves the accelerator idle during that preprocessing phase, which can represent 40 to 60% of total pipeline runtime. An inference deployment serving intermittent queries on a dedicated GPU instance leaves hardware idle between requests, without the batching and scheduling infrastructure needed to consolidate traffic and saturate the chip.
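The whole-job allocation effect can be made concrete with a toy model. The sketch below is a simplification under stated assumptions: the stage names and durations are hypothetical, and GPU stages are treated as fully saturating the accelerator, so the result is an upper bound on utilisation rather than a measurement.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stage:
    """One step of a pipeline. Names and durations here are hypothetical."""
    name: str
    minutes: float
    uses_gpu: bool


def gpu_minutes_held(stages: Sequence[Stage], disaggregated: bool) -> float:
    """GPU-minutes reserved for the job.

    Monolithic: the GPU is held for every stage, including CPU-only ones.
    Disaggregated: the GPU is held only while a GPU stage is running.
    """
    if disaggregated:
        return sum(s.minutes for s in stages if s.uses_gpu)
    return sum(s.minutes for s in stages)


def gpu_utilisation_ceiling(stages: Sequence[Stage], disaggregated: bool) -> float:
    """Best-case utilisation: busy GPU-minutes divided by GPU-minutes held."""
    busy = sum(s.minutes for s in stages if s.uses_gpu)
    held = gpu_minutes_held(stages, disaggregated)
    if held == 0:
        raise ValueError("pipeline holds no GPU time")
    return busy / held


if __name__ == "__main__":
    # Preprocessing at 50% of runtime, the midpoint of the 40 to 60% range above.
    pipeline = [
        Stage("preprocess", minutes=50.0, uses_gpu=False),
        Stage("train", minutes=50.0, uses_gpu=True),
    ]
    for mode in (False, True):
        label = "disaggregated" if mode else "monolithic"
        held = gpu_minutes_held(pipeline, mode)
        ceiling = gpu_utilisation_ceiling(pipeline, mode)
        print(f"{label}: {held:.0f} GPU-minutes held, ceiling {ceiling:.0%}")
```

In this model, a pipeline that spends half its runtime on CPU preprocessing can never exceed 50% GPU utilisation while the accelerator is held for the whole job, no matter how efficient the training step is. Releasing the GPU during CPU stages halves the GPU-minutes held for the same work.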

Organisations that have implemented properly disaggregated runtime architectures, separating CPU and GPU stages with precise resource allocation, have reported 50 to 70% improvements in GPU utilisation on existing hardware. That is not a marginal gain. Moving from 5% to 30% utilisation on the same fleet, for example, delivers six times the AI workload capacity per dollar spent. No hardware purchase achieves a six-times capacity multiplier. For most enterprises right now, the infrastructure investment with the highest near-term return is not more GPUs. It is the orchestration, scheduling, and pipeline architecture work that saturates the GPUs they already have.
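The six-times figure is simply the ratio of the two utilisation rates. A minimal sketch of that arithmetic follows, with a hypothetical fleet size used only to express the result in fully busy GPU equivalents.

```python
# Hypothetical fleet size, for illustration only.
HYPOTHETICAL_FLEET_SIZE = 1000


def capacity_multiplier(before_pct: float, after_pct: float) -> float:
    """Ratio of useful work delivered by the same fleet at two utilisation rates."""
    if before_pct <= 0 or after_pct <= 0:
        raise ValueError("utilisation percentages must be positive")
    return after_pct / before_pct


def busy_gpu_equivalents(fleet_size: int, utilisation_pct: float) -> float:
    """Number of fully busy GPUs that would do the same work as the fleet."""
    if not 0 <= utilisation_pct <= 100:
        raise ValueError("utilisation percentage must be between 0 and 100")
    return fleet_size * utilisation_pct / 100.0


if __name__ == "__main__":
    before, after = 5.0, 30.0  # the utilisation figures used in the text
    print(f"Capacity multiplier: {capacity_multiplier(before, after):.1f}x")
    for pct in (before, after):
        equivalents = busy_gpu_equivalents(HYPOTHETICAL_FLEET_SIZE, pct)
        print(f"{HYPOTHETICAL_FLEET_SIZE} GPUs at {pct:.0f}% = {equivalents:.0f} fully busy GPUs")
```

On these numbers, a 1,000-GPU fleet at 5% does the work of 50 saturated GPUs, and the same fleet at 30% does the work of 300, with no change in what the hardware costs.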

The Infrastructure Implications for the Hyperscaler Buildout

The 5% figure creates a specific and underappreciated tension in the AI infrastructure investment thesis. The hyperscalers are spending ~$700 billion on AI infrastructure in 2026 on the premise that enterprise AI demand will grow to absorb that capacity. The capacity is being built for a demand signal that the utilisation data suggests is not yet converting efficiently into actual workload deployment. McKinsey reports that about 62% of organisations are experimenting with or piloting AI agents, while fewer than 10% have scaled enterprise-wide. Enterprises still in pilot phase are not saturating infrastructure. And the 5% average suggests that many of those that have scaled are still running far below capacity, because their orchestration architecture was not designed for production volume.

The hyperscaler buildout is not wrong to anticipate the demand that will eventually materialise as enterprise AI moves from pilot to production. But the utilisation data is a caution signal about the timeline. If enterprise orchestration maturity is the binding constraint on utilisation, and that maturity takes two to three years to develop at scale across a large organisation’s AI programmes, the demand the 2026 infrastructure investment is sized to serve may not arrive until 2028 or 2029. That timeline could be longer than the financing structures and payback expectations embedded in current infrastructure commitments assume.

Whether AI infrastructure has a revenue problem is a question that has been raised before, and the utilisation gap is the most direct operational evidence that it deserves more attention than the market is currently giving it. The ~$700 billion bet is real. The 5% problem is equally real. And until the two are discussed in the same conversation, the infrastructure market is only seeing half the picture.

The infrastructure market has spent three years counting GPUs. It now needs to start counting what those GPUs are actually doing.