The Next Data Center You Build Has to Serve Two Fundamentally Different Workloads
The data center industry spent the past three years designing for one thing: AI training. The race to deploy GPU clusters for large model training defined the architecture, the power density, the network topology, and the cooling systems of the AI data centers that came online between 2023 and 2025. Training workloads connect tens of thousands of GPUs in tightly coupled clusters where every node must communicate with every other node at microsecond latency. The network is optimised for collective communication patterns. The rack density pushes toward the maximum that cooling systems can support. The facility is built around synchronisation as its primary design principle.
The inference era has arrived alongside the training era rather than after it, and that simultaneity is creating a design challenge the industry has not yet fully worked through. Ram Nagappan, vice president of AI infrastructure at Oracle Cloud Infrastructure, told Data Center World 2026 that operators must now design for two fundamentally different AI patterns: large-scale training and distributed inference. Training workloads connect tens of thousands of GPUs in tightly coupled clusters where latency and proximity matter. Inference workloads prioritise availability and responsiveness at a broader scale. Those differences cascade through the facility, affecting layout, resilience, and network design. The result is a more complex baseline: a single facility must support both tightly synchronised systems and distributed, user-facing workloads, and the two differ in performance requirements, failure tolerance, and network architecture.
The Infrastructure Requirements That Pull in Opposite Directions
The design tension between training and inference is most visible at the network layer. Training clusters require the highest-bandwidth, lowest-latency interconnects available, because the all-to-all collective communication of large model training is bound by network throughput and latency. Any shortfall shows up directly in training speed, and therefore in model development cost. Varun Sakalkar, distinguished engineer in Google’s datacenter technology and systems group, noted at Data Center World 2026 that racks which once pushed 30 to 40 kilowatts are now measured in hundreds of kilowatts, with designs approaching the megawatt range, driven by the synchronisation requirements of tightly coupled training clusters.
Inference workloads generate a different network traffic pattern. Individual inference requests are short, concurrent, and latency-sensitive, but the latency that matters is the response time each user sees, not the synchronisation latency between GPUs. An inference endpoint that must respond to thousands of simultaneous user requests needs high throughput for individual connections and low tail latency for user experience, not the tightly synchronised all-to-all bandwidth that training clusters require. The fabric that is optimal for training maximises aggregate bandwidth between tightly coupled nodes. Inference calls for different routing, different load balancing, and different redundancy patterns. A facility designed solely for training will serve inference workloads with a network architecture that is unnecessarily expensive and operationally complex for the workload it is running.
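To make "tail latency" concrete, the following is a minimal sketch of how an average can hide the slow requests that a high percentile exposes. The latency samples are invented for illustration, and the nearest-rank method is just one common percentile definition, not something the speakers quoted here prescribed.

```python
import math
import statistics


def percentile_nearest_rank(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: 98 fast requests, 2 slow ones.
latencies_ms = [20] * 98 + [400] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile_nearest_rank(latencies_ms, 50)
p99_ms = percentile_nearest_rank(latencies_ms, 99)

print(f"mean={mean_ms} ms, p50={p50_ms} ms, p99={p99_ms} ms")
```

In this made-up sample the mean and median both look healthy while the 99th percentile is twenty times the median, which is why inference fabrics tend to be judged on their tail rather than their average.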
The Power and Cooling Implications of Dual-Workload Design
The power delivery and cooling implications of serving training and inference within a single facility compound the network design challenge. Training clusters generate sharp, dynamic load patterns as they cycle between compute-intensive training phases and checkpointing or evaluation phases. Sean James, Nvidia’s distinguished engineer for energy systems, described at Data Center World 2026 how training cluster load variations can be seen all the way back at the power plant, requiring energy storage to smooth those fluctuations, maintain power quality, and meet grid requirements such as ride-through during voltage events. Inference workloads generate more predictable, sustained loads that do not create the same grid-level volatility. Because those loads are continuous, however, cooling systems must be designed for sustained thermal load rather than the cyclical patterns that training clusters produce.
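The smoothing role of energy storage can be sketched with a simple ramp-rate limiter. This is an illustrative toy model, not a description of any vendor's system: the load profile, the 20 MW-per-step ramp limit, and the function name `ramp_limited_grid_draw` are all assumptions, and storage capacity and losses are ignored.

```python
def ramp_limited_grid_draw(load_mw, max_ramp_mw):
    """Split a facility load profile into grid draw and storage power.

    The grid draw follows the load but may change by at most max_ramp_mw
    per time step. Storage makes up the difference: a positive value means
    storage is discharging into the facility, a negative value means it is
    charging from surplus grid power.
    """
    grid_mw, storage_mw = [], []
    grid = load_mw[0]  # assume the grid starts matched to the load
    for load in load_mw:
        step = max(-max_ramp_mw, min(max_ramp_mw, load - grid))
        grid += step
        grid_mw.append(grid)
        storage_mw.append(load - grid)
    return grid_mw, storage_mw


# Hypothetical training cluster: 100 MW while computing, 40 MW while checkpointing.
load_mw = [100, 100, 40, 40, 100, 100]
grid_mw, storage_mw = ramp_limited_grid_draw(load_mw, max_ramp_mw=20)

print("grid   :", grid_mw)
print("storage:", storage_mw)
```

With these invented numbers the raw load swings by 60 MW in a single step, while the grid-facing draw never moves by more than 20 MW. The storage system absorbs the surplus during the drop and supplies the shortfall during the ramp back up.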
A facility that houses both training and inference workloads must design its power delivery, energy storage, and cooling systems for the envelope that covers both load profiles rather than optimising for either. That design envelope is wider and more expensive than either workload would require in isolation. The operators managing that cost most effectively have been deliberate about physically separating training and inference infrastructure within their facilities, using different cooling architectures, power distribution designs, and network topologies for each workload type rather than trying to serve both from a single unified design. Earlier coverage of the rack density threshold forcing a rethink of every data center standard documented how rapidly the density requirements of training-oriented GPU clusters are changing the fundamental engineering assumptions of data center design. The inference dimension adds a second set of requirements that do not move in the same direction as training requirements, creating the dual-optimisation problem that the next generation of data center design must solve.
What This Means for Site Selection and Development Economics
The dual-workload design requirement has direct implications for how data center sites are evaluated, how facilities are planned, and how the economics of AI data center development are modelled. A site that is adequate for inference deployment may not be adequate for training deployment because training clusters require power densities, cooling infrastructure, and network fabric specifications that exceed what inference workloads require. A site that is optimal for training may be over-engineered and therefore unnecessarily expensive for the inference workloads that will run alongside training within the same campus.
The operators who are navigating this challenge most effectively are those who design their campuses with explicit zones for training and inference from the earliest stages of site planning, allocating power capacity, cooling design, and network architecture to each zone based on the specific requirements of the workload type rather than applying a single design specification across the entire facility. That zoned design approach is more complex to plan and build than a uniform facility, but it produces better economics across the full workload lifecycle because each zone is optimised for its specific requirements rather than being over-built for the most demanding workload across the board. The data center design challenge of 2026 is not how to build a training facility or how to build an inference facility. It is how to build a campus that serves both simultaneously, at the density and performance levels that frontier AI requires, without the economic penalty of designing the entire campus to the most demanding specification of either workload type.
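The economic argument for zoning can be illustrated with back-of-the-envelope arithmetic. The sketch below compares the power capacity a campus must provision when each zone is built to its own density against a uniform design built to the most demanding density everywhere. The rack counts and densities are hypothetical placeholders, not figures from any operator, and real planning would also account for cooling, redundancy, and network cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    name: str
    racks: int
    kw_per_rack: int  # design density for this zone (hypothetical)


def zoned_capacity_kw(zones):
    """Capacity when each zone is provisioned to its own rack density."""
    return sum(z.racks * z.kw_per_rack for z in zones)


def uniform_capacity_kw(zones):
    """Capacity when every rack is provisioned to the highest density on campus."""
    peak_density = max(z.kw_per_rack for z in zones)
    return sum(z.racks for z in zones) * peak_density


# Hypothetical campus: a dense training zone and a larger, lighter inference zone.
campus = [
    Zone("training", racks=200, kw_per_rack=120),
    Zone("inference", racks=600, kw_per_rack=30),
]

zoned_kw = zoned_capacity_kw(campus)
uniform_kw = uniform_capacity_kw(campus)
overbuild_kw = uniform_kw - zoned_kw

print(f"zoned={zoned_kw} kW, uniform={uniform_kw} kW, over-build={overbuild_kw} kW")
```

Under these assumed numbers the uniform design provisions more than twice the capacity the zoned campus needs. The gap is the over-build penalty of applying the training specification to inference racks.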
