By Techtonic @ https://technokrax.com
In the high-stakes race to dominate AI infrastructure, Nvidia isn't just maintaining its lead—it's actively extending it. Just one year after introducing the groundbreaking Blackwell architecture at GTC 2024, Jensen Huang has unveiled the next evolution: Blackwell Ultra, a platform specifically designed to usher in what Nvidia calls "the age of AI reasoning."
The initial Blackwell architecture was already a technical marvel. Designed as the successor to Nvidia's venerable Hopper generation, it promised roughly 5x the raw performance of its predecessor, with top-end configurations reaching 20 petaFLOPS of FP4 compute. But this blistering speed came with equally imposing power requirements.
Each Blackwell GPU actually comprises two reticle-limited compute dies connected via a 10TB/sec NV-HBI (NVLink High-Bandwidth Interface) fabric, allowing them to function as a single accelerator. Flanked by eight HBM3e memory stacks providing up to 192GB of capacity and 8TB/sec of bandwidth, these chips set a new standard for performance while also pushing the boundaries of data centre power and cooling capabilities.
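Those aggregate memory numbers imply neat per-stack figures; the short calculation below simply divides the quoted totals across the eight stacks.

```python
# Per-stack HBM3e figures implied by the totals quoted above:
# eight stacks together supply 192GB of capacity and 8TB/sec of bandwidth.
stacks = 8
total_gb = 192
total_tb_per_sec = 8

print(f"{total_gb // stacks}GB and {total_tb_per_sec / stacks:.0f}TB/sec per stack")
# -> 24GB and 1TB/sec per stack
```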
The original Blackwell family included three main components: the B100, B200, and Grace-Blackwell Superchip (GB200). The B100 maintained compatibility with existing data centre configurations by operating at 700W, the same limit as the previous-generation H100. But the more powerful B200 pushed to 1,000W in air-cooled setups while delivering 18 petaFLOPS, and liquid-cooled variants could reach a staggering 1,200W while pumping out the full 20 petaFLOPS.
The most extreme configuration, the GB200 Superchip, combined a 72-core Grace CPU with two Blackwell GPUs, potentially consuming around 2,700W under peak load—a configuration that Nvidia wisely opted to offer only with liquid cooling.
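For a rough sense of what that means at rack scale, here is a back-of-the-envelope estimate in Python. The GPU and superchip wattages are the figures quoted above; the Grace CPU draw and the overhead allowance are illustrative assumptions, not Nvidia specifications.

```python
# Back-of-the-envelope power estimate for a rack of GB200 Superchips.
# GPU wattage is the liquid-cooled figure quoted above; the Grace CPU
# draw and the overhead factor are rough assumptions for illustration.

GPU_WATTS = 1_200                # liquid-cooled Blackwell GPU at peak
GRACE_CPU_WATTS = 300            # assumed draw for the 72-core Grace CPU
SUPERCHIP_WATTS = 2 * GPU_WATTS + GRACE_CPU_WATTS  # ~2,700W, as quoted above

superchips_per_rack = 36         # an NVL72-style rack: 72 GPUs = 36 superchips
overhead = 1.15                  # assumed allowance for NICs, switches, fans

rack_watts = superchips_per_rack * SUPERCHIP_WATTS * overhead
print(f"Estimated rack draw: {rack_watts / 1000:.0f} kW")  # ~112 kW
```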
Now, with Blackwell Ultra, Nvidia is taking things even further. As Jensen Huang puts it: "AI has made a giant leap—reasoning and agentic AI demand orders of magnitude more computing performance. We designed Blackwell Ultra for this moment—it's a single versatile platform that can easily and efficiently do pretraining, post-training and reasoning AI inference."
The flagship GB300 NVL72 rack-scale solution delivers 1.5x more AI performance than its predecessor, the GB200 NVL72, while increasing Blackwell's revenue opportunity by 50x for AI factories compared to those built with Hopper. This massive configuration connects 72 Blackwell Ultra GPUs and 36 Arm Neoverse-based Grace CPUs in a single rack, essentially functioning as one enormous GPU built specifically for test-time scaling.
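Test-time scaling means spending more compute per query at inference time, for example by exploring many candidate reasoning paths and keeping the best one. Below is a minimal best-of-N sketch of the idea; generate and score are hypothetical stand-ins for a model's sampler and a verifier or reward model, not part of any Nvidia API.

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for sampling one candidate answer from a model."""
    return f"candidate-{random.randint(0, 9)} for {prompt!r}"

def score(answer: str) -> float:
    """Hypothetical stand-in for a verifier or reward model."""
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    # More compute (a larger n) means more candidate reasoning paths
    # explored per query -- the axis that test-time scaling grows along.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("plan a three-step data migration", n=8))
```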
For organizations not ready to commit to a full rack solution, Nvidia is also offering the HGX B300 NVL16. This system boasts 11x faster inference on large language models, 7x more compute, and 4x the memory of the Hopper generation, making it ideal for complex workloads like AI reasoning.
What makes Blackwell Ultra particularly significant isn't just its raw computational muscle, but how it enables emerging AI paradigms that simply weren't feasible at scale before:
AI Reasoning: Blackwell Ultra allows AI models to access increased computing capacity to explore different solutions to problems and break down complex requests into multiple steps, resulting in higher-quality responses.
Agentic AI: These systems go beyond simply following instructions; they can reason, plan, and take autonomous actions to solve complex, multi-step problems (a schematic loop is sketched after this list).
Physical AI: The platform lets companies generate synthetic, photorealistic video in real time for training robots and autonomous vehicles at scale.
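The agentic pattern reduces to a simple loop: reason about the goal, pick an action, observe the result, repeat until done. The sketch below is purely schematic; plan_next_action and execute are hypothetical stand-ins for an LLM call and a tool invocation, not any real API.

```python
# A schematic agent loop: reason -> act -> observe, until done.
# plan_next_action and execute are hypothetical stand-ins, not a real API.

def plan_next_action(goal: str, history: list[str]) -> str:
    """Stand-in for an LLM call that reasons over the goal and past
    observations, then names the next action (or 'finish')."""
    return "finish" if len(history) >= 3 else f"step-{len(history) + 1}"

def execute(action: str) -> str:
    """Stand-in for invoking a tool and returning its observation."""
    return f"result of {action}"

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    while True:
        action = plan_next_action(goal, history)
        if action == "finish":
            return history
        history.append(execute(action))  # autonomous multi-step execution

print(run_agent("migrate the database"))
```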
To ensure these massive computational resources don't get bottlenecked, Blackwell Ultra systems integrate with Nvidia's Spectrum-X Ethernet and Quantum-X800 InfiniBand platforms, delivering 800Gb/s of data throughput for each GPU through ConnectX-8 SuperNICs. BlueField-3 DPUs are also featured, enabling multi-tenant networking, GPU compute elasticity, accelerated data access, and real-time cybersecurity threat detection.
This advanced scale-out networking is critical for handling AI reasoning models without performance degradation, particularly when deploying across thousands of GPUs.
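To put the per-GPU figure in perspective, here is the aggregate scale-out bandwidth for a 72-GPU rack, computed directly from the numbers quoted above.

```python
# Aggregate scale-out bandwidth for a 72-GPU rack, using the per-GPU
# figure quoted above (800Gb/s via ConnectX-8 SuperNICs).
gbps_per_gpu = 800
gpus_per_rack = 72

total_gbps = gbps_per_gpu * gpus_per_rack
print(f"Per rack: {total_gbps / 1000:.1f} Tb/s "
      f"(~{total_gbps / 8 / 1000:.1f} TB/s of scale-out throughput)")
# -> Per rack: 57.6 Tb/s (~7.2 TB/s of scale-out throughput)
```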
Recognizing that hardware alone isn't enough, Nvidia has developed new software specifically for Blackwell Ultra. The open-source Dynamo inference framework scales up reasoning AI services, reducing response times and model-serving costs by making test-time compute more efficient to serve at scale.
This new inference-serving software is designed to maximize token revenue generation for AI factories deploying reasoning models. It orchestrates inference communication across thousands of GPUs and uses disaggregated serving to run the prompt-processing (prefill) and token-generation (decode) phases on different GPUs, allowing each phase to be optimized independently.
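The disaggregation idea itself fits in a few lines: prefill is compute-bound and processes the whole prompt once, while decode is memory-bandwidth-bound and emits tokens one at a time, so routing them to separately sized GPU pools lets each be tuned on its own. The sketch below is illustrative only; the pool names and functions are assumptions, not Dynamo's actual API.

```python
# Schematic of disaggregated serving (illustrative; not Dynamo's real API):
# prefill and decode run on separate GPU pools so each phase can be
# scaled and optimized independently.

from dataclasses import dataclass

@dataclass
class KVCache:
    """Stands in for the key/value cache handed from prefill to decode."""
    prompt: str

def prefill(prompt: str, pool: str) -> KVCache:
    # Compute-bound phase: process the entire prompt once.
    print(f"[{pool}] prefill of {prompt!r}")
    return KVCache(prompt)

def decode(cache: KVCache, pool: str, max_tokens: int) -> str:
    # Bandwidth-bound phase: generate output tokens one by one.
    print(f"[{pool}] decoding up to {max_tokens} tokens")
    return f"answer to {cache.prompt!r}"

def serve(prompt: str) -> str:
    cache = prefill(prompt, pool="prefill-gpus")              # sized for compute
    return decode(cache, pool="decode-gpus", max_tokens=256)  # sized for bandwidth

print(serve("Why separate prefill from decode?"))
```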
Just as with the original Blackwell, industry adoption for Blackwell Ultra appears robust. Cloud service providers including AWS, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, along with GPU cloud providers CoreWeave, Crusoe, Lambda, and others, will be among the first to offer Blackwell Ultra-powered instances.
Server manufacturers including Cisco, Dell, HPE, Lenovo, and Supermicro are preparing to deliver systems based on Blackwell Ultra products. However, potential customers will need to be patient—Blackwell Ultra-based products aren't expected to be available until the second half of 2025.
As AI continues its rapid evolution from pattern recognition to reasoning and agentic capabilities, the computational demands are growing exponentially. Blackwell Ultra represents not just an incremental improvement to existing architectures, but a platform specifically designed for this new paradigm.
The sheer scale of these systems—with their massive power requirements and advanced cooling needs—underscores how fundamentally different AI infrastructure is becoming from traditional data centre designs. It's no longer just about who can build the fastest chips, but who can deliver the comprehensive infrastructure to support them.
For organizations looking to leverage the next generation of AI capabilities, the message is clear: prepare your data centres now, because the computational demands are only going to increase. The age of AI reasoning is upon us, and it's going to require unprecedented computational resources to unlock its full potential.