CoreWeave used the first day of June to claim a milestone that several hyperscalers have been chasing since CES. The company says it now has the first fully working Nvidia Vera Rubin NVL72 server rack online, delivered by Dell Technologies and built on the new PowerEdge XE9812 platform. Michael Dell echoed the announcement on LinkedIn, calling the system the first liquid-cooled XE9812 ever shipped. DataCenterDynamics reported the announcement.
The hardware itself is the centerpiece of Nvidia's six-chip Vera Rubin platform. A single NVL72 rack combines 36 Vera CPUs, the successor to Grace, with 72 Rubin GPUs, the successor to Blackwell. They are bound together by NVLink 6 switches, ConnectX-9 SuperNICs, BlueField-4 DPUs, and Spectrum-6 Ethernet. Everything is liquid-cooled, and Nvidia says its cable-free tray design cuts physical install time from about two hours per node to five minutes.
On the performance side, Nvidia is pitching 5x inference and 3.5x training improvements over Blackwell at the same rack power envelope. Those are vendor figures, and we should treat them as such until independent benchmarks land. Still, even if the real-world gains arrive at half the headline figures, the implication for total cost of ownership is significant. A model that needs 2,000 Blackwell GPUs for a training run could plausibly use far fewer Rubin GPUs at lower energy draw per token.
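As a rough illustration of that arithmetic, the sketch below estimates how many Rubin GPUs would match a 2,000-GPU Blackwell training run when only part of the claimed 3.5x speedup materializes. The function name and the `realized_fraction` parameter are hypothetical. The model also assumes that the rack-level claim translates directly into a per-GPU speedup and that throughput scales linearly with GPU count, neither of which is guaranteed in practice.

```python
import math


def equivalent_gpus(baseline_gpus: int,
                    claimed_speedup: float,
                    realized_fraction: float = 1.0) -> int:
    """Estimate the GPU count needed to match a baseline cluster.

    claimed_speedup   -- vendor headline figure (e.g. 3.5 for training)
    realized_fraction -- share of the headline figure actually achieved
                         (assumption: 0.5 means "half the headline")
    """
    realized_speedup = claimed_speedup * realized_fraction
    if realized_speedup <= 0:
        raise ValueError("realized speedup must be positive")
    # Round up: a partial GPU still means provisioning a whole one.
    return math.ceil(baseline_gpus / realized_speedup)


if __name__ == "__main__":
    for fraction in (1.0, 0.5):
        gpus = equivalent_gpus(2000, 3.5, fraction)
        print(f"realized fraction {fraction:.0%}: about {gpus} Rubin GPUs")
```

Under these assumptions, the full headline figure implies roughly 572 GPUs and the halved figure roughly 1,143, which is still a large reduction from 2,000.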
There is a side note worth flagging. Microsoft told reporters in March that it was the first cloud to bring up a Vera Rubin NVL72 system for validation. CoreWeave's distinction is that its rack is described as a working production unit, not a lab validation rig. The line between those two states is thin, and we should expect both AWS and Google Cloud to push their own version of the milestone in the next several weeks.
For our planning purposes, the immediate question is supply allocation. Nvidia put Vera Rubin into full production in January, and CoreWeave's reputation rests on being first to install the newest silicon at scale. That positions the company well against the larger hyperscalers for any workload where availability matters more than ecosystem breadth. We have already seen Meta, OpenAI, and Snowflake commit to multibillion-dollar pre-orders for next-generation chips. Now the question is who gets racks installed first.
Power and cooling will decide the pace of the rollout. NVL72 is a 100 percent liquid-cooled design, which means colocation footprints built for air-cooled GPU pods need retrofits before they can host Rubin at scale. Siemens, Nvidia, and Fluence yesterday published a reference electrical and power architecture aimed at exactly this problem, signalling that the industry expects a wave of brownfield conversions through 2026 and 2027.
For CTOs and VPs of Engineering, three actions stand out. First, ask your cloud account teams when Vera Rubin instances will be generally available in your home region, and what the pricing premium will look like compared with H200 and Blackwell capacity. Second, audit any committed-use discounts you signed in 2025 to see whether they lock you into older silicon for longer than the workloads can justify. Third, if you run inference-heavy products, model what a 3x to 5x efficiency improvement does to your unit economics so finance can plan around the transition.
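One way to start on the third action is a small cost-per-token model. The sketch below is a minimal example: every price, throughput, and premium value in it is a placeholder to be replaced with your own figures, not a quoted rate, and the function names are hypothetical. It treats an "efficiency improvement" as a multiplier on tokens per second per instance.

```python
def cost_per_million_tokens(hourly_price_usd: float,
                            tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


def rubin_scenarios(base_price_usd: float,
                    base_tokens_per_second: float,
                    price_premium: float,
                    gains=(3.0, 5.0)) -> dict:
    """Cost per million tokens for each assumed efficiency gain.

    price_premium -- fractional uplift on the hourly price
                     (assumption: 0.5 means 50 percent more per hour)
    gains         -- throughput multipliers to test (3x and 5x here)
    """
    new_price = base_price_usd * (1 + price_premium)
    return {
        gain: cost_per_million_tokens(new_price, base_tokens_per_second * gain)
        for gain in gains
    }


if __name__ == "__main__":
    # Placeholder inputs: replace with your own contract and benchmark data.
    baseline = cost_per_million_tokens(36.0, 10_000)
    print(f"baseline: ${baseline:.2f} per million tokens")
    for gain, cost in rubin_scenarios(36.0, 10_000, price_premium=0.5).items():
        print(f"{gain:.0f}x gain at +50% price: ${cost:.2f} per million tokens")
```

The useful output is the break-even point: in this model, the new silicon lowers unit cost whenever the efficiency gain exceeds one plus the price premium.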
There is also a competitive angle. Specialist GPU clouds like CoreWeave, Lambda, and Crusoe are racing to install Rubin first because that is their primary moat against AWS, Azure, and Google Cloud. If they win the early-access game, customers that need the freshest silicon for frontier model work will keep paying premium rates outside the big three. If they lose it, the hyperscalers absorb the workloads at the next contract renewal. The next ninety days will tell us which way that goes.
We will keep watching for benchmark results, regional availability announcements, and the first signs of how AWS and Google respond. For now, CoreWeave gets the bragging rights, and Dell gets a marketing line that will appear in every enterprise pitch deck for the rest of the year.
One more thread to follow is the software stack. Rubin-generation chips ship with updated CUDA, NCCL, and Triton libraries that promise better support for mixed-precision inference and dynamic batching. Teams that have spent the last twelve months tuning Blackwell workloads will need to validate that their pipelines still produce the same outputs on Rubin, which is rarely a trivial exercise. Plan a parallel evaluation window with both generations available before committing production traffic. The teams that move first will also be the ones that catch driver issues early, which is worth more than the marginal performance gain at launch. Finally, watch for partner ecosystem announcements from Databricks, Snowflake, and the major MLOps vendors confirming Rubin support, because that timing usually tells us how stable the platform really is.
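For the parallel evaluation window, a simple starting point is a tolerance-based comparison of outputs captured from the same inputs on both generations. The sketch below is a minimal example using only the Python standard library. The function name and the tolerance values are assumptions, and it presumes you have already exported numeric outputs (for example logits or scores) from each environment as flat lists. Floating-point results often differ slightly across hardware and library versions, so it checks closeness instead of exact equality.

```python
import math


def compare_outputs(baseline, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Return (index, baseline_value, candidate_value) for each mismatch.

    baseline  -- values captured on the current generation
    candidate -- values captured on the new generation, same inputs
    rel_tol, abs_tol -- assumed tolerances; tune them per workload
    """
    if len(baseline) != len(candidate):
        raise ValueError("output lengths differ; inputs may not match")
    mismatches = []
    for i, (b, c) in enumerate(zip(baseline, candidate)):
        if not math.isclose(b, c, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, b, c))
    return mismatches


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    old_run = [1.0, 2.0, 3.0]
    new_run = [1.0, 2.0005, 3.5]
    for index, old, new in compare_outputs(old_run, new_run):
        print(f"mismatch at {index}: {old} vs {new}")
```

A check like this fits naturally into a regression suite run before any production traffic moves, with tolerances set separately for each precision mode in use.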



