A Neocloud Bets on Inference
At HPE Discover 2026 on June 17, cloud infrastructure provider Vultr said it had selected Hewlett Packard Enterprise and NVIDIA technology for a new wave of AI deployments, anchored on NVIDIA GB300 NVL72 systems supplied through the NVIDIA AI Computing by HPE portfolio. The deployments incorporate NVIDIA Spectrum-X Ethernet networking and HPE liquid cooling, the integrated package that has become table stakes for operators running dense, high-power AI racks. Financial terms, timelines, and facility locations were not disclosed, but the strategic direction was unmistakable: Vultr is building for inference, not just training.
That distinction defines the moment. The first phase of the AI buildout was dominated by training, where hyperscalers and labs raced to assemble the largest possible clusters to push frontier models. The economics of training are brutal and bursty, demanding enormous capital for workloads that come and go. Inference, by contrast, is the steady state of AI in production: models answering queries, generating tokens, and running agents around the clock. For a neocloud like Vultr, inference promises more predictable utilization and, crucially, a clearer path to profitability.
The Inflection Point in AI Cloud Economics
Vultr chief executive J.J. Kardwell put the market shift at the center of his remarks. "When you reach a point in the market where you're seeing operating margins for public companies be favorably impacted, we're now seeing changes," he said, pointing to the maturing economics that are starting to reward AI cloud operators rather than just burning their capital. The comment matters because the neocloud sector has been dogged by skepticism about whether renting GPUs at scale can ever be a durable, profitable business, or whether it is a commodity race to the bottom underwritten by debt.
Ron Westfall, vice president at HyperFrame Research, framed the structural change. "We have reached the inflection point where inference is transforming from a secondary operational phase into a primary, long-term driver," he said. That reframing has consequences for infrastructure choices. Inference workloads reward low latency, efficient networking, and dense, well-cooled hardware that can run continuously at high utilization. The combination Vultr chose (GB300 NVL72, Spectrum-X, and liquid cooling) is purpose-built for exactly that profile, which is why it is becoming the reference stack for serious inference clouds.
HPE's Pitch to the Neoclouds
For HPE, the Vultr win is validation of a strategy that positions the company as the systems integrator of choice for AI-native cloud providers. "Vultr represents a new generation of AI cloud providers, and the company's selection of HPE validates the importance of AI data center architectures," said HPE president and chief executive Antonio Neri. The framing is telling. HPE is not trying to compete with NVIDIA on silicon; it is selling the engineering around the silicon: the cooling, the integration, and the support that turn raw GPUs into a running cloud.
This is a deliberate channel play. The neoclouds, companies like Vultr that lack the decades of data center engineering muscle of an AWS or a Microsoft, need partners who can deliver liquid-cooled, NVIDIA-optimized systems at scale and stand behind them operationally. By packaging NVIDIA's hardware with its own cooling and services under the NVIDIA AI Computing by HPE banner, HPE is making itself the default on-ramp for operators who want to deploy fast without building deep infrastructure expertise in-house. The Vultr deal is a marquee reference for that motion.
What Enterprise Buyers Should Take Away
Enterprises rarely buy directly from a Vultr or an HPE for their entire AI stack, but the deal still signals something useful about the market they are buying into. Standardization around integrated stacks (NVIDIA compute, NVIDIA networking, and a vendor's cooling and services) means the AI cloud layer is consolidating around a recognizable blueprint. That reduces the risk that workloads built on one neocloud cannot be moved or replicated on another, and it makes pricing and performance easier to compare as the inference market commoditizes.
It also reinforces a planning assumption that CIOs should internalize: inference, not training, will dominate AI infrastructure spend over the coming years, and the providers best positioned are those optimizing for continuous, efficient, low-latency serving. As more neoclouds adopt the same reference architectures, enterprises gain leverage and optionality. The Vultr announcement, undisclosed in dollar terms and light on specifics, is a useful data point in a larger story: the AI cloud business is growing up, and the operators that survive will be the ones that made inference economics work.
The Road Ahead
The undisclosed details leave questions. Without timelines or locations, it is hard to gauge how quickly Vultr can bring the new capacity online or where it will sit relative to the demand centers it serves. Power and cooling remain the binding constraints across the industry, and even a well-chosen stack cannot conjure megawatts or interconnection out of thin air. Execution, as always in this sector, will be the test.
Still, the direction is sound. By aligning with HPE and NVIDIA on a proven inference-oriented architecture, Vultr is positioning itself for the phase of the AI market that looks most like a durable business. If Kardwell is right that operating margins are turning favorable, the operators who built for inference early will be the ones who capture that turn. For a sector that has spent two years chasing training capacity at any cost, a clear-eyed bet on profitable inference is a notable, and welcome, sign of maturity.


