VMware vSphere Upgrade Supports AI Workloads On-Prem with Nvidia GPUs

The new release paves the way for virtualizing HPC servers running AI workloads.

Jeffrey Schwartz

March 12, 2021


A new version of the VMware vSphere server virtualization platform can now run on-premises artificial intelligence (AI) workloads. VMware’s vSphere 7 Update 2 supports servers with Nvidia’s A100 Tensor Core GPUs and its AI-Ready Enterprise platform.

VMware partnered with Nvidia to enable vSphere environments to support AI workloads cost-effectively and at scale. The companies announced the partnership last fall at VMware’s VMworld conference, held as a virtual event because of the pandemic. At VMworld, the CEOs of VMware and Nvidia described enabling vSphere to virtualize AI workloads as an opportunity to “democratize” AI.

AI workloads such as natural language processing, image recognition, predictive analytics and cognitive computing require high-performance compute infrastructure. That means those workloads must run either in public clouds or on-premises on bare-metal servers with GPUs.

Depending on the use case, running AI workloads in the cloud can be costly, especially if it requires moving data. Also, many organizations do not want to, or cannot, host certain data in public clouds, particularly if it is sensitive.

Most organizations that do run AI workloads on-premises deploy them onto high-performance, bare-metal servers, which is also costly. Until now, the many organizations running VMware vSphere were unable to use their virtualized infrastructure for those workloads.

Impact on Enterprise AI

Lee Caswell, VP of marketing for VMware’s cloud business, said the new vSphere release will let more organizations use their virtual infrastructures to run AI workloads. Caswell predicts it will have the same effect on expanding AI as the original vSphere once had on server consolidation.



“We’re bringing all of the value that VMware is known for to the AI user,” Caswell said in a recent briefing with analysts and media. “Increasingly, customers want to spend less time putting the pieces together and more time actually getting to it.”

Analyst Ashish Nadkarni, IDC’s group VP for infrastructure systems, platforms and technology, said the new vSphere release has that potential. VMware created vSphere because of the underutilization of server infrastructure, resulting in excessive compute costs to organizations.

“That problem is now recreated with AI infrastructure,” Nadkarni said. “Because when you have a server with six GPUs, if each GPU costs $10,000 a pop, you’re looking at a $60,000 investment for one server just to add those GPUs. And when you have AI workloads that cannot make use of virtualization, or AI workloads that cannot be virtualized because of the lack of support for the GPU, you’re getting the same underutilization and runaway costs for your AI infrastructure or accelerated compute infrastructure.”
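Nadkarni’s arithmetic can be sketched in a few lines of Python. The GPU count and per-GPU price come from his example; the utilization levels and the function names are hypothetical, chosen only to illustrate how idle capacity translates into stranded spending.

```python
def gpu_investment(gpus_per_server: int, cost_per_gpu: float) -> float:
    """Up-front GPU spend for one server."""
    return gpus_per_server * cost_per_gpu


def idle_spend(investment: float, utilization: float) -> float:
    """Portion of the investment that sits unused at a given utilization (0.0 to 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return investment * (1.0 - utilization)


# Figures from Nadkarni's example: six GPUs at $10,000 each.
investment = gpu_investment(6, 10_000)

# Hypothetical utilization levels, chosen only for illustration.
for utilization in (0.25, 0.50, 0.75):
    idle = idle_spend(investment, utilization)
    print(f"{utilization:.0%} utilized: ${idle:,.0f} idle of ${investment:,.0f}")
```

Under these assumed numbers, a server whose GPUs are busy only a quarter of the time leaves three-quarters of the $60,000 outlay idle, which is the underutilization that sharing GPUs across virtual machines is meant to reduce.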

Tested and Certified

To minimize cost and complexity, VMware and Nvidia have performed validation and integration testing, Caswell said. Likewise, hardware from Dell, HPE, Lenovo and Hitachi is certified.

“We think it’s going to be incredibly powerful in reducing both real and perceived risk,” he said. “That’s really important when you go and drive new technologies in the mainstream.”

The new VMware vSphere release provides a direct connection to Nvidia’s A100 Tensor Core GPUs on servers certified by VMware.

“This is important for how we scale out, and that’s one of the things that you can get from an enterprise environment and a virtualized environment that goes past bare-metal servers,” Caswell said.

In addition to enabling the virtualization of servers with the GPUs, the new release lets Nvidia AI Enterprise run on vSphere. Nvidia AI Enterprise is a framework and platform that allows AI solutions to run on VMware virtualized infrastructure.



The new GPUs also support multi-instance GPU (MIG), which provides multiuser support. They also work with vSphere vMotion for live migrations and with the vSphere Distributed Resource Scheduler (DRS) for automated workload management.

“For channel partners, the great thing about AI is it is strategic to every C-level executive at every enterprise in the world,” said Justin Boitano, general manager of Nvidia’s edge and enterprise computing group. “They aren’t asking why to do it anymore; they’re asking how to do it. I think this gives a very clear and easy blueprint for how to get started.”

Last month, VMware released vSAN 7 Update 2. With the vSphere update, the two have some shared capabilities. For example, vSphere Lifecycle Manager now supports management of vSphere with Tanzu clusters. Tanzu is VMware’s Kubernetes cluster management platform. VMware said the vSphere Lifecycle Manager also adds support for some Hitachi Vantara UCP servers.



About the Author(s)

Jeffrey Schwartz

Jeffrey Schwartz has covered the IT industry for nearly three decades, most recently as editor-in-chief of Redmond magazine and executive editor of Redmond Channel Partner. Prior to that, he held various editing and writing roles at CommunicationsWeek, InternetWeek and VARBusiness (now CRN) magazines, among other publications.
