Ethernet Holds Its Own in Demanding AI Compute Environments
While InfiniBand is making gains, many enterprises are sticking with Ethernet for their data center switching and interconnectivity needs to run their AI workloads.
January 6, 2024
The widespread adoption of artificial intelligence (AI) by businesses of all types and sizes requires high-performance compute and infrastructure not normally found in enterprise data centers. On the compute side, many are turning to GPUs and other workload accelerators such as Infrastructure Processing Units (IPUs) and Data Processing Units (DPUs), along with Compute Express Link (CXL) technology. On the infrastructure side, some are considering or already using InfiniBand, but many are sticking with Ethernet.
That point was borne out in the most recent IDC Worldwide Quarterly Ethernet Switch Tracker, which found that sales of data center Ethernet switches grew 7.2 percent year over year in Q3 2023, the latest quarter reported to date.
Where do enterprises stand: Ethernet or InfiniBand?
InfiniBand also grew substantially over the past year, yet many enterprises remain hesitant to adopt it. The reason lies in the clear trade-offs between InfiniBand and Ethernet.
InfiniBand has a long track record of meeting the needs of demanding workloads in national labs and academic supercomputing centers. On the plus side, it delivers higher throughput, lower latency, better scalability, and stronger quality-of-service features than traditional Ethernet.
Perhaps the biggest change in the market that is making InfiniBand a more viable option for the enterprise has been NVIDIA’s efforts in the area. In 2019, the company announced it was going to acquire Mellanox, a pioneer in developing InfiniBand products. NVIDIA now offers InfiniBand solutions that include switches, DPUs, network interface cards, and more.
Even with such a major vendor offering InfiniBand solutions for AI, many enterprises still prefer Ethernet. They find that InfiniBand costs more, adds complexity not found with Ethernet, often has interoperability problems with existing infrastructure, and is available from only a limited number of vendors. In many cases, enterprises also lack the staff and expertise to deploy and manage the technology.
Ethernet is evolving to meet AI’s demands
Training and running AI models is driving up data center internetworking requirements. Knowing that many enterprise users prefer to work with Ethernet, the vendor community is exploring ways to improve or optimize Ethernet for such environments and the workloads running in them.
One example is the recent formation of the Ultra Ethernet Consortium. The effort is a Joint Development Foundation project hosted by The Linux Foundation. Founding members of the group include AMD, Arista, Broadcom, Cisco, Eviden (an Atos Business), HPE, Intel, Meta, and Microsoft.
As we reported last August, when the consortium was announced, it seeks to build a complete Ethernet-based communication stack architecture for AI and high-performance computing (HPC) workloads.
Another effort is Cisco's work on its Silicon One G200 and G202 ASICs, new networking chips designed to support AI/ML workloads. Broadcom and NVIDIA are also developing higher-performance Ethernet solutions for AI workloads.