AMD responds to Nvidia with new benchmarks indicating that its MI300X chip delivers 30% higher performance than the H100, after Nvidia challenged AMD’s initial performance tests

2023-12-18 19:07:05

AMD and Nvidia are competing for the title of the company that produces the best neural network accelerator chips, also known as neural processing units (NPUs). By launching its new MI300X AI accelerator, AMD claimed that it can match or even surpass the performance of Nvidia’s H100 chip by up to 1.6x. Nvidia disputed the comparison and responded with benchmarks indicating that its H100 chip performs significantly better than the MI300X when its optimizations are taken into account. AMD has now released a new response suggesting that its MI300X chip delivers 30% higher performance than the H100.

MI300X vs H100: AMD and Nvidia fight for the top spot in AI performance

A neural processing unit is a microprocessor specialized in accelerating machine learning algorithms, usually by operating on predictive models such as artificial neural networks (ANNs) or random forests. It is also known as a neural processor or AI accelerator. These AI processors have seen rapid growth in recent years due to the ever-increasing computing needs of AI companies and the advent of large language models (LLMs). Until now, Nvidia has largely dominated the market.

However, the Santa Clara firm is increasingly being chased by its rival AMD. To further narrow the gap with Nvidia, AMD recently launched a new AI accelerator called Instinct MI300X. AMD CEO Lisa Su and colleagues showcased the MI300X’s prowess by comparing its inference performance against Nvidia’s H100 using Llama 2. According to the comparison, a single AMD server consisting of eight MI300X accelerators would be 1.6x faster than an H100 server. But Nvidia did not appreciate the comparison and disputed it. In a blog post published in response to AMD’s benchmarks, Nvidia took issue with its rival’s results.

Contrary to AMD’s presentation, Nvidia claims that its H100 chip, when properly evaluated with optimized software, outperforms the MI300X by a substantial margin. Nvidia alleged that AMD failed to incorporate its optimizations when comparing against TensorRT-LLM. Developed by Nvidia, TensorRT-LLM is a toolbox for assembling optimized solutions to perform inference on large language models. In its post, Nvidia’s response was to run the Llama 2 70B chat model on a single server with eight H100 GPUs. The results obtained are surprising.

The results, obtained using software available prior to AMD’s presentation, showed the H100 performing twice as fast at a batch size of 1. Moreover, when applying the 2.5-second latency target used by AMD, Nvidia emerges as the clear leader, outperforming the MI300X by a staggering factor of 14. How is this possible? It’s simple: AMD did not use Nvidia’s software, which is optimized to improve performance on Nvidia hardware. Nvidia indicates that AMD used alternative software that does not support the Transformer Engine of the H100 (Hopper) chip.

Although TensorRT-LLM is available for free on GitHub, AMD’s recent benchmarks used alternative software that does not yet support Hopper’s Transformer Engine and lacks these optimizations, Nvidia says. Additionally, AMD did not take advantage of the TensorRT-LLM software released by Nvidia in September, which doubles inference performance on LLMs, nor of the Triton inference server. Thus, the absence of TensorRT-LLM, Transformer Engine and Triton resulted in sub-optimal performance. According to critics, since AMD does not have equivalent software, it considered this a fairer measure.

AMD Releases New Metrics Showing MI300X Is Superior to H100

Surprisingly, AMD responded to Nvidia’s challenge with new performance measurements of its MI300X chip, demonstrating a 30% performance advantage over the H100 chip even against a finely tuned software stack. Mirroring Nvidia’s testing conditions with TensorRT-LLM, AMD took a proactive approach by accounting for latency, a common factor in server workloads. AMD emphasized the key points of its argument, notably highlighting the advantages of FP16 with vLLM compared to FP8, which is tied to Nvidia’s proprietary TensorRT-LLM.

AMD claimed that Nvidia used a selective set of inference workloads. The company also said that Nvidia used its own TensorRT-LLM on the H100 rather than vLLM, a widely used open-source alternative. Additionally, Nvidia measured AMD’s hardware with vLLM using the FP16 data type while comparing those results against a DGX H100 running TensorRT-LLM with the FP8 data type, producing what AMD calls misleading results. AMD emphasized that in its testing it used vLLM with the FP16 data type due to its widespread use, and that vLLM does not support FP8.
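For readers unfamiliar with vLLM, the short Python sketch below illustrates what such an FP16 inference run typically looks like. It is only a minimal illustration under assumed settings (model identifier, eight-way tensor parallelism, sampling parameters); it is not AMD’s actual benchmark harness.

# Minimal sketch of an FP16 inference run with the open-source vLLM engine.
# Assumed settings, for illustration only; not AMD's actual benchmark setup.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-70b-chat-hf",  # Llama 2 70B chat, as in the benchmarks
    dtype="float16",                         # FP16: the data type AMD says it used
    tensor_parallel_size=8,                  # shard the model across 8 accelerators
)

params = SamplingParams(temperature=0.0, max_tokens=128)
outputs = llm.generate(["Explain what an AI accelerator is."], params)
print(outputs[0].outputs[0].text)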

Another point of contention between the two companies concerns latency in server environments. AMD criticizes Nvidia for focusing solely on throughput performance without addressing real-world latency. To counter Nvidia’s testing method, AMD ran three benchmarks using Nvidia’s TensorRT-LLM toolkit, with the last test specifically measuring latency between the MI300X running vLLM at FP16 and the H100 running TensorRT-LLM. AMD’s new tests showed improved performance and reduced latency.
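To make the throughput-versus-latency distinction concrete, a server-style benchmark records both the end-to-end latency of each request and the aggregate tokens per second. The generic Python sketch below shows one way to capture both numbers; the generate callback is a placeholder for whichever inference stack (vLLM, TensorRT-LLM, etc.) is under test, not a reproduction of either vendor’s methodology.

import time

def benchmark(generate, prompts):
    # `generate` is a placeholder callback for the inference backend under
    # test (vLLM, TensorRT-LLM, ...); it returns the number of tokens produced.
    latencies, total_tokens = [], 0
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        total_tokens += generate(prompt)             # tokens for this request
        latencies.append(time.perf_counter() - t0)   # end-to-end request latency
    elapsed = time.perf_counter() - start
    median = sorted(latencies)[len(latencies) // 2]
    print(f"median latency: {median:.2f} s")
    print(f"throughput: {total_tokens / elapsed:.1f} tokens/s")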

AMD applied additional optimizations, resulting in a 2.1x performance advantage over the H100 when running vLLM on both platforms. It is now up to Nvidia to decide how it wishes to respond. But the company must also recognize that insisting on TensorRT-LLM’s closed ecosystem would force the industry to abandon FP16 in favor of FP8, which would mean abandoning vLLM for good.

The AI hardware market is evolving very quickly and competition is intensifying

The competition between Nvidia and AMD has been going on for a long time. But it is interesting to note that this is the first time Nvidia has decided to directly compare the performance of its products with those of AMD, which clearly shows that competition in this area is intensifying. In addition, the two chip giants are not the only ones trying to carve out a place in the market. Others, such as Cerebras Systems and Intel, are also working on it. Intel CEO Pat Gelsinger announced the Gaudi3 AI chip at the company’s latest AI Everywhere event. However, very little information has been revealed about this processor.

Likewise, the H100 will soon no longer be the reference point. Nvidia will present the GH200 chips at the beginning of next year, which will succeed the H100. AMD did not compare its new chips with the latter, but with the H100, and the performance of the new GH200 chip will almost certainly exceed that of previous chips. Since the competition is so tight, AMD could end up being treated as a backup option by many companies, including Meta, Microsoft and Oracle. In this regard, Microsoft and Meta recently announced that they are considering integrating AMD chips into their data centers.

Gelsinger predicted that the GPU market would be worth around $400 billion by 2027, so there is room for many competitors. For his part, Andrew Feldman, CEO of Cerebras, denounced alleged monopolistic practices by Nvidia at the Global AI Conclave event. “We spend our time looking for ways to be better than Nvidia. By next year, we will build 36 exaflops of computing power for AI,” he said of the company’s plans. Feldman is also reportedly in talks with the Indian government to power AI computing in the country.

The company also signed a $100 million deal for an AI supercomputer with G42, an AI startup in the United Arab Emirates, where Nvidia is not allowed to operate. As for the tug-of-war between Nvidia and AMD, reports point out that the FLOPS specifications of the MI300X are better than those of the Nvidia H100, and the MI300X also has more HBM memory. However, it takes optimized software to run an AI chip and translate that compute and memory into value for the customer. AMD’s ROCm software has made significant progress, but AMD still has a long way to go, notes one reviewer.

Another commenter welcomes the intensification of the rivalry between AMD and Nvidia: it’s great to see AMD competing with Nvidia. Everyone will benefit, probably including Nvidia, which cannot produce enough GPUs to satisfy market demand and is less inclined to rest on its laurels.

Sources: Nvidia, AMD

And you?

What is your opinion on the subject?
What do you think of AMD’s MI300X and Nvidia’s H100 chips?
What comparisons do you make between the two AI accelerators?
In your opinion, will AMD’s MI300X chip succeed in establishing itself on the market?
Do you think that the H200 chip will allow Nvidia to once again distance itself from its rivals?
Will Cerebras and Intel be able to overshadow Nvidia in the GPU market in the near future?
What do you think about accusations that Nvidia uses anticompetitive practices to maintain its monopoly?

See also

AMD Announces Instinct MI300, Generative AI Accelerators and Data Center APUs That Deliver Up to 1.3x Improved Performance in AI Workloads

Meta and Microsoft announce they will purchase AMD’s new AI chip to replace Nvidia’s

AMD acquires Nod.ai, an artificial intelligence software startup in an effort to strengthen its software capabilities and catch up with Nvidia

