Maximizing Memory Bandwidth: A Guide to High-Performance Processors and Servers

2023-04-25 05:30:51

If your applications are bottlenecked by memory bandwidth, choosing the right chips can get you a much higher-performance system. But how does expensive, high-bandwidth memory actually affect application performance?

Power10, the next-generation processor for the cloud that IBM announced in 2020, delivers large memory bandwidth. Back in 2019, IBM unveiled OMI (Open Memory Interface), a high-speed interface that supports multiple protocols, and Power10-based servers are built around it. According to IBM, this lets the Power10 processor accommodate a variety of memory technologies.

Power10 supports memory capacities from 256GB to 4TB with 320GB/sec of bandwidth per core. In an optimized configuration that cuts the number of memory modules to a quarter and provides 128GB to 512GB of DDR4 capacity per core, switching to DDR5 memory can push the bandwidth to 800GB/sec. The Power10 processor, code-named Cirrus, has a maximum memory bandwidth of 256GB/sec per core and a sustained memory bandwidth of 120GB/sec per core.

Unlike single-chip sockets, the Power10 used here is a dual-chip module, and its clock speed can be tuned to increase memory streaming performance. IBM's Power E1050, a rack-type server, carries up to four Power10 dual-chip modules and 96 cores, supports up to 64 differential DIMMs of DDR4 memory running at 3.2GHz, and can deliver up to 1.6TB/sec of bandwidth.
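As a rough check on those headline figures, the peak works out to about 400GB/sec per dual-chip module and roughly 17GB/sec per core; a minimal sketch, assuming the bandwidth is split evenly across modules and cores (an assumption for illustration only, not something the article states):

```python
# Rough per-module and per-core breakdown of the Power E1050's stated
# peak memory bandwidth. Figures come from the article; the even split
# across modules and cores is an assumption for illustration only.
PEAK_BW_GB_S = 1600   # up to 1.6 TB/sec per system
MODULES = 4           # up to four Power10 dual-chip modules
CORES = 96            # maximum core count

print(f"Per dual-chip module: {PEAK_BW_GB_S / MODULES:.0f} GB/s")   # 400 GB/s
print(f"Per core:             {PEAK_BW_GB_S / CORES:.1f} GB/s")     # 16.7 GB/s
```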

It is also said that not only can the per-core bandwidth be doubled by reducing the number of cores, but the memory bandwidth can be expanded further by switching to DDR5 memory or CXL (Compute Express Link) memory. The IBM Power E1050 is by no means cheap, but it is argued to be a better choice than waiting for the release of data-center chips that integrate CPU and GPU, such as AMD's Instinct MI300 or Nvidia's Grace Hopper. Those chips offer high memory bandwidth per core, but their memory capacity is limited, so they can only handle smaller workloads than a Power10-based Power E1050 or Intel's Sapphire Rapids. It has also been pointed out that AMD's and Nvidia's high-performance chips run hot and may be forced to lower their DRAM and HBM speeds, so the expected memory bandwidth may not be reached.

Intel's Sapphire Rapids may be among the best CPUs for building out high memory bandwidth. Sapphire Rapids is a processor that can support wide-bandwidth HBM2e memory and DDR5 memory at the same time. Some Sapphire Rapids products carry multiple HBM2e stacks, while others support NUMA configurations of up to eight sockets.


A typical Sapphire Rapids Xeon SP model has eight DDR5 memory channels; with one DIMM per channel running at 4,800MT/s, the maximum capacity is 2TB. With two DIMMs per channel, the maximum capacity expands to 4TB, but the memory speed is said to drop to 4,400MT/s.
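From those figures, the peak DDR5 bandwidth of a single socket works out to about 307GB/sec; a minimal sketch, assuming 64-bit (8-byte) channels and 256GB DIMMs (the DIMM size implied by the 2TB and 4TB maximums, not stated explicitly in the article):

```python
# Peak DDR5 bandwidth and capacity for one Sapphire Rapids socket.
# Assumptions: 8 channels, 64-bit (8-byte) data path, 256 GB per DIMM.
CHANNELS = 8
BYTES_PER_TRANSFER = 8
DIMM_GB = 256

for dpc, mt_s in [(1, 4800), (2, 4400)]:
    bw_gb_s = CHANNELS * mt_s * BYTES_PER_TRANSFER / 1000
    cap_tb = CHANNELS * dpc * DIMM_GB / 1024
    print(f"{dpc} DIMM/channel: {bw_gb_s:.1f} GB/s peak, {cap_tb:.0f} TB max")
# 1 DIMM/channel: 307.2 GB/s peak, 2 TB max
# 2 DIMM/channel: 281.6 GB/s peak, 4 TB max
```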

Because that socket-level bandwidth is shared among all of the cores, the 60-core Sapphire Rapids Xeon SP-8490H, which runs at 1.9GHz, is left with only about 5.1GB/sec of bandwidth per core. The 16-core Xeon SP-8444H, which runs at a higher 2.9GHz, gets 19.2GB/sec per core.

To push the memory bandwidth per core even higher, you can switch to the eight-core Sapphire Rapids Xeon SP-6434, whose operating frequency rises to 3.7GHz and whose per-core bandwidth expands to 38.4GB/sec.
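These per-core figures follow directly from dividing the socket's roughly 307GB/sec of peak DDR5 bandwidth by the core count; a quick sketch using the SKUs cited above:

```python
# Per-core DDR5 bandwidth = peak socket bandwidth / core count.
# 307.2 GB/s assumes 8 channels of DDR5-4800, as in the previous sketch.
PEAK_SOCKET_GB_S = 307.2

SKUS = {"Xeon SP-8490H": 60, "Xeon SP-8444H": 16, "Xeon SP-6434": 8}

for name, cores in SKUS.items():
    print(f"{name}: {PEAK_SOCKET_GB_S / cores:.1f} GB/s per core")
# Xeon SP-8490H: 5.1 GB/s per core
# Xeon SP-8444H: 19.2 GB/s per core
# Xeon SP-6434:  38.4 GB/s per core
```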

The Sapphire Rapids Max series CPU has 56 cores and four HBM2e stacks with 64GB of memory capacity and 1.23TB/sec of bandwidth, which works out to 22GB/sec of memory bandwidth per core. Another model has 32 cores behind the same 1.23TB/sec, giving 38GB/sec of memory bandwidth per core.
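The same divide-by-cores arithmetic applies to the HBM2e numbers; a minimal sketch (the per-stack split assumes the four stacks contribute equally, which the article does not state):

```python
# Per-stack and per-core HBM2e bandwidth on the Sapphire Rapids Max series.
HBM_BW_GB_S = 1230   # 1.23 TB/s total, as cited
STACKS = 4

print(f"Per HBM2e stack: {HBM_BW_GB_S / STACKS:.0f} GB/s")          # ~308 GB/s
for cores in (56, 32):
    print(f"{cores} cores: {HBM_BW_GB_S / cores:.0f} GB/s per core")
# 56 cores: ~22 GB/s per core; 32 cores: ~38 GB/s per core
```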

Furthermore, on the Sapphire Rapids Max series CPU, adding DDR5 memory and CXL memory is said to make a high memory bandwidth of 13.912TB/sec, or 217.4GB/sec per core, achievable. Even higher performance can be realized by interconnecting multiple sockets in a NUMA configuration.

Sapphire Rapids is suitable not only for building servers that require high memory bandwidth, but also for high-performance computing and accelerating machine learning; however, it comes at a high cost. For that reason, the Sapphire Rapids approach is not well suited to AI training.

It is also said that for chips such as the AMD Instinct MI300 and Nvidia Grace Hopper, the balance between GPU cores and HBM memory bandwidth is what matters for using them effectively.

