This is the rumor of the day about the next Nvidia cards. The new leaks come from kopite7kimi and concern the block diagram of the green team's next-generation architecture. An image of the block diagram of the AD102 ‘Ada Lovelace’ GPU gives us a basis for estimating the performance of the upcoming RTX 40 series.
RTX 40: an impressive spec sheet (if true)
For starters, the Ada Lovelace AD102 GPU will feature up to 12 GPCs (Graphics Processing Clusters). This is an increase of roughly 70% over GA102 (the largest chip of the current range), which has only 7 GPCs. Each GPC will consist of 6 TPCs, each with 2 SMs, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) will house four sub-cores, which is also the same as the GA102 GPU. The real change is the FP32 and INT32 core configuration. Each SM will consist of 128 FP32 units, but the combined FP32 + INT32 units will go up to 192. This is because the FP32 units do not share the same sub-core as the INT32 units: the 128 FP32 cores are separate from the 64 INT32 cores.
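The unit counts implied by this hierarchy can be multiplied out in a few lines. This is only a sketch of the arithmetic behind the rumored figures: the function and constant names are made up for illustration, and reading the 128 FP32 and 64 INT32 figures as per-SM counts is an assumption.

```python
# Unit counts implied by the leaked block diagram (rumored, not confirmed).
# All names here are illustrative, not taken from any Nvidia tool or API.

def sm_count(gpcs: int, tpcs_per_gpc: int = 6, sms_per_tpc: int = 2) -> int:
    """Total SMs on a full die: GPCs x TPCs per GPC x SMs per TPC."""
    return gpcs * tpcs_per_gpc * sms_per_tpc

FP32_PER_SM = 128   # assumption: the leaked figure is per SM
INT32_PER_SM = 64   # assumption: the leaked figure is per SM

ad102_sms = sm_count(12)   # 12 GPCs -> 144 SMs
ga102_sms = sm_count(7)    # 7 GPCs  -> 84 SMs

ad102_fp32 = ad102_sms * FP32_PER_SM                      # 18432 FP32 units
ad102_combined = ad102_sms * (FP32_PER_SM + INT32_PER_SM) # 27648 FP32 + INT32 units

gpc_increase = 12 / 7 - 1   # about 0.71, i.e. roughly 70% more GPCs

print(ad102_sms, ga102_sms, ad102_fp32, ad102_combined, round(gpc_increase, 2))
```

Because the GPC layout (6 TPCs, 2 SMs each) is unchanged, the SM count scales by the same factor of roughly 1.7 as the GPC count.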
Cache should be another area where NVIDIA has gone all out compared with existing Ampere GPUs. Ada Lovelace GPUs will contain 192 KB of L1 cache per SM, a 50% increase over Ampere. This amounts to a total of 4.5 MB of L1 cache on the top AD102 GPU. The L2 cache will be increased to 96 MB, a figure regularly mentioned in several leaks. That is 16 times the 6 MB of L2 cache on the Ampere GPU. The cache will be shared across the GPU.
If the leaks are true, the L2 cache grows dramatically, to a total of 96 MB for AD102. As for the ROPs, this architecture would have twice as many units per GPC, 32 to be precise, which would give a total of 384 ROPs for a possible RTX 4090 against 112 for the RTX 3090… On paper it is monstrous.
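The cache and ROP totals can be checked the same way. The sketch below only multiplies out the rumored figures; the variable names are illustrative, and the count of 144 SMs assumes the 12 GPC x 6 TPC x 2 SM layout described earlier. Note that 192 KB across 144 SMs comes to 27 MB, which does not match the 4.5 MB total quoted above; the article does not say how that total was derived.

```python
# Cache and ROP totals implied by the rumored AD102 figures (unconfirmed).
GPCS = 12
SMS = GPCS * 6 * 2                         # 6 TPCs per GPC, 2 SMs per TPC -> 144

L1_KB_PER_SM = 192
l1_total_mb = SMS * L1_KB_PER_SM / 1024    # 27.0 MB if every SM carries 192 KB

L2_AD102_MB = 96
L2_GA102_MB = 6
l2_ratio = L2_AD102_MB / L2_GA102_MB       # 16.0

ROPS_PER_GPC = 32
rops_ad102 = GPCS * ROPS_PER_GPC           # 384
ROPS_RTX_3090 = 112
rop_ratio = rops_ad102 / ROPS_RTX_3090     # about 3.43

print(l1_total_mb, l2_ratio, rops_ad102, round(rop_ratio, 2))
```

The doubling of ROPs applies per GPC (32 against 16); since the GPC count also rises, the total goes from 112 to 384, a factor of about 3.4.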
But after this flood of technical data, what gains can we really expect?
It is obviously still too early to have a precise idea, but if these elements are confirmed, the spec sheet shows a huge difference compared to Ampere. To summarize:
- About 1.7x the GPCs, 12 vs. 7 (compared to Ampere)
- 50% more cores (compared to Ampere)
- 50% more L1 cache (compared to Ampere)
- 16x more L2 cache (compared to Ampere)
- 2x ROPs per GPC (compared to Ampere)
- 4th Gen Tensor and 3rd Gen RT Cores
But what can we expect in terms of actual performance?
That is very hard to say because we are missing a key piece of data: the operating frequency.
If we speculate a little on this subject, we arrive at a projected FP32 throughput of about 90 TFLOPS, more than double that of the current GA102. However, TFLOPS figures can also hold surprises. While they give an idea of raw performance, they never let us predict the results in “everyday” use. The leaks mention gains of 2x to 2.2x compared to the RTX 30 series… There will obviously be a gain, and it looks like a big one. But to say more than that, we will have to wait a little longer.
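To see what the 90 TFLOPS projection implies for the missing clock speed, here is a rough sketch using the usual convention for peak FP32 throughput: 2 operations per core per clock (one fused multiply-add) times the number of FP32 cores times the frequency. The core count of 18,432 (144 SMs x 128 FP32 units) is derived from the rumored figures and is an assumption, as are the function names.

```python
# Peak FP32 throughput under the common "2 FLOPs per core per clock" convention.
# Core count and clocks are rumored or hypothetical values, not confirmed specs.

def fp32_tflops(fp32_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    return 2 * fp32_cores * clock_ghz / 1000

def implied_clock_ghz(tflops: float, fp32_cores: int) -> float:
    """Clock (GHz) needed to reach a given peak FP32 throughput."""
    return tflops * 1000 / (2 * fp32_cores)

AD102_FP32_CORES = 12 * 6 * 2 * 128   # 12 GPCs x 6 TPCs x 2 SMs x 128 FP32 = 18432

clock = implied_clock_ghz(90, AD102_FP32_CORES)
print(f"Clock implied by 90 TFLOPS: {clock:.2f} GHz")   # about 2.44 GHz

# Hypothetical clock of 2.5 GHz, purely for illustration.
print(f"Peak at 2.5 GHz: {fp32_tflops(AD102_FP32_CORES, 2.5):.2f} TFLOPS")
```

Under these assumptions, 90 TFLOPS corresponds to a clock of roughly 2.4 GHz. This is a theoretical peak only, which is exactly why it says little about results in everyday use.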