RDNA 2 vs. RDNA vs. GCN: IPC and CU scaling in the test
How much do the RDNA 2 graphics cards RX 6900 XT and 6800 (XT) benefit from their additional shader clusters compared to the Radeon RX 6700 XT with 40 CUs? And how much faster are 40 RDNA 2 CUs than 40 RDNA or GCN CUs? Extensive scaling and IPC comparisons provide interesting and sometimes even surprising answers.
With the Navi 22 GPU of the Radeon RX 6700 XT, AMD’s RDNA 2 architecture arrived in the performance segment this week. The smaller GPU with 40 CUs allows for some interesting tests and comparisons that have been difficult to do until now.
For example, the 40-CU configuration of Navi 22 makes it possible to measure the performance difference between RDNA on Navi 10 (Radeon RX 5700 XT) and RDNA 2 (Radeon RX 6700 XT). In addition, it is now easier to examine how well the new GPU architecture scales with the many compute units of the Radeon RX 6800 XT (72 CUs) and the RX 6900 XT (80 CUs). And because the GCN-based Radeon R9 390 with the Hawaii GPU already offered 40 CUs, even a three-generation comparison is possible.
So this article goes into detail. On the other hand, if you are more interested in the graphics cards themselves, you should pay a visit to the graphics card tests on ComputerBase, which deal extensively with all current and older products.
AMD RDNA 2 vs. RDNA: Have the CUs gotten faster?
RDNA 2 contains many improvements over RDNA; without them, the performance could not have been doubled. However, this is only partly due to the compute units (“shaders”) themselves. According to AMD, they have only become faster indirectly because, for example, they are supplied with data from the “Infinity Cache” with lower latency.
Until now, this could not be checked directly with the RDNA 2 graphics cards based on the Navi 21 GPU, because the new generation offered at least 60 CUs and the old one only 40 CUs. The Radeon RX 6700 XT changes that: Navi 22 (RDNA 2) has 40 compute units with a total of 2,560 shader units, just like Navi 10 (RDNA). At the same clock, both GPUs have the same theoretical computing power, so RDNA 2 can be compared very precisely with RDNA.
Of course, there are a few other differences between the architectures, and a clock that has been heavily adjusted for the direct comparison may also mean that the internal tuning of the GPU components no longer works perfectly. Still, the comparison has never been as precise as in this article, and it is unlikely to get any more precise outside of AMD’s laboratories.
One thing that does not quite match apart from the GPU itself: Navi 10 has a 256-bit memory interface, whereas Navi 22 has a 192-bit interface plus “Infinity Cache”.
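The "same theoretical computing power" claim can be checked with a short calculation. The following is only a minimal sketch: it assumes the usual convention of counting one fused multiply-add as 2 FLOPs per shader per clock, and the function name `fp32_tflops` is made up for illustration.

```python
# Sketch: theoretical FP32 throughput of a GPU from CU count and clock.
# 64 shaders per CU follows from the article's figures (2,560 shaders / 40 CUs).
SHADERS_PER_CU = 64


def fp32_tflops(compute_units: int, clock_mhz: float) -> float:
    """Return theoretical FP32 TFLOPS (hypothetical helper, not an AMD API)."""
    shaders = compute_units * SHADERS_PER_CU
    # Assumption: one FMA per shader per clock, counted as 2 FLOPs.
    flops_per_clock = shaders * 2
    return flops_per_clock * clock_mhz * 1e6 / 1e12


for name, cus in [("Navi 10 (RDNA)", 40), ("Navi 22 (RDNA 2)", 40)]:
    print(f"{name}: {fp32_tflops(cus, 1000):.2f} TFLOPS at 1,000 MHz")
```

Because the result depends only on the number of CUs and the clock, both GPUs come out identical on paper; any measured difference at equal clocks must therefore come from the architecture or the memory subsystem.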
GCN is also involved
The series of tests can be made a little more interesting still, since the old Hawaii GPU in the form of the Radeon R9 390 also has 40 compute units. That makes it possible to include the second GCN iteration in the comparison. It also has the same memory bandwidth as the Radeon RX 6700 XT, although of course no “Infinity Cache”. So in terms of effective memory bandwidth, Graphics Core Next is at a slight disadvantage in this comparison, but ultimately the cache of the RDNA 2 GPUs is also an advantage of the architecture.
Because Hawaii does not allow high clock rates, the RDNA graphics cards had to be clocked down massively for the comparison: 1,000 MHz is the common denominator.
The good news is that the telemetry of the RDNA GPUs is not thrown off by this (Navi 22 with RDNA 2 normally runs at 2.5 GHz and above), so the frame times are still reported correctly. In addition, the reduced GPU clock effectively rules out that the higher bandwidth of Navi 10 still plays a role: because the GPU clock is almost halved, less bandwidth is required to fully utilize the units. The benchmarks are run at 1,920 × 1,080 with maximum graphics details.
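The bandwidth relationship between the three cards can be sketched as bus width times per-pin data rate. Note the assumptions: only the 256-bit and 192-bit bus widths come from this article; the 512-bit bus of the R9 390 and all data rates (6, 14 and 16 Gbps) are taken from the cards' publicly listed specifications, and `bandwidth_gb_s` is a hypothetical helper name.

```python
# Sketch: raw memory bandwidth = bus width (bits) * data rate (Gbps per pin) / 8.
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Return raw VRAM bandwidth in GB/s, ignoring any on-die cache."""
    return bus_width_bits * data_rate_gbps / 8


# (bus width in bits, data rate in Gbps); data rates are assumptions from
# public spec sheets, not figures stated in the article.
cards = {
    "Radeon R9 390 (Hawaii, GCN)": (512, 6.0),
    "Radeon RX 5700 XT (Navi 10, RDNA)": (256, 14.0),
    "Radeon RX 6700 XT (Navi 22, RDNA 2)": (192, 16.0),
}

for name, (bus, rate) in cards.items():
    print(f"{name}: {bandwidth_gb_s(bus, rate):.0f} GB/s")
```

Under these assumptions the R9 390 and the RX 6700 XT end up with the same raw bandwidth, while the RX 5700 XT has more. The figure deliberately leaves out the “Infinity Cache”, which is exactly the part that raw bandwidth numbers cannot capture.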
GCN vs. RDNA vs. RDNA 2: Benchmarks in Full HD
RDNA makes a huge leap in performance per compute unit compared to GCN, which is not surprising given the age of the base architecture (2011). Although there are still slightly faster GCN offshoots than the second generation used here (the 4th generation “Polaris” was the fastest, “Vega” is slower), the difference is not big anyway.
And so, with the same processing power, RDNA is on average a whopping 41 percent faster in average FPS and 43 percent in percentile FPS than GCN. In the individual games, however, the differences vary significantly. Assassin’s Creed Valhalla, for example, seems to like GCN, where RDNA is “only” 27 percent more powerful. The opposite is true in Horizon Zero Dawn, where RDNA is 50 percent ahead, while it is 49 percent in Cyberpunk 2077 and Doom Eternal, and 48 percent in Dirt 5, Serious Sam 4 and Watch Dogs: Legion.
RDNA 2 clocks significantly higher than RDNA and draws a not insignificant part of its additional performance from that. In order to reach the higher frequencies, the pipelines within the ALUs were lengthened, among other things. This increases latencies and thus reduces the computing power per CU of RDNA 2 compared to RDNA in latency-sensitive calculations, even though the capabilities are otherwise the same.
RDNA 2 has slowed down a bit
This shows in the tests: with the same computing power, RDNA 2 is on average around 4 and 5 percent slower than RDNA (average and percentile FPS, respectively). At the same time, this also shows that the longer pipelines of RDNA 2 were a good decision. In practice, this roughly 4 percent loss per clock is offset by a clock rate that is around 24 percent higher for Navi 21 (RX 6800 XT) and around 40 percent higher for Navi 22 (RX 6700 XT). Trading IPC for frequency was a good exchange.
In the worst case, RDNA 2 is 9 to 11 percent slower in individual games, as Horizon Zero Dawn shows. But there are also games like Cyberpunk 2077 where RDNA 2 is just as fast as RDNA; apparently the higher latencies do not play a role there. In Borderlands 3, RDNA 2 is even 4 and 5 percent faster, and in Control it is 4 and 7 percent. The “Infinity Cache” might help there, but this seems unlikely given the massive manual reduction in computing power and the correspondingly lower bandwidth requirements. More likely, some of RDNA 2’s other improvements are responsible.
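The trade of IPC for frequency can be put into numbers with a rough estimate. This is only a sketch using the article's own figures (about 4 percent less performance per clock, 24 and 40 percent higher clocks); it assumes that performance scales linearly with GPU clock, which real games only approximate, and `net_speedup` is a hypothetical helper name.

```python
# Sketch: net per-CU gain when a per-clock loss is traded for higher clocks.
def net_speedup(per_clock_ratio: float, clock_ratio: float) -> float:
    """Combine per-clock performance and clock ratio (assumes linear scaling)."""
    return per_clock_ratio * clock_ratio


# Roughly 4 percent lower per-clock performance of RDNA 2 vs. RDNA (from the text).
PER_CLOCK_RATIO = 1.0 - 0.04

for gpu, clock_gain in [("Navi 21 (RX 6800 XT)", 0.24),
                        ("Navi 22 (RX 6700 XT)", 0.40)]:
    gain = net_speedup(PER_CLOCK_RATIO, 1.0 + clock_gain) - 1.0
    print(f"{gpu}: estimated net gain per CU of about {gain:.0%} over RDNA")
```

Under the linear-scaling assumption, this works out to roughly 19 percent for Navi 21 and 34 percent for Navi 22 per compute unit, which is why the small per-clock loss is easy to accept.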
RDNA vs. RDNA 2 in WQHD
A compute unit on RDNA is therefore faster in games than a CU on RDNA 2 if the influence of the memory bandwidth is reduced as much as possible. This takes away one of RDNA 2’s greatest strengths: the massively increased bandwidth through the “Infinity Cache”. So does the duel still end the same if the bandwidth influence is increased?
The comparison cannot be made entirely fair: RDNA 2 remains at a disadvantage because RDNA cannot be clocked that high. In the following test series, the Radeon RX 5700 XT operates at 2,000 MHz, which is pretty close to the maximum possible with Navi 10. The computing power has therefore been doubled compared to the previous test series, and the shader units have to be supplied with data correspondingly faster.
The Radeon RX 6700 XT, meanwhile, is clocked down to 2,000 MHz. The graphics card normally clocks around 500 MHz higher, which would require even more bandwidth, but Navi 10 cannot reach that clock. At the same time, the resolution has been increased to 2,560 × 1,440. This not only corresponds to the actual area of application of the Radeon RX 6700 XT; more pixels also require more memory bandwidth.
The Infinity Cache turns the tables
As memory bandwidth requirements increase, the picture flips. With the GPU clock doubled, Navi 22 is now on average 2 and 3 percent faster than Navi 10 at 2,560 × 1,440. Here the “Infinity Cache” of RDNA 2 can flex its muscles and supply the ALUs with data quickly enough, whereas RDNA, with higher memory bandwidth from classic VRAM but without a large cache, apparently cannot.
The advantage of RDNA 2 is particularly large in Control, which runs 11 and 10 percent faster than on the original RDNA. RDNA 2 is also clearly ahead in Borderlands 3 with 5 and 4 percent, and in Star Wars: Squadrons it is 6 and 8 percent.
But there are also games in which RDNA apparently manages to use the ALUs to capacity. For example, in F1 2020 RDNA is 3 percent faster than RDNA 2, in Serious Sam 4 it is 2 percent. There is a tie in Assassin’s Creed Valhalla, Call of Duty: Black Ops Cold War, Dirt 5 and Hitman 3.