Nvidia is said to be redefining how CUDA cores are counted in the Geforce RTX 3000 series

At Nvidia's major Geforce event, the company unveiled the Geforce RTX 3000 series and the performance gains it believes the Ampere architecture brings. Detailed specifications were not mentioned during the video stream itself, but when they were published on the series' product page, many jaws dropped at the number of specified CUDA cores, including those of the SweClockers editors.

In the new lineup, the Geforce RTX 3090 is, according to Nvidia's information, equipped with a slightly crazy 10,496 CUDA cores, stepping down to a still solid 8,704 and 5,888 for the RTX 3080 and RTX 3070, respectively. Compared to the previous generation this is a huge leap, as the RTX 2080 Ti, RTX 2080 and RTX 2070 have 4,352, 2,944 and 2,304 CUDA cores. The reason for the big step may be that Nvidia does not count CUDA cores the same way for the RTX 3000 series.
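To put the jump in perspective, the quoted core counts can be compared directly. The short sketch below only uses the figures above; pairing each new card with the old card in the same position of the lineup is an assumption made for illustration, not something Nvidia states.

```python
# CUDA core counts as quoted in the article.
AMPERE = {"RTX 3090": 10496, "RTX 3080": 8704, "RTX 3070": 5888}
TURING = {"RTX 2080 Ti": 4352, "RTX 2080": 2944, "RTX 2070": 2304}

# Assumption: cards are paired by their position in the lineup.
PAIRS = [
    ("RTX 3090", "RTX 2080 Ti"),
    ("RTX 3080", "RTX 2080"),
    ("RTX 3070", "RTX 2070"),
]


def core_ratio(new_card: str, old_card: str) -> float:
    """Return how many times more CUDA cores the new card is specified with."""
    return AMPERE[new_card] / TURING[old_card]


for new_card, old_card in PAIRS:
    print(f"{new_card} vs {old_card}: {core_ratio(new_card, old_card):.2f}x")
```

Every pairing comes out at well over twice the specified core count, which is what prompted the questions about how the cores are counted.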

Official data from Nvidia itself is lacking, but a number of analyses indicate that the capacity of the graphics cards' compute units (streaming multiprocessors, SM) is not on a par with that of their counterparts in the Turing architecture. The TU102 Turing chip is equipped with two floating-point units for 64-bit calculations (FP64) per SM, giving a total of 144 FP64 units for TU102. During the presentation of the RTX 3000 cards, Nvidia suggested that the SM units and their CUDA resources work differently than before.

The SM units in Ampere can perform either two floating-point calculations (FP + FP) or one floating-point and one integer calculation (FP + INT) per clock cycle, instead of only FP + INT as is the case in Turing. A speculative explanation for this is that Nvidia has doubled the number of FP32 units in each SM, simply replacing the FP64 units with them. As the latter are more complex than their FP32 counterparts, removing them frees up space for a large increase in FP32 units, and this may explain the large number of specified CUDA cores.

In the forum, the ever-reliable technology virtuoso @Yoshman shares a characteristically insightful analysis of how the new approach in the RTX 3000 family can affect performance, and gaming performance specifically:

One of the new features in Turing was the ability to run FP + INT in the same cycle. Just as GN speculates, it is very likely that Ampere can, in the same cycle, run either FP + FP (doubling the FP32 performance) or FP + INT (identical to Turing, which halves the FP32 capacity so that it equals Turing's, calculated per SM and MHz).

I am also convinced that this is a very good design decision, as what you do on graphics cards very rarely has an exact 1:1 split between FP32 and INT (that may be the case in some GPGPU workloads, but it is not optimal for gaming performance), which in practice means that Turing has too much INT capacity for the gaming case in particular.

By enabling FP + FP or FP + INT, it is now possible to switch dynamically anywhere from 100% FP32 + 0% INT to 50% FP32 + 50% INT, depending on what is required at the moment. Modern games often seem to sit at an INT share of 20 to 40%, perhaps 50%, which means that, set against Turing (calculated per CUDA core), the actual efficiency on Ampere will be somewhere between x0.5 (at 50% INT) and x1.0 (at 0% INT).
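The reasoning in the forum post can be turned into a small throughput model. The sketch below is a simplification under stated assumptions, not Nvidia's own description: it assumes each SM has two equally wide datapaths (64 lanes each, a figure not given in the article), that Turing counts only the FP32 path as CUDA cores while Ampere counts both, and that the SM is otherwise never the bottleneck. All function and constant names are made up for illustration.

```python
# Assumptions (not from the article): each SM has two 64-lane datapaths.
LANES = 64
TURING_CORES_PER_SM = 64    # Turing: only the FP32 path counts as CUDA cores
AMPERE_CORES_PER_SM = 128   # speculated: Ampere counts both paths


def turing_ops_per_clock(int_share: float) -> float:
    """One FP-only path and one INT-only path; the busier path sets the pace."""
    fp_share = 1.0 - int_share
    return LANES / max(fp_share, int_share)


def ampere_ops_per_clock(int_share: float) -> float:
    """One FP-only path and one path that runs FP or INT.

    FP work fills whatever the INT work leaves free on the shared path,
    so both paths stay busy as long as the INT share is at most 50%.
    """
    return LANES / max(0.5, int_share)


def per_core_efficiency(int_share: float) -> float:
    """Ampere throughput per counted CUDA core, relative to Turing."""
    if not 0.0 <= int_share <= 0.5:
        raise ValueError("model only covers INT shares from 0% to 50%")
    turing = turing_ops_per_clock(int_share) / TURING_CORES_PER_SM
    ampere = ampere_ops_per_clock(int_share) / AMPERE_CORES_PER_SM
    return ampere / turing


for share in (0.0, 0.2, 0.4, 0.5):
    print(f"INT share {share:.0%}: x{per_core_efficiency(share):.2f}")
```

Under these assumptions the per-core efficiency works out to one minus the INT share, so the 20 to 40% range mentioned for modern games corresponds to roughly x0.8 down to x0.6 per counted CUDA core, within the x0.5 to x1.0 span given in the post.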

However, an SM consists of more than just units for integer and floating-point calculations. The layout of other SM components such as the register file, scheduler, load/store units and special function units (SFU) seems to be the same in Ampere as in Turing, even though the exact layout is not known at the time of writing. Even if the layout is the same, that does not necessarily make the 10,496 CUDA cores in the Geforce RTX 3090 an "embellished" definition on Nvidia's part, at least not completely.

A CUDA core is basically a relatively simple arithmetic unit (ALU) that performs fused multiply-add (FMA) operations. If the capacity for FP64 calculations has been sacrificed in favor of doubled FP32 capacity, FP32 capacity has still increased. What changes is the extent to which an SM and its CUDA cores can be utilized, and in which situations they can be fully utilized. If the other components have not been doubled along with the CUDA cores, the per-core efficiency of Ampere's SM units will not fully match that of the Turing variants.

Several partner manufacturers state the number of CUDA cores in the Geforce RTX 3090 as 5,248, i.e. half of what Nvidia states. It is unclear whether this should be seen as what they consider the "real" number of CUDA cores, or whether it is simply a relic from when product information was sent to stores prior to the Geforce event. Product launches involve relatively long lead times, and older information does not always match what is presented in the end.
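As a quick check on the halved figure, the sketch below halves Nvidia's stated counts and looks for matches among the Turing counts quoted earlier in the article. Only the RTX 3090 figure of 5,248 is reported from partner listings; applying the same halving to the RTX 3080 and RTX 3070 is a hypothetical extension for illustration.

```python
# CUDA core counts as quoted in the article.
NVIDIA_STATED = {"RTX 3090": 10496, "RTX 3080": 8704, "RTX 3070": 5888}
TURING_STATED = {"RTX 2080 Ti": 4352, "RTX 2080": 2944, "RTX 2070": 2304}


def halved(count: int) -> int:
    """Core count if each pair of counted cores were treated as one."""
    return count // 2


def turing_cards_with(count: int) -> list[str]:
    """Names of Turing cards specified with exactly this core count."""
    return [name for name, cores in TURING_STATED.items() if cores == count]


for card, count in NVIDIA_STATED.items():
    half = halved(count)
    matches = turing_cards_with(half)
    note = f" (same as {', '.join(matches)})" if matches else ""
    print(f"{card}: {count} stated, {half} halved{note}")
```

Halving reproduces the partners' 5,248 for the RTX 3090, and the halved RTX 3080 and RTX 3070 figures coincide with the counts quoted for the RTX 2080 Ti and RTX 2080, which is consistent with the idea that the counting method rather than the number of SM units has changed. It does not prove it.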

How big the differences between Ampere and Turing are in terms of SM units and CUDA cores remains to be seen until Nvidia presents an in-depth review of the Ampere architecture to the public. Only then will it be possible to say more concretely whether the CUDA core count Nvidia states for the Geforce RTX 3000 series is embellished or not, and how that may affect performance in practical scenarios.
