AD102, AD103 and AD104: transistors, core counts and more on the GPUs behind the first GeForce RTX 4000 cards

NVIDIA announced early on that the first three video cards of the GeForce RTX 4000 era are based on three distinct GPUs: AD102 for the RTX 4090, AD103 for the RTX 4080 16 GB and AD104 for the RTX 4080 12 GB.

However, the company had not shared many technical details, especially about AD103 and AD104, leaving us with a poorly defined technical picture. Now, thanks to the release of a whitepaper, we have a complete view of the GPUs' characteristics and of how they are implemented in the new cards. Let’s go in order.

The AD102 GPU is the heart of the GeForce RTX 4090 and, as we learned in the past few hours, it is a chip with 76.3 billion transistors packed into an area of 608.5 mm². The GPU has 144 Streaming Multiprocessors, 18432 CUDA cores, 144 RT cores and 576 Tensor cores. Also on board are 192 ROPs and a full 96 MB of L2 cache. The memory interface is 384 bits.

The GeForce RTX 4090 does not use the GPU in its full form, however, as we explained in a news item in which we speculated about the future arrival of a 4090 Ti. The RTX 4090 has 16384 active CUDA cores as a result of 128 working SMs, along with 128 RT cores and 512 Tensor cores. The ROPs number 176, while the bus remains at 384 bits.
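The relationship between active SMs and core counts can be sketched in a few lines of Python. The per-SM ratios below are not quoted from NVIDIA; they are inferred from the AD102 figures in this article (18432 / 144 = 128 CUDA cores, 1 RT core and 4 Tensor cores per SM), and the function name `units_for` is purely illustrative.

```python
# Per-SM ratios inferred from the article's AD102 figures (an assumption,
# not an official formula): 18432 / 144 = 128, 144 / 144 = 1, 576 / 144 = 4.
CUDA_PER_SM = 128
RT_PER_SM = 1
TENSOR_PER_SM = 4


def units_for(sm_count: int) -> dict[str, int]:
    """Return the core counts implied by a given number of active SMs."""
    return {
        "cuda": sm_count * CUDA_PER_SM,
        "rt": sm_count * RT_PER_SM,
        "tensor": sm_count * TENSOR_PER_SM,
    }


full_ad102 = units_for(144)  # full chip
rtx_4090 = units_for(128)    # configuration shipped in the RTX 4090

print(full_ad102)
print(rtx_4090)
print(f"Active SMs on the RTX 4090: {128 / 144:.1%}")
```

Under these assumed ratios, the 128 active SMs reproduce the 16384 CUDA cores, 128 RT cores and 512 Tensor cores quoted for the RTX 4090.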

The AD103 GPU, by contrast, has 45.9 billion transistors in an area of 378.6 mm². The chip has 80 Streaming Multiprocessors for a total of 10240 CUDA cores, 80 RT cores and 320 Tensor cores. It has 112 ROPs and 64 MB of L2 cache. The memory interface is 256 bits.

The GeForce RTX 4080 16 GB does not carry a fully active AD103: the number of SMs drops to 76, leading to the following specifications: 9728 CUDA cores, 76 RT cores and 304 Tensor cores. The ROPs number 112.

Finally, we come to the AD104 GPU which, according to what we have learned, is integrated into the GeForce RTX 4080 12 GB in its full form. The chip has an area of 294.5 mm² and integrates 35.8 billion transistors. On board there are 60 SMs for 7680 CUDA cores, 60 RT cores and 240 Tensor cores. The L2 cache is 48 MB, while the ROPs number 80. The memory interface is 192 bits. The following table summarizes the specifications of the Ada Lovelace GPUs, along with the GA102 for reference:

	AD102	AD103	AD104	GA102
Architecture	Ada Lovelace	Ada Lovelace	Ada Lovelace	Ampere
Manufacturing process	TSMC 4N	TSMC 4N	TSMC 4N	Samsung 8N
Transistors	76.3 billion	45.9 billion	35.8 billion	28.3 billion
Die size	608.5 mm²	378.6 mm²	294.5 mm²	628.4 mm²
Streaming Multiprocessors	144	80	60	84
CUDA cores	18432	10240	7680	10752
Tensor cores	576	320	240	336
RT cores	144	80	60	84
ROPs	192	112	80	112
L2 cache	96 MB	64 MB	48 MB	6 MB
Memory bus	384 bit	256 bit	192 bit	384 bit
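
To put the transistor counts and die sizes from the table in perspective, one can compute the transistor density of each chip. The following is a minimal sketch: the dictionary `GPUS` and the function `density_mtr_per_mm2` are illustrative names, and the only inputs are the figures listed above.

```python
# (transistors in billions, die area in mm²), copied from the table above.
GPUS = {
    "AD102": (76.3, 608.5),
    "AD103": (45.9, 378.6),
    "AD104": (35.8, 294.5),
    "GA102": (28.3, 628.4),
}


def density_mtr_per_mm2(transistors_billion: float, area_mm2: float) -> float:
    """Return the density in millions of transistors per mm²."""
    return transistors_billion * 1000 / area_mm2


for name, (transistors, area) in GPUS.items():
    density = density_mtr_per_mm2(transistors, area)
    print(f"{name}: {density:.1f} MTr/mm²")
```

By this measure AD102 comes out at roughly 125 million transistors per mm², against about 45 for GA102, despite the two dies being of similar size.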

Putting aside the number of cores, it is evident that NVIDIA has decided to follow AMD in expanding the on-chip cache. With RDNA 2, AMD implemented an Infinity Cache of up to 128 MB, which allowed it to keep a narrower memory interface while still guaranteeing high overall bandwidth.

NVIDIA does the same thing with Ada Lovelace, albeit in a different way, with up to 96 MB of L2 cache, which stands out against the 6 MB of L2 cache found in GA102, the top-of-the-line GPU of the RTX 3000 series. To make room for the L2 cache, NVIDIA decided to sacrifice the NVLink interconnect.

Finally, it is worth remembering that all these GPUs are manufactured at TSMC on a process called 4N, which is not to be confused with TSMC’s own N4. 4N should be seen as an NVIDIA-specific optimization of the Taiwanese company’s N5 (5 nm) process.