AMD Instinct MI100 is the first compute card built on the CDNA architecture

For many years, AMD tried to make the Graphics Core Next (GCN) graphics architecture fulfill dual roles: powering consumer cards as well as compute cards for general-purpose calculations in workstations and data centers. Serving both camps with the same foundation proved suboptimal, and in 2019 AMD announced that it would change tack and develop two separate architectures.

The result is the Radeon DNA (RDNA) architecture for consumers and Compute DNA (CDNA) for workstations and data centers. With CDNA, AMD strips out all graphics features that are not needed for general-purpose calculations, and the architecture is instead tailored for tasks related to machine learning and artificial intelligence. The first card with CDNA under the hood is the Instinct MI100, which according to AMD delivers world-leading performance for floating-point calculations.

More specifically, AMD claims leading performance for single-precision (FP32) and double-precision (FP64) floating-point numbers. For FP32, the company states a maximum theoretical compute capacity of 23.1 TFLOPS, and for FP64 a maximum of 11.5 TFLOPS. To make the Instinct MI100 flexible, AMD also adds its Matrix Core technology, which enables mixed-precision execution where integer formats such as INT8 and INT4 can be mixed with floating-point formats such as FP16, FP32 and Bfloat16.
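These peak figures follow directly from the card's shader count and clock speed. A minimal sketch of the arithmetic, assuming the published 1,502 MHz boost clock (a spec-sheet value not stated in this article); note that CDNA runs FP64 at half the FP32 rate:

```python
# Peak theoretical throughput = shader count x FLOPs per clock x clock speed.
# The 1,502 MHz boost clock is taken from AMD's MI100 spec sheet (assumption,
# not quoted in the article itself).

STREAM_PROCESSORS = 7_680      # 120 compute units x 64 stream processors
FLOPS_PER_CLOCK_FP32 = 2       # one fused multiply-add counts as 2 FLOPs
BOOST_CLOCK_HZ = 1_502e6       # 1,502 MHz

fp32_tflops = STREAM_PROCESSORS * FLOPS_PER_CLOCK_FP32 * BOOST_CLOCK_HZ / 1e12
fp64_tflops = fp32_tflops / 2  # CDNA executes FP64 at half the FP32 rate

print(f"FP32: {fp32_tflops:.1f} TFLOPS")  # ~23.1
print(f"FP64: {fp64_tflops:.1f} TFLOPS")  # ~11.5
```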


For dense matrix calculations, AMD indicates a clear floating-point advantage over the Nvidia A100, with roughly 18.5 percent better results for both double precision (FP64) and single precision (FP32). The AMD camp, however, is left well behind by the A100 in half precision (FP16), where Nvidia's card reigns supreme with 69 percent better results.
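As a sanity check, these relative advantages can be recomputed from the raw throughput figures. The FP16 inputs (184.6 and 312 TFLOPS) are spec-sheet values assumed here, since the article only gives the 69 percent conclusion:

```python
# Relative advantage = (a / b - 1) * 100, using dense throughput figures.

def advantage_pct(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return (a / b - 1) * 100

fp64 = advantage_pct(11.5, 9.7)     # MI100 over A100, ~18.6%
fp32 = advantage_pct(23.1, 19.5)    # MI100 over A100, ~18.5%
fp16 = advantage_pct(312.0, 184.6)  # A100 over MI100, ~69% (spec-sheet inputs)

print(f"FP64: {fp64:.1f}%  FP32: {fp32:.1f}%  FP16: {fp16:.1f}%")
```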

Nvidia's advantages continue for sparse matrix calculations. For FP64, the A100 reaches 19.5 TFLOPS with its Tensor cores, while the Instinct MI100 has to settle for 11.5 TFLOPS. For FP32, the A100 delivers the same 19.5 TFLOPS at standard single precision, but climbs to 156 TFLOPS when Nvidia's Tensor Float 32 (TF32) technology is used, which trumps AMD's 23.1 TFLOPS. Equally overwhelming is the INT8 integer result, where the A100 reaches 624 TOPS for the green camp versus 185 TOPS for the red team.

| Calculation type | AMD Instinct MI100 | Nvidia A100 |
| --- | --- | --- |
| FP64 | 11.5 TFLOPS | 9.7 TFLOPS (19.5 TFLOPS with Tensor cores) |
| FP32 | 23.1 TFLOPS | 19.5 TFLOPS (156 TFLOPS with TF32) |
| INT8 | 185 TOPS | 624 TOPS |


When AMD compares against Nvidia's older Volta-based V100 cards, the company states that the Instinct MI100 achieves between 1.4 and 3 times better results in high-performance data center workloads. Compared to its predecessor, the Instinct MI60, the newcomer MI100 is said to provide a 40 percent performance increase in the PIConGPU workload. AMD also claims that the Instinct MI100 offers double the performance per dollar of the Nvidia A100, without specifying the price of its own card.

The CDNA-based graphics circuit in the Instinct MI100 goes by the code name "Arcturus" and is manufactured, like the rest of AMD's current lineup, on TSMC's 7-nanometer node. The circuit houses 120 compute units (CU) for a total of 7,680 stream processors. The graphics memory consists of 32 GB of HBM2 running at a memory frequency of 1.2 GHz. AMD states a memory bandwidth of 1.23 TB/s for calculations within the graphics card, compared with the HBM2E memory Nvidia uses in the A100, where bandwidth measures in at 1.54 TB/s.
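The 1.23 TB/s figure can be derived from the memory configuration. The 4,096-bit total bus width (four HBM2 stacks of 1,024 bits each) is a spec-sheet assumption not stated in the article:

```python
# Memory bandwidth = bus width in bytes x transfers per second per pin.
# HBM2 transfers data on both clock edges, so a 1.2 GHz memory clock
# yields 2.4 Gbps per pin. Bus width is assumed from MI100's spec sheet.

BUS_WIDTH_BITS = 4_096     # 4 HBM2 stacks x 1,024 bits each (assumption)
MEMORY_CLOCK_HZ = 1.2e9    # 1.2 GHz, as stated in the article
TRANSFERS_PER_CLOCK = 2    # double data rate

bandwidth_tb_s = BUS_WIDTH_BITS / 8 * MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK / 1e12
print(f"{bandwidth_tb_s:.2f} TB/s")  # ~1.23
```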

Communication with the rest of the system takes place over a PCI Express 4.0 connection with a theoretical bandwidth ceiling of 64 GB/s. Multiple cards can also be linked via AMD's Infinity Fabric interconnect technology, which gives the connected cards a bandwidth of 276 GB/s. In terms of performance, AMD positions the Instinct MI100 partly against Nvidia's older Volta-based V100 and partly against the latest Ampere-based A100.
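The 64 GB/s ceiling is simply a PCIe 4.0 x16 link counted in both directions before line-coding overhead. A quick sketch, using the standard PCIe 4.0 signaling rate and 128b/130b encoding:

```python
# PCIe bandwidth = transfer rate x lanes / 8 bits per byte, per direction.
# PCIe 4.0 signals at 16 GT/s per lane; the ~64 GB/s ceiling counts both
# directions of an x16 link before 128b/130b encoding overhead.

TRANSFER_RATE = 16e9   # 16 GT/s per lane (PCIe 4.0)
LANES = 16
ENCODING = 128 / 130   # 128b/130b line coding efficiency

per_direction_gb_s = TRANSFER_RATE * LANES / 8 / 1e9  # raw: 32.0 GB/s
both_dirs_raw = per_direction_gb_s * 2                # raw: 64.0 GB/s
both_dirs_effective = both_dirs_raw * ENCODING        # ~63.0 GB/s usable

print(f"raw: {both_dirs_raw:.0f} GB/s, effective: {both_dirs_effective:.1f} GB/s")
```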


The AMD Instinct MI100 will be made available to partner manufacturers (OEMs) and data center system builders by the end of 2020, and on the processor side AMD unsurprisingly pairs it with models from the company's Epyc family. The use of Epyc processors is something AMD shares with Nvidia, which also deploys them alongside the A100 in the battle for data centers and workstations.


