At Nvidia's annual GPU Technology Conference, CEO Jen-Hsun Huang takes the stage to present the company's big news. This year the conference is also where Nvidia sheds light on its upcoming Pascal architecture. The first Pascal product to step into the spotlight is the Tesla P100, which is built on the GP100 graphics processor.
Specifications: Nvidia Tesla P100
Model | Tesla K40 | Tesla M40 | Tesla P100 |
---|---|---|---|
Process | 28 nm TSMC | 28 nm TSMC | 16 nm TSMC |
GPU | GK110 | GM200 | GP100 |
Die size | 551 mm² | 601 mm² | 610 mm² |
Transistors | 7.1 billion | 8 billion | 15.3 billion |
Architecture | Kepler | Maxwell | Pascal |
SM clusters | 15 | 24 | 56 |
CUDA cores | 2,880 | 3,072 | 3,584 |
Texture units | 240 | 192 | 224 |
Base clock | 745 MHz | 948 MHz | 1,328 MHz |
Boost clock (GPU Boost) | 875 MHz | 1,114 MHz | 1,480 MHz |
Compute performance (single precision, FP32) | 5,040 GFLOPS | 6,844 GFLOPS | 10,608 GFLOPS |
Compute performance (double precision, FP64) | 1,680 GFLOPS | 214 GFLOPS | 5,304 GFLOPS |
Memory bus | 384-bit | 384-bit | 4,096-bit |
Memory type | GDDR5 | GDDR5 | HBM2 |
Memory capacity | Up to 12 GB | Up to 24 GB | 16 GB |
Memory frequency (effective) | 6,000 MHz | 6,000 MHz | ~1,400 MHz |
Memory bandwidth | 288 GB/s | 288 GB/s | 720 GB/s |
TDP | 235 W | 250 W | 300 W |
The Tesla P100 compute card is the first to use the Pascal architecture. Thanks to TSMC's 16-nanometer process, it packs no less than 15.3 billion transistors. That compares with 8 billion for the Maxwell-based GM200, while the die itself is only slightly larger at 610 mm².
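A quick back-of-the-envelope check shows what the move from 28 nm to 16 nm buys. The sketch below uses only the transistor counts and die sizes from the table and divides one by the other. It is a rough illustration: it treats the whole die as uniform, and the helper name `density_mtr_per_mm2` is hypothetical.

```python
# Rough transistor density from the table's transistor counts and die sizes.
# Density here is simply transistors / die area; real dies mix logic, cache and I/O.
chips = {
    "GK110": (7.1e9, 551),   # (transistors, die size in mm^2)
    "GM200": (8.0e9, 601),
    "GP100": (15.3e9, 610),
}

def density_mtr_per_mm2(transistors: float, area_mm2: float) -> float:
    """Return density in millions of transistors per square millimetre."""
    return transistors / area_mm2 / 1e6

for name, (transistors, area) in chips.items():
    print(f"{name}: {density_mtr_per_mm2(transistors, area):.1f} million transistors/mm^2")
```

With the table's figures this gives roughly 12.9, 13.3 and 25.1 million transistors per mm² for GK110, GM200 and GP100, so GP100 is nearly twice as dense as GM200.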
All those transistors translate into plenty of computing power: the card is rated at 10.6 TFLOPS in single precision (FP32) and 5.3 TFLOPS in double precision (FP64). The FP32 figure is about 55 percent higher than that of the GM200-based Tesla M40, and it could hint at what to expect from the successor to the consumer-oriented GeForce GTX Titan X.
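The peak figures in the table can be reproduced with simple arithmetic. This sketch assumes, as is conventional for peak ratings, that every CUDA core retires one fused multiply-add (two floating-point operations) per clock at the boost frequency. The FP64-to-FP32 ratios (1/3 for GK110, 1/32 for GM200, 1/2 for GP100) are added here and are not listed in the article.

```python
# Peak throughput estimate: CUDA cores x 2 FLOPs per clock (one fused
# multiply-add) x boost clock. FP64 is derived from FP32 via a per-chip ratio.
cards = {
    # name: (CUDA cores, boost clock in MHz, FP64:FP32 ratio)
    "Tesla K40": (2880, 875, 1 / 3),
    "Tesla M40": (3072, 1114, 1 / 32),
    "Tesla P100": (3584, 1480, 1 / 2),
}

def peak_gflops(cores: int, boost_mhz: int) -> float:
    """Peak single-precision GFLOPS assuming one FMA (2 FLOPs) per core per clock."""
    return cores * 2 * boost_mhz / 1000

for name, (cores, boost, fp64_ratio) in cards.items():
    fp32 = peak_gflops(cores, boost)
    fp64 = fp32 * fp64_ratio
    print(f"{name}: FP32 {fp32:,.1f} GFLOPS, FP64 {fp64:,.1f} GFLOPS")

# Relative FP32 gain of the Tesla P100 over the Tesla M40.
gain = peak_gflops(3584, 1480) / peak_gflops(3072, 1114) - 1
print(f"P100 vs M40 (FP32): +{gain:.0%}")
```

For the Tesla P100 this prints about 10,608.6 GFLOPS FP32 and 5,304.3 GFLOPS FP64, and an FP32 gain of 55% over the Tesla M40.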
Looking at the rest of the specifications, the full GP100 has 60 Streaming Multiprocessor (SM) clusters, each with 64 CUDA cores and 4 texture units, for a total of 3,840 CUDA cores and 240 texture units. On the Tesla P100, however, four clusters are disabled, which reduces the counts to 3,584 and 224 respectively.
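The unit counts follow directly from the number of enabled SM clusters. A minimal sketch using the per-SM figures given in the paragraph above (64 CUDA cores and 4 texture units); the constant and function names are illustrative only.

```python
# GP100 unit counts scale linearly with the number of enabled SM clusters.
CORES_PER_SM = 64          # CUDA cores per GP100 SM, per the article
TEXTURE_UNITS_PER_SM = 4   # texture units per GP100 SM, per the article

def unit_counts(enabled_sms: int) -> tuple[int, int]:
    """Return (CUDA cores, texture units) for a given number of enabled SMs."""
    return enabled_sms * CORES_PER_SM, enabled_sms * TEXTURE_UNITS_PER_SM

full_gp100 = unit_counts(60)       # full chip: (3840, 240)
tesla_p100 = unit_counts(60 - 4)   # four clusters disabled: (3584, 224)
print("Full GP100:", full_gp100)
print("Tesla P100:", tesla_p100)
```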
The graphics processor is flanked by four HBM2 memory stacks with a total capacity of 16 GB. Bandwidth is a record-high 720 GB/s, although that falls short of the 1 TB/s the memory technology makes possible. Nvidia has most likely lowered the memory clock to keep power consumption down, and other graphics cards based on GP100 may be equipped with faster HBM2.
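The bandwidth figures follow from bus width and per-pin data rate. In this sketch, the 1,024-bit interface per HBM2 stack and the 2 Gbit/s per-pin rate used for the 1 TB/s case are assumptions added for illustration, and the helper names are hypothetical.

```python
# Memory bandwidth = bus width in bytes x effective data rate per pin.
def bandwidth_gb_s(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a bus of `bus_bits` pins at `data_rate_gbps` Gbit/s per pin."""
    return bus_bits / 8 * data_rate_gbps

def required_data_rate(bus_bits: int, target_gb_s: float) -> float:
    """Per-pin data rate (Gbit/s) needed to reach `target_gb_s` on a given bus."""
    return target_gb_s / (bus_bits / 8)

# GDDR5 on the Kepler and Maxwell cards: 384-bit bus at 6 Gbit/s effective.
print(bandwidth_gb_s(384, 6.0))        # 288.0
# Tesla P100: 4,096-bit HBM2 bus (assumed four stacks x 1,024 bits).
print(required_data_rate(4096, 720))   # 1.40625
# Same bus at an assumed 2 Gbit/s per pin.
print(bandwidth_gb_s(4096, 2.0))       # 1024.0
```

720 GB/s on a 4,096-bit bus works out to about 1.4 Gbit/s per pin, which matches the roughly 1,400 MHz effective memory frequency in the table, while 2 Gbit/s per pin would give about 1 TB/s.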
Deliveries of the Nvidia Tesla P100 will begin shortly. It is not yet clear when the GP100 will appear in consumer graphics cards.