NVIDIA recently announced the 2024 release of the NVIDIA HGX™ H200 GPU, a new, supercharged addition to its leading AI computing platform. Gcore is excited about the announcement of the H200 GPU because we use A100 and H100 GPUs to power our AI GPU cloud infrastructure and look forward to adding L40S GPUs to our AI GPU configurations in Q1 2024. So we consider this the right time to share a comparative analysis of these NVIDIA GPUs: the current-generation A100 and H100, the new-generation L40S, and the forthcoming H200.
Comparison of A100 vs. H100 vs. L40S vs. H200
The NVIDIA A100, H100, L40S, and H200 represent some of the most advanced and powerful GPUs in the company's lineup. They're designed specifically for professional, enterprise, and data center applications, and they feature architectures and technologies optimized for computational tasks, AI, and data processing. Let's see how they stack up against each other on key technical specifications.
| Specification | A100 | H100 | L40S | H200 |
|---|---|---|---|---|
| Architecture | Ampere | Hopper | Ada Lovelace | Hopper |
| Release Year | 2020 | 2022 | 2023 | 2024 |
| FP64 | 9.7 TFLOPS | 34 TFLOPS | Data not available | 34 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 67 TFLOPS | Data not available | 67 TFLOPS |
| FP32 | 19.5 TFLOPS | 67 TFLOPS | 91.6 TFLOPS | 67 TFLOPS |
| TF32 Tensor Core | 312 TFLOPS | 989 TFLOPS | 183 / 366* TFLOPS | 989 TFLOPS* |
| BFLOAT16 Tensor Core | 624 TFLOPS | 1,979 TFLOPS | 362.05 / 733* TFLOPS | 1,979 TFLOPS* |
| FP16 Tensor Core | 624 TFLOPS | 1,979 TFLOPS | 362.05 / 733* TFLOPS | 1,979 TFLOPS* |
| FP8 Tensor Core | Not applicable | 3,958 TFLOPS | 733 / 1,466* TFLOPS | 3,958 TFLOPS* |
| INT8 Tensor Core | 1,248 TOPS | 3,958 TOPS | 733 / 1,466* TOPS | 3,958 TOPS* |
| INT4 Tensor Core | Data not available | Data not available | 733 / 1,466* TOPS | Data not available |
| GPU Memory | 80 GB HBM2e | 80 GB HBM3 | 48 GB GDDR6 with ECC | 141 GB HBM3e |
| GPU Memory Bandwidth | 2,039 GB/s | 3.35 TB/s | 864 GB/s | 4.8 TB/s |
| Decoders | Not applicable | 7 NVDEC, 7 JPEG | Not applicable | 7 NVDEC, 7 JPEG |
| Max Thermal Design Power (TDP) | 400 W | Up to 700 W (configurable) | 350 W | Up to 700 W (configurable) |
| Multi-Instance GPUs | Up to 7 MIGs @ 10 GB each | Up to 7 MIGs @ 10 GB each | No | Up to 7 MIGs @ 16.5 GB each |
| Form Factor | SXM | SXM | 4.4" (H) x 10.5" (L), dual slot | SXM** |
| Interconnect | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | PCIe Gen4 x16: 64 GB/s bidirectional | NVLink®: 900 GB/s; PCIe Gen5: 128 GB/s |
| Server Options | NVIDIA HGX™ A100 partner and NVIDIA-Certified Systems™ with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs | NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | Data not available | NVIDIA HGX™ H200 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs |
| NVIDIA AI Enterprise | Included | Add-on | Data not available | Add-on |
| CUDA® Cores | 6,912 | 16,896 | 18,176 | Data not available |
* With sparsity.
** Preliminary specification. May be subject to change.
Source: https://resources.nvidia.com/l/en-us-gpu
Based on the above comparison, we're expecting the H200 to outperform the previous and current generations of NVIDIA data center GPUs across use cases. The current generation, the H100, is a close match to the H200, with near-identical multi-precision computing performance. So, while H200s will offer improvements, H100s will remain a top option. As for the A100, it's the least-performant GPU when compared to its successors, while still offering solid performance for certain tasks.
The L40S differs from the A100 and H100 in that it includes 142 third-generation RT Cores delivering 212 TFLOPS of RT Core performance, plus 568 fourth-generation Tensor Cores. However, we don't yet have sufficient information about these parameters for the H200, so it remains to be seen exactly how the L40S and H200 will stack up.
NVIDIA GPUs at a Glance
Let's check out each GPU in turn to discover more about its features, performance, and the use cases where it shines.
NVIDIA A100
The NVIDIA A100 GPU was the first GPU to feature the Ampere architecture back in 2020. Prior to the release of the H100 in 2022, the A100 was a leading GPU platform. It offered a substantial leap in performance compared to its predecessors thanks to improved Tensor Cores for AI, an increased CUDA core count for parallel processing, enhanced memory, and what was then the fastest-ever memory bandwidth at over 2 TB/s. It supports Multi-Instance GPU (MIG) technology, which allows a single A100 GPU to be partitioned into smaller, independent GPUs to maximize resource allocation and efficiency in cloud and data center environments.
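To give a sense of how a workload consumes one of those MIG slices, here is a minimal Python sketch. It assumes an administrator has already enabled MIG mode and created the instances with `nvidia-smi`, that PyTorch is installed, and that the MIG UUID shown is a placeholder you would replace with a real one listed by `nvidia-smi -L`.

```python
import os

# Expose only one MIG slice to this process (placeholder UUID; list real
# UUIDs with `nvidia-smi -L`). Must be set before CUDA is initialized,
# i.e. before importing torch.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

print(torch.cuda.device_count())      # 1: the slice appears as a single device
print(torch.cuda.get_device_name(0))  # e.g. an A100 MIG 1g.10gb profile
```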
Despite being surpassed in performance by newer models, the A100 GPU remains a great choice for training complex neural networks as part of deep learning and AI training tasks because of its powerful Tensor Cores and high computational throughput. It also shines at AI inference tasks such as speech recognition, image classification, and recommendation systems, as well as data analytics and big data processing, scientific computing and simulations, and high-performance computing (HPC) tasks including genome sequencing and drug discovery.
NVIDIA H100
The NVIDIA H100 GPU can handle the most demanding AI workloads and large-scale data processing tasks. H100 includes next-generation Tensor Cores, which dramatically enhance AI training and inference speeds. It also supports double precision (FP64), single precision (FP32), half precision (FP16), and integer (INT8) compute tasks.
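To illustrate how those mixed-precision Tensor Cores are typically exercised from application code, here is a minimal sketch using PyTorch's automatic mixed precision; the matrix sizes are arbitrary placeholders, and the same pattern works on the A100 as well.

```python
import torch

# Arbitrary example sizes; any large matmul or transformer layer benefits similarly.
a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

# Autocast runs eligible ops in FP16 on the GPU's Tensor Cores
# while keeping numerically sensitive ops in FP32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b

print(c.dtype)  # torch.float16
```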
The H100 offers a substantial performance boost over the A100, including the following benefits:
- Six times faster, with up to four petaflops of FP8 compute
- 50% memory bandwidth increase, using HBM3 high-bandwidth memory delivering up to 3 TB/s, with external connectivity nearly reaching 5 TB/s
- Up to six times faster transformer model training thanks to its new Transformer Engine (a usage sketch follows this list)
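As a rough illustration of how FP8 and the Transformer Engine are exposed to application code, below is a minimal sketch. It assumes NVIDIA's Transformer Engine library (`transformer_engine`) and PyTorch are installed and that the code runs on an FP8-capable GPU such as the H100; the layer sizes are placeholders.

```python
import torch
import transformer_engine.pytorch as te

# A Transformer Engine Linear layer; dimensions are multiples of 16,
# which FP8 GEMMs generally require.
layer = te.Linear(768, 3072, bias=True).cuda()
x = torch.randn(32, 768, device="cuda")

# Inside fp8_autocast, supported layers execute their matrix multiplies in FP8,
# with scaling handled by the Transformer Engine.
with te.fp8_autocast(enabled=True):
    y = layer(x)

print(y.shape)  # torch.Size([32, 3072])
```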
While the H100 covers similar use cases and offers similar performance features to the A100, the H100 GPU can handle massive AI models, including those using transformer architectures, and more complex scientific simulations. The H100 GPU is also a superior choice for real-time and responsive AI applications, like advanced conversational AI and real-time translation.
NVIDIA L40S
The L40S is one of NVIDIA's most powerful GPUs, released in Q4 2023 (and joining Gcore's infrastructure in Q1 2024). It's designed to handle the next generation of data center workloads: generative AI, large language model (LLM) inference and training, 3D graphics, rendering, video, and scientific simulations.
The NVIDIA L40S delivers up to 5x higher inference performance than the previous-generation A100, along with up to 2x the real-time ray-tracing (RT) performance of prior-generation GPUs. Its 48 GB of GDDR6 memory with ECC (Error Correcting Code) plays a crucial role in maintaining data integrity in high-performance computing environments. It also comes equipped with over 18,000 CUDA cores, the parallel processors that are key to handling complex computational tasks.
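As a quick way to confirm those figures on a live machine, here is a small sketch using PyTorch's device property query; the commented values are roughly what you would expect to see on an L40S host.

```python
import torch

props = torch.cuda.get_device_properties(0)
print(props.name)                           # e.g. "NVIDIA L40S"
print(round(props.total_memory / 1024**3))  # ~48 (GB of GDDR6)
print(props.multi_processor_count)          # number of SMs hosting the CUDA cores
```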
NVIDIA H200
The NVIDIA H200 is the latest in NVIDIA's lineup of GPUs, scheduled to ship during Q2 2024. It's the first GPU to offer 141 GB of HBM3e memory at 4.8 TB/s, nearly double the capacity of the NVIDIA H100 Tensor Core GPU with 1.4x more memory bandwidth. The latter is particularly relevant in high-performance computing, where it results in up to 110x faster time-to-results compared to CPUs. Inference speed is double that of H100 GPUs when handling Llama 2 70B inference.
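For readers who want to sanity-check those headline ratios, the arithmetic is straightforward; the figures below are taken from the comparison table above.

```python
# H200 vs. H100 memory figures from the comparison table above.
h100_capacity_gb, h200_capacity_gb = 80, 141
h100_bandwidth_tbs, h200_bandwidth_tbs = 3.35, 4.8

print(h200_capacity_gb / h100_capacity_gb)      # ~1.76x capacity ("nearly double")
print(h200_bandwidth_tbs / h100_bandwidth_tbs)  # ~1.43x bandwidth ("1.4x more")
```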
The H200 is set to play a critical role in the Artificial Intelligence of Things (AIoT) for edge computing and IoT applications. You can also expect the highest available GPU performance from H200s across application workloads, including LLM training and inference for the largest models beyond 175 billion parameters, and in generative AI and HPC applications.
Conclusion
Based on the initial specifications and preliminary performance benchmarks, the NVIDIA HGX™ H200 looks like a significant step forward from the A100 and H100 GPUs in terms of overall performance, energy savings, and TCO (total cost of ownership). We hope this comparative guide helps you choose the right NVIDIA data center GPU as the ideal solution for your business problems in deep learning and AI, HPC, graphics, or virtualization, whether in the data center or at the edge.
Gcore offers various AI GPU configurations for bare metal servers and virtual machines based on A100 and H100 GPUs. In addition, our Managed Kubernetes platform allows you to use bare metal servers and virtual machines with A100 and H100 GPUs as worker nodes. We'll soon add more AI GPU configurations based on the latest L40S GPUs; stay tuned for updates!