In 2022, NVIDIA released the H100, marking a significant addition to their GPU lineup. Designed to both complement and compete with the A100 model, the H100 comes with 80GB of VRAM, matching the capacity of the 80GB A100. Both GPUs are highly capable, particularly for computation-intensive tasks like machine learning and scientific calculations. However, there is a notable difference in their costs. This article will provide a detailed comparison of the H100 and A100, focusing on their performance metrics and suitability for specific use cases so you can decide which is best for you.
What are the Performance Differences Between A100 and H100?
According to benchmarks by NVIDIA and independent parties, the H100 offers at least double the computation speed of the A100. This performance boost has two major implications:
- Engineering teams can iterate faster if workloads take half the time to complete.
- Even though the H100 costs about twice as much as the A100 per hour, the overall expenditure in a cloud model could be similar: if the H100 completes tasks in half the time, its higher hourly price is offset by the shorter run time.
To compare the A100 and H100, we need to first understand what the claim of “at least double” the performance means. Then, we’ll discuss how it’s relevant to specific use cases, and finally, turn to whether you should pick the A100 or H100 for your GPU workloads.
Interpreting NVIDIA’s Benchmarks
Let’s start by looking at NVIDIA’s own benchmark results, which you can see in Figure 1. They compare the H100 directly with the A100.
The benchmarks comparing the H100 and A100 are based on artificial scenarios, focusing on raw computing performance or throughput without considering specific real-world applications. In reality, different data formats may experience varying levels of speed improvements, so it’s essential to work with your engineering team or software vendor to determine how your specific workload might benefit from the H100’s enhancements.
The charts in Figure 2 show a practical example of training GPT-3 with an A100 compared to an H100.
In this example, clusters equipped with the A100 and H100 were used to train two LLMs (large language models). The results showed notable speed improvements, especially when the software was optimized for the H100, such as by using the FP8 data format. However, the standout feature was the new NVLink Switch System, which enabled the H100 cluster to train these models up to nine times faster than the A100 cluster. This significant boost suggests that the H100’s advanced scaling capabilities could make training larger LLMs feasible for organizations previously limited by time constraints.
These numbers are impressive, but they come from NVIDIA, which has a vested interest in promoting its latest (and more expensive) GPU. To get a complete picture, we should also look at what independent sources say.
What Independent Benchmarks Reveal
NVIDIA sells GPUs, so they want them to look as good as possible. The GPT-3 training example above is impressive and likely accurate, but it is unknown how much time was spent optimizing the training software for these data formats. That’s why it is always a good idea to check what independent sources say: you’ll get a better sense of how the comparison applies in a real-life, out-of-the-box scenario.
MosaicML compared the training of multiple LLMs on A100 and H100 instances. MosaicML is a managed LLM training and inference service; they don’t sell GPUs but rather a service, so they don’t care which GPU runs their workload as long as it is cost-effective. That means they have every reason to run realistic test cases, and therefore their benchmarks could be more directly transferable than NVIDIA’s own.
Table 1 shows the results for the different models.
Model | Optimized for H100 | Speedup over A100 |
1B | No | 2.2x |
1B | Yes | 2.7x |
3B | No | 2.2x |
3B | Yes | 2.8x |
7B | Yes | 3.0x |
30B | Yes | 3.3x |
The smaller, unoptimized models achieved a respectable 2.2x speedup on the H100. However, the larger models that were optimized for the H100 showed more significant gains. Notably, the 30B model experienced a 3.3x increase in speed compared to the A100.
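To separate the hardware gain from the optimization gain, the figures in Table 1 can be compared directly. The following is a minimal sketch that only divides the published speedups; the dictionary layout and the helper name `optimization_uplift` are illustrative choices, not part of the MosaicML benchmark.

```python
# Speedups over the A100 copied from Table 1.
# Key: (model size, optimized for H100?) -> speedup factor.
SPEEDUPS = {
    ("1B", False): 2.2,
    ("1B", True): 2.7,
    ("3B", False): 2.2,
    ("3B", True): 2.8,
    ("7B", True): 3.0,
    ("30B", True): 3.3,
}


def optimization_uplift(model):
    """Ratio of optimized to unoptimized speedup for one model size.

    Returns None when Table 1 lacks either figure for that model.
    """
    base = SPEEDUPS.get((model, False))
    tuned = SPEEDUPS.get((model, True))
    if base is None or tuned is None:
        return None
    return tuned / base


for model in ("1B", "3B", "7B", "30B"):
    uplift = optimization_uplift(model)
    if uplift is None:
        print(f"{model}: no unoptimized figure in Table 1")
    else:
        print(f"{model}: optimization adds about {100 * (uplift - 1):.0f}% on top of the hardware gain")
```

For the two model sizes with both figures, most of the speedup comes from the hardware itself, with H100-specific optimization adding roughly a quarter more.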
Lambda Labs also released an LLM training benchmark for the H100, showing at least doubled performance compared to the A100. It’s worth noting that Lambda Labs is a cloud provider that wants to rent out the newest hardware.
While these numbers aren’t as impressive as NVIDIA’s claims, they suggest that you can get roughly a twofold speedup using the H100 compared to the A100, without investing in extra engineering hours for optimization. If your goal is to increase the size of your LLMs, and you have an engineering team ready to optimize your code base, you can get even more performance from an H100.
What Does the H100 Offer that the A100 Doesn’t?
The H100 introduces a new chip design and several additional features, setting it apart from its predecessor. Let’s explore these updates to assess whether your use case requires the new model.
Confidential Computing
An exciting new feature for privacy is the confidential computing (CC) environment. In addition to data encryption at rest (i.e., on a hard drive) and data encryption in transit (i.e., on a network), CC allows data encryption in use. If you’re handling private or confidential information and security compliance is a concern, as in the healthcare and financial industries, the H100’s CC feature could make it the preferred choice.
Tensor Memory Accelerator
The Tensor Memory Accelerator (TMA) is a new part of the H100 Hopper architecture that frees GPU threads from memory management tasks. The introduction of the TMA primarily enhances performance, representing a significant architectural shift rather than just an incremental improvement like adding more cores.
With the ever-increasing volume of training data required for reliable models, the TMA’s capability to seamlessly transfer large data sets without overloading the computation threads could prove to be a crucial advantage, especially as training software begins to fully use this feature. Thanks to its TMA, the H100 may prove to be a more future-proof option and a superior choice for large-scale AI model training.
How Much More Does the H100 Cost?
The H100 is more expensive than the A100. Let’s look at a comparable on-demand pricing example created with the Gcore pricing calculator to see what this means in practice.
Specs | A100 Server | H100 Server |
CPUs | 2x Intel Xeon 8468 | 2x Intel Xeon 8468 |
Memory | 2TB | 2TB |
Block storage | 8x 3.84TB NVMe | 8x 3.84TB NVMe |
GPUs | 8x NVIDIA A100 80GB 800Gbit/s Infiniband | 8x NVIDIA H100 80GB 3200Gbit/s Infiniband |
Cost | 16.483 €/h | 30.013 €/h |
The H100 is 82% more expensive than the A100: less than double the price. However, considering that billing is based on the duration of workload operation, an H100—which is between two and nine times faster than an A100—could significantly lower costs if your workload is effectively optimized for the H100.
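The trade-off between hourly price and run time can be worked through with the two prices from the table. This is a rough sketch that assumes billing is purely by wall-clock hours and that the whole job speeds up uniformly; the 100-hour job length and the function names are hypothetical, chosen only for illustration.

```python
# Hourly on-demand prices copied from the pricing table above (EUR per hour).
A100_EUR_PER_HOUR = 16.483
H100_EUR_PER_HOUR = 30.013


def break_even_speedup(a100_price, h100_price):
    """Speedup at which the H100 run costs exactly as much as the A100 run."""
    return h100_price / a100_price


def h100_job_cost(a100_hours, speedup):
    """Cost of the same job on the H100 server if it runs `speedup` times faster."""
    return H100_EUR_PER_HOUR * a100_hours / speedup


# Hypothetical job length on the A100 server; not a figure from the article.
a100_hours = 100.0
a100_cost = A100_EUR_PER_HOUR * a100_hours

ratio = break_even_speedup(A100_EUR_PER_HOUR, H100_EUR_PER_HOUR)
print(f"Break-even speedup: {ratio:.2f}x")
print(f"A100 cost: {a100_cost:.2f} EUR")
for speedup in (1.5, 2.0, 3.0):
    print(f"H100 cost at {speedup}x: {h100_job_cost(a100_hours, speedup):.2f} EUR")
```

Under these assumptions, any workload that runs more than about 1.82 times faster on the H100 is cheaper there, while a workload that speeds up less than that is cheaper on the A100.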
Should I Pick the A100 or the H100?
Picking the right GPU clearly isn’t simple. Here are the factors you need to consider when making a choice.
Cost Efficiency
While the A100 typically costs about half as much as the H100 to rent from a cloud provider, this difference may be offset if the H100 can complete your workload in half the time. Consult with your engineers or vendors to ensure that your specific GPU software won’t suffer any performance regressions, which could negate the cost benefits of the speedups.
Licensing Costs
The software you plan to use with the GPUs may have licensing terms that bind it to a specific GPU model. Licensing for software compatible with the A100 can be considerably less expensive than for the H100.
Use Cases
The H100 is NVIDIA’s first GPU specifically optimized for machine learning, while the A100 offers more versatility, handling a broader range of tasks like data analytics effectively. If your primary focus is on training large language models, the H100 is likely to be the most cost-effective choice. If it’s anything other than LLMs, the A100 is worth serious consideration.
Power Consumption
For on-premises operations, bear in mind the H100’s higher power consumption: up to 700W, compared to the A100’s 400W maximum. Increased performance comes with higher energy demands and heat output, so ensure your infrastructure can support such requirements if you’re considering buying GPUs outright.
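Higher peak power does not necessarily mean more energy per job, because a faster GPU also finishes sooner. The sketch below assumes, as a worst case, that each GPU draws its maximum rated power for the entire run, and it ignores the rest of the server and cooling; the function names are illustrative.

```python
# Maximum power draw per GPU, as quoted above (watts).
A100_MAX_WATTS = 400.0
H100_MAX_WATTS = 700.0


def energy_kwh(watts, hours):
    """Energy used when drawing `watts` continuously for `hours`."""
    return watts * hours / 1000.0


def relative_energy(speedup):
    """H100 energy per job divided by A100 energy per job.

    Values below 1.0 mean the H100 uses less energy for the same job.
    """
    return (H100_MAX_WATTS / A100_MAX_WATTS) / speedup


for speedup in (1.5, 2.0, 3.0):
    print(f"{speedup}x speedup: H100 uses {relative_energy(speedup):.2f}x the A100's energy per job")
```

Under these assumptions the break-even point is a 1.75x speedup. Peak power and heat output per GPU are still higher on the H100, so power delivery and cooling must be sized for 700W regardless of the energy per job.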
Availability
Not all cloud providers offer every GPU model. H100 models have had availability issues due to overwhelming demand. If your provider only offers one of these GPUs, your choice may be predetermined. However, you might find more competitive pricing for the A100 depending on your relationship with the provider. Gcore has both A100 and H100 in stock right now.
Summary
The H100 offers indisputable improvements over the A100 and is an impressive contender for machine learning and scientific computing workloads. The H100 is the superior choice for optimized ML workloads and tasks involving sensitive data. If optimizing your workload for the H100 isn’t feasible, using the A100 might be more cost-effective, and the A100 remains a solid choice for non-AI tasks. The H100 comes out on top for
Gcore Edge AI has both A100 and H100 GPUs available immediately in a convenient cloud service model. You only pay for what you use, so you can benefit from the speed and security of the H100 without making a long-term investment.