Overview
The NVIDIA H100 NVL is a Hopper-architecture PCIe AI accelerator with 94GB of HBM3 memory on a full 6144-bit memory bus. NVIDIA quotes up to 12× faster inference than the prior-generation A100 on GPT-3-scale models. As a passively cooled 350W PCIe card, it brings Transformer Engine acceleration to large language model deployment without requiring SXM infrastructure.
Key Features
- 94GB HBM3 via full 6144-bit memory interface
- Hopper architecture with Transformer Engine for LLM acceleration
- Up to 12× faster GPT-3 inference vs prior-gen A100
- Passively cooled 350W PCIe card; no SXM infrastructure needed
- 132 Streaming Multiprocessors for massive parallel compute
Ideal For
Feature
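The H100 NVL has a full 6144-bit memory interface (1024 bits for each of its six HBM3 stacks) and memory speeds of up to 5.1 Gbps per pin. That gives a theoretical peak of about 3.9 TB/s per GPU (6144 bits × 5.1 Gbps ÷ 8 bits per byte). The 7.8 TB/s maximum throughput quoted for the H100 NVL is the aggregate for a two-GPU NVL pair, which is more than twice the bandwidth of a single H100 SXM. Large language models need large memory buffers, and the higher bandwidth also helps inference throughput.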
Feature
The NVIDIA H100 NVL is aimed at deploying massive LLMs such as ChatGPT at scale. With 94GB of memory and Transformer Engine acceleration, it delivers up to 12x faster GPT-3 inference than the prior-generation A100 at data center scale.
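The bandwidth figures quoted above follow from simple arithmetic on the bus width and per-pin data rate. The sketch below reproduces that calculation. The function and variable names are illustrative, and the two-GPU aggregate assumes the 7.8 figure refers to a bridged pair of cards rather than a single GPU.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Each bus pin carries pin_rate_gbps gigabits per second;
    dividing by 8 converts bits to bytes.
    """
    return bus_width_bits * pin_rate_gbps / 8


# Figures taken from the text: six HBM3 stacks, 1024 bits each, 5.1 Gbps per pin.
HBM3_STACKS = 6
BITS_PER_STACK = 1024
PIN_RATE_GBPS = 5.1

bus_width_bits = HBM3_STACKS * BITS_PER_STACK            # 6144-bit interface
per_gpu_gb_s = peak_bandwidth_gb_s(bus_width_bits, PIN_RATE_GBPS)

# Assumption: the aggregate figure is for two GPUs joined as an NVL pair.
pair_gb_s = 2 * per_gpu_gb_s

print(f"Bus width: {bus_width_bits} bits")
print(f"Per GPU:   {per_gpu_gb_s / 1000:.2f} TB/s")
print(f"Per pair:  {pair_gb_s / 1000:.2f} TB/s")
```

These are theoretical peaks; sustained bandwidth in real workloads will be lower.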
GPU Memory: 94 GB HBM3
Architecture: NVIDIA Hopper (GH100)
TDP: 350W passive
Memory Bus: 6144-bit
Interface: PCIe 5.0 x16
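To put the 94 GB capacity in context for LLM deployment, the following rough sketch estimates how many cards are needed just to hold a model's weights. It assumes the publicly reported 175 billion parameter count for GPT-3 and counts weights only; KV cache, activations, and framework overhead are ignored, so real deployments need more headroom. The helper names are hypothetical.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed for model weights alone, in decimal GB."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float,
                         bytes_per_param: float,
                         gpu_memory_gb: float = 94.0) -> int:
    """Lower bound on GPU count: weights only, no KV cache or activations."""
    return math.ceil(weight_memory_gb(num_params, bytes_per_param) / gpu_memory_gb)


# Assumption: GPT-3 at its publicly reported size of 175 billion parameters.
GPT3_PARAMS = 175e9

for precision, bytes_per_param in (("FP16", 2), ("FP8", 1)):
    size_gb = weight_memory_gb(GPT3_PARAMS, bytes_per_param)
    gpus = min_gpus_for_weights(GPT3_PARAMS, bytes_per_param)
    print(f"{precision}: {size_gb:.0f} GB of weights, at least {gpus} x 94 GB GPUs")
```

Halving the bytes per parameter halves the weight footprint, which is one reason lower-precision formats matter for fitting large models on fewer cards.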
