AI VRAM Calculator

Calculate exact GPU memory requirements for running, fine-tuning, or training large language models. Pick your model and workload — get a VRAM budget and matching hardware from our catalog of 653 systems.

Configure your workload

What are you doing with it?
Precision
Usage pattern

Estimate

Minimum VRAM
19 GB
Recommended VRAM (with 20% headroom)
23 GB
Recommended GPUs
RTX 4090, RTX 5090, RTX A5000
VRAM breakdown
Model weights
16 GB
KV cache
1 GB
Activations / buffers
1.6 GB

Hardware that can run this

8 systems from our catalog with matching GPUs, sorted by fit and price.

ASUS TUF Gaming NVIDIA GeForce RTX 4090 OC Edition Gaming Graphics Card (24GB GDDR6X, PCIe 4.0, HDMI 2.1a, DisplayPort 1.4a, Dual Ball Bearing Axial Fans)

Amazon • Accelerators
Computer Vision, LLM Inference, Generative AI
VRAM: 24 GB
$1,150
PNY NVIDIA RTX A5000 Professional Graphic Card 24GB GDDR6 PCI Express 4.0 x16, Dual Slot, 3X DisplayPort, 8K Support, Ultra-Quiet Active Fan

Amazon • Accelerators
LLM Inference, Data Analytics
$2,199
PNY NVIDIA Quadro RTX A5000 24GB GDDR6 Graphics Card (One Pack)

Amazon • Accelerators
LLM Inference, Data Analytics
$2,199
VSXR5 540CU

Thinkmate • Hardware
GPU: Integrated Video (Included with Motherboard) | NVIDIA® GeForce® RTX 5060 Ti 16GB GDDR7 (2-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5070 12GB GDDR7 (2.4-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5070 Ti 16GB GDDR7 (2.98-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5080 16GB GDDR7 (3-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5090 32GB GDDR7 (3.5-Slot) (3xDP, 1xHDMI) | NVIDIA® RTX A400 - 4GB GD…
CPU: Intel® Core™ Ultra 5 Processor 225 - 10 Cores (6P+4E) 10 Threads - 4.9 GHz Max - 121W Max - Graphics | Intel® Core™ Ultra 5 Processor 225F - 10 Cores (6P+4E) 10 Threads - 4.9 GHz Max - 121W Max | Intel® Core™ Ultra 5 Processor 245KF - 14 Cores (6P+8E) 14 Threads - 5.2 GHz Max - 159W Max | Intel® Core™ Ultra 7 Processor 265F - 20 Cores (8P+12E) 20 Threads - 5.3 GHz Max - 182W Max | Intel® Core™ Ult…
RAM: 16GB PC5-44800 5600MHz DDR5 Non-ECC UDIMM (Up To 4) | 32GB PC5-44800 5600MHz DDR5 Non-ECC UDIMM (Up To 4)
$2,398
Lenovo Nvidia RTX A5000 24GB GDDR6 Graphics Card, W126823388 (Graphics Card)

Amazon • Accelerators
LLM Inference, Data Analytics
$2,500
NVIDIA RTX A5000 Enterprise 24GB 105MH/s 230W

Viperatech • Hardware
$2,556
VSXR5 340R7

Thinkmate • Hardware
GPU: Integrated Video (Included with Motherboard) | NVIDIA® GeForce® RTX 5060 Ti 16GB GDDR7 (2-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5070 12GB GDDR7 (2.4-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5070 Ti 16GB GDDR7 (2.98-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5080 16GB GDDR7 (3-Slot) (3xDP, 1xHDMI) | NVIDIA® GeForce® RTX 5090 32GB GDDR7 (3.5-Slot) (3xDP, 1xHDMI) | NVIDIA® RTX A400 - 4GB GD…
CPU: AMD Ryzen™ 5 9600X Processor 6-core 3.90GHz 32MB L3 Cache (65W) | AMD Ryzen™ 7 9700X Processor 8-core 3.80GHz 32MB L3 Cache (65W) | AMD Ryzen™ 9 9900X Processor 12-core 4.40GHz 64MB L3 Cache (120W) | AMD Ryzen™ 9 9950X Processor 16-core 4.30GHz 64MB L3 Cache (170W) | AMD Ryzen™ 7 9800X3D Processor 8-core 4.70GHz 96MB L3 Cache (120W) | AMD Ryzen™ 9 9900X3D Processor 12-core 4.40GHz 128MB L3 Cache (…
RAM: 16GB PC5-44800 5600MHz DDR5 Non-ECC UDIMM (Up To 4) | 32GB PC5-44800 5600MHz DDR5 Non-ECC UDIMM (Up To 4) | 16GB PC5-41600 5600MHz DDR5 ECC UDIMM (Up To 4) | 32GB PC5-41600 5600MHz DDR5 ECC UDIMM (Up To 4)
$2,584
MSI GeForce RTX 4090 Gaming X Slim 24G

Viperatech • Hardware
$2,600

How this is calculated

Bytes per parameter

  • FP16 / BF16: 2 bytes — training + high-quality inference
  • INT8: 1 byte — ~2× memory savings, minor quality loss
  • INT4: 0.5 bytes — ~4× memory savings, used by GGUF / GPTQ / AWQ
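The weights line of the budget is just parameter count times bytes per parameter. A minimal sketch (my own illustration, not the calculator's actual code):

```python
# Bytes per parameter for common precisions, as listed above.
BYTES_PER_PARAM = {"fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params_billion: float, precision: str) -> float:
    """Memory for model weights alone, in GB (using 1 GB = 1e9 bytes)."""
    return n_params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1e9

# An 8B-parameter model needs ~16 GB of weights in FP16, ~4 GB at INT4.
print(weight_memory_gb(8, "fp16"))  # 16.0
print(weight_memory_gb(8, "int4"))  # 4.0
```

This is where the 16 GB "Model weights" line in the estimate above comes from for an 8B model in FP16.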

Workload overhead

  • Inference: weights + KV cache + ~10% runtime buffers
  • LoRA fine-tune: frozen base weights + adapter weights and their optimizer states + activation memory (QLoRA when the base is quantized to INT4)
  • Full training (Adam): weights + gradients + two FP32 optimizer moments + activations, typically ~16–20 GB per 1B params in mixed-precision FP16
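The full-training figure falls out of a per-parameter byte count. A hedged sketch (assumed mixed-precision Adam layout; the calculator's exact constants may differ):

```python
def training_memory_gb(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Static memory for full mixed-precision training with Adam, in GB.

    Activations are excluded: they depend on batch size and sequence length.
    """
    weights = n_params_billion * bytes_per_param   # FP16/BF16 weights
    grads = n_params_billion * bytes_per_param     # FP16/BF16 gradients
    optimizer = n_params_billion * 2 * 4           # two FP32 Adam moments
    master = n_params_billion * 4                  # FP32 master copy of weights
    return weights + grads + optimizer + master

print(training_memory_gb(1))  # 16.0 -> the low end of the ~16-20 GB/1B rule of thumb
```

Adding activation memory on top is what pushes real runs toward the upper end of that range.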

Usage pattern overhead

Longer contexts and more concurrent requests both grow the KV cache linearly. Serving 16 concurrent requests at 8k tokens can add tens of GB on top of the model weights. FlashAttention-2 and PagedAttention (vLLM) improve memory efficiency, but they reduce overhead and fragmentation rather than eliminating the cache itself.

Caveats

Estimates are approximate. Real requirements depend on model architecture (attention heads, hidden dim, MQA/GQA), kernel choice, and framework overhead. Budget 10–20% headroom beyond the minimum.