TL;DR: Full fine-tuning Llama 70B requires 8x H100 GPUs (~$280K+ hardware, or roughly $240-360 per cloud training run). LoRA drops that to 2-4x A100s (~$38-58 per run). QLoRA makes it possible on a single 48GB GPU for $10-24 per run. Quality differences are measurable but often acceptable - start with QLoRA, upgrade only if needed.
---
The Question
Fine-tuning is how you make a general-purpose LLM actually useful for your specific domain. But the costs vary by 100x depending on your approach.
I keep seeing vague advice: "full fine-tuning is expensive, use LoRA instead." How expensive? How much cheaper is LoRA? At what point does the quality trade-off matter?
I ran the numbers.
Methodology & Data Sources
Hardware Requirements:
- Hugging Face Forums - VRAM calculations
- Hugging Face FSDP Blog - Memory optimization techniques
- Meta Llama Fine-Tuning Guide - Official recommendations
Cloud Pricing (December 2025):
- Lambda Labs: H100 SXM at $2.49-2.99/hr
- RunPod: H100 PCIe at $1.99/hr, A100 at $1.19/hr
- MonsterAPI: Managed fine-tuning pricing
What I didn't do:
- Run these fine-tuning jobs myself (numbers come from published benchmarks)
- Test quality differences across all domains
- Account for engineering time (significant for full fine-tuning)
The Three Approaches Explained
Full Fine-Tuning
Every parameter in the model gets updated. Maximum flexibility, maximum cost.
Memory requirement: ~1,120 GB for model states (parameters + gradients + optimizer states)
Why so much? With mixed-precision Adam, a 70B-parameter model needs roughly 16 bytes per parameter:
- 70B × 2 bytes = 140 GB for FP16 weights
- 70B × 2 bytes = 140 GB for FP16 gradients
- 70B × 4 bytes = 280 GB for FP32 master weights
- 70B × 8 bytes = 560 GB for Adam momentum and variance (FP32)
- Plus activations, batch data, overhead
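That 16-bytes-per-parameter rule of thumb can be sketched in a few lines (a back-of-envelope estimate, not a substitute for a real memory profiler):

```python
# Back-of-envelope model-state memory for mixed-precision Adam:
# 2 B (FP16 weights) + 2 B (FP16 gradients) + 4 B (FP32 master weights)
# + 8 B (Adam momentum and variance in FP32) = 16 bytes per parameter.
# Excludes activations, batch data, and framework overhead.

def full_finetune_state_gb(params_billion: float) -> float:
    bytes_per_param = 2 + 2 + 4 + 8
    # 1 billion params x 1 byte = 1 GB, so the arithmetic stays simple
    return params_billion * bytes_per_param

print(full_finetune_state_gb(70))  # 1120.0 GB for a 70B model
```

This is why sharding schemes like FSDP or ZeRO are mandatory at this scale: no single GPU holds 1,120 GB, so the states must be split across the 8-GPU cluster.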
LoRA (Low-Rank Adaptation)
Freeze the original model, train small "adapter" layers that learn the weight updates.
Memory requirement: ~160-200 GB (model weights + small trainable adapters)
How it works: Instead of updating 70B parameters, you train ~0.1-1% of parameters in low-rank matrices that modify the frozen weights.
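To see where that ~0.1-1% figure comes from: for a frozen weight matrix W of shape d×k, LoRA trains two low-rank factors B (d×r) and A (r×k), adding r·(d+k) trainable parameters. A sketch with illustrative dimensions (not the exact Llama 70B shapes):

```python
# LoRA adds r * (d + k) trainable params per frozen d x k weight matrix.
# Dimensions below are illustrative, not exact Llama 70B internals.

def lora_trainable_params(d: int, k: int, r: int) -> int:
    return r * (d + k)

d = k = 8192   # hypothetical hidden size
r = 16         # a common LoRA rank
trainable = lora_trainable_params(d, k, r)
frozen = d * k
print(trainable, frozen)          # 262144 vs 67108864
print(trainable / frozen)         # ~0.004, i.e. ~0.4% of that matrix
```

The rank r is the main knob: doubling r doubles adapter size and capacity, while the frozen base model's memory cost stays fixed.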
QLoRA (Quantized LoRA)
Same as LoRA, but the frozen model is quantized to 4-bit precision.
Memory requirement: ~46 GB (4-bit quantized weights + adapters)
The breakthrough: Cuts memory roughly 24x versus full fine-tuning (~1,120 GB → ~46 GB), making 70B models trainable on a single workstation GPU.
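The arithmetic behind the ~46 GB figure (a rough sketch; it ignores quantization constants and paging buffers):

```python
# 4-bit quantization stores the frozen base weights at 0.5 bytes/param.
params = 70e9
fp16_weights_gb = params * 2.0 / 1e9   # 140.0 GB in FP16
int4_weights_gb = params * 0.5 / 1e9   # 35.0 GB at 4 bits/param
print(fp16_weights_gb, int4_weights_gb)
# LoRA adapters, activations, and dequantization buffers add the
# remaining ~10 GB, landing near the ~46 GB requirement above.
```

Only the frozen base model is quantized; the LoRA adapters themselves still train in 16-bit precision, which is why quality holds up as well as it does.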
Hardware Requirements Comparison
| Approach | VRAM Required | Minimum Hardware | Typical Setup |
|---|---|---|---|
| Full Fine-Tuning | ~1,120 GB | 8x H100 80GB | 8x H100 SXM with NVLink |
| LoRA | ~160-200 GB | 2-4x A100 80GB | 4x A100 80GB |
| QLoRA | ~46 GB | 1x 48GB GPU | 1x RTX 6000 Ada 48GB or A100 80GB |
The Cost Calculation
Scenario: Fine-tune Llama 70B on 10K examples for 3 epochs
Typical training time: 8-20 hours depending on approach and hardware.
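Every per-run figure that follows comes from the same arithmetic: per-GPU hourly rate × GPU count × training hours. A quick sketch using the published rates cited above (estimates, not measured bills):

```python
# Cloud cost per training run = per-GPU hourly rate x GPU count x hours.
def run_cost(rate_per_gpu_hr: float, n_gpus: int, hours: float) -> float:
    return rate_per_gpu_hr * n_gpus * hours

print(run_cost(2.99, 8, 10))   # ~239 -> full fine-tuning, low end
print(run_cost(1.19, 4, 12))   # ~57  -> LoRA, high end
print(run_cost(1.20, 1, 20))   # ~24  -> QLoRA, high end
```

Plug in your own provider's rates and an estimated run time to sanity-check any quote before committing to a cluster.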
Full Fine-Tuning Costs
Hardware Option (Buy):
- 8x H100 SXM server: ~$280,000-375,000
- Amortized over 3 years: ~$7,800-10,400/month
Cloud Option:
- 8x H100 SXM cluster: ~$24/hour (Lambda Labs)
- Training time: 10-15 hours typical
- Cost per training run: $240-360
With failed experiments and hyperparameter tuning (5-10 runs):
- Realistic project cost: $1,200-3,600
LoRA Costs
Hardware Option (Buy):
- 4x A100 80GB workstation: ~$60,000-80,000
- Amortized over 3 years: ~$2,000/month
Cloud Option:
- 4x A100 80GB: ~$4.80/hour (RunPod)
- Training time: 8-12 hours
- Cost per training run: $38-58
With experimentation (5-10 runs):
- Realistic project cost: $190-580
QLoRA Costs
Hardware Option (Buy):
- 1x RTX 6000 Ada (48GB): ~$6,500
- 1x A100 40GB: ~$10,000-15,000 (tight for the ~46 GB footprint; expect CPU offloading)
- Amortized over 3 years: ~$180-420/month
Cloud Option:
- 1x A100: ~$1.00-1.20/hour (an 80GB instance fits the ~46 GB footprint comfortably; 40GB needs offloading)
- Training time: 10-20 hours (slower due to quantization overhead)
- Cost per training run: $10-24
Managed Services:
- MonsterAPI: ~$19-58 for Llama 70B fine-tuning
- Includes infrastructure management, no DevOps needed
With experimentation (10-20 runs, since it's cheap):
- Realistic project cost: $100-480
Cost Comparison Summary
| Approach | Per-Run Cost (Cloud) | Realistic Project Cost | Hardware Buy-In |
|---|---|---|---|
| Full Fine-Tuning | $240-360 | $1,200-3,600 | $280,000+ |
| LoRA | $38-58 | $190-580 | $60,000-80,000 |
| QLoRA | $10-24 | $100-480 | $6,500-15,000 |
The math is stark: QLoRA costs 10-30x less than full fine-tuning per run.
But What About Quality?
This is the critical question. Here's what the research shows:
Quality Hierarchy
- Full Fine-Tuning: Best possible results, full model adaptation
- LoRA: 95-99% of full fine-tuning quality for most tasks
- QLoRA: 90-97% of full fine-tuning quality, depends on task
When Quality Differences Matter
QLoRA is usually sufficient for:
- Domain adaptation (legal, medical, technical writing)
- Style transfer and tone adjustment
- Instruction following improvements
- Most chatbot and assistant applications
Consider LoRA or full fine-tuning for:
- Tasks requiring precise numerical reasoning
- Complex multi-step reasoning chains
- When you have massive training data (100K+ examples)
- Production systems where 2-5% quality difference has business impact
The Practical Recommendation
From Index.dev's analysis:
> "Start with QLoRA for rapid experimentation (20-50 iterations to find best hyperparameters). Once satisfied, run final LoRA training for 1-2% quality boost if needed."
This is the cost-effective approach: use cheap QLoRA runs to find your optimal configuration, then optionally upgrade.
The Decision Framework
Use QLoRA if:
- Budget under $500 per project
- Experimenting with fine-tuning for the first time
- Your task is domain adaptation or style transfer
- You have limited training data (<10K examples)
- You need to iterate quickly on hyperparameters
Use LoRA if:
- QLoRA quality isn't sufficient for your use case
- Budget of $200-600 per project is acceptable
- You need slightly better numerical precision
- Production deployment where quality matters
Use Full Fine-Tuning if:
- Maximum quality is required regardless of cost
- You have 100K+ high-quality training examples
- Budget of $1,500+ per project is acceptable
- You have engineering resources for multi-GPU training
- This is a core business capability worth the investment
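The framework above can be condensed into a simple escalation ladder (the thresholds are the article's rules of thumb, encoded here only for illustration):

```python
# Escalation ladder from the decision framework; thresholds are rough
# rules of thumb, not hard limits.
def pick_approach(project_budget_usd: float, n_examples: int,
                  qlora_quality_ok: bool = True) -> str:
    if qlora_quality_ok or project_budget_usd < 500:
        return "QLoRA"
    if project_budget_usd < 1500 or n_examples < 100_000:
        return "LoRA"
    return "full fine-tuning"

print(pick_approach(400, 8_000))                              # QLoRA
print(pick_approach(600, 20_000, qlora_quality_ok=False))     # LoRA
print(pick_approach(5_000, 150_000, qlora_quality_ok=False))  # full fine-tuning
```

Note the default: you stay on QLoRA until its quality is demonstrably insufficient, which mirrors the "start cheap, upgrade only if needed" advice throughout.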
Hardware Recommendations
For QLoRA (Best Value)
Cloud: A100 instances on RunPod ($1.00-1.20/hr) - pick a card with at least 48 GB for the ~46 GB footprint
Buy: RTX 6000 Ada 48GB (~$6,500) fits in a standard workstation
For LoRA
Cloud: 2-4x A100 80GB on Lambda Labs or RunPod
Buy: Multi-GPU workstation with 2-4x A100 or equivalent
For Full Fine-Tuning
Cloud: 8x H100 cluster on Lambda Labs (~$24/hr)
Buy: Only makes sense if you're doing this repeatedly
- H100 GPU servers - $280K+ for 8-GPU configurations
Hidden Costs Nobody Mentions
Engineering Time
- QLoRA: 1-2 days to get running with existing tools (Unsloth, PEFT)
- LoRA: 2-5 days for multi-GPU setup and optimization
- Full Fine-Tuning: 1-2 weeks for proper distributed training setup
If your engineering time costs $150/hour, a week of setup adds $6,000 to the project.
Failed Experiments
Expect 5-20 failed runs before finding optimal hyperparameters. QLoRA's low cost makes iteration cheap; full fine-tuning makes each failure expensive.
Data Preparation
Often underestimated. Cleaning, formatting, and validating training data can take longer than the actual fine-tuning.
My Recommendation
For most teams: Start with QLoRA on cloud GPUs.
- Run 10-20 quick experiments to find optimal hyperparameters (~$100-200)
- Evaluate results against your quality requirements
- If quality is insufficient, try LoRA (~$50-100 for final run)
- Only consider full fine-tuning if LoRA doesn't meet requirements
The 80/20 rule applies: QLoRA gets you 80% of the benefit at 5% of the cost. Most projects never need to go beyond it.
---
Hardware Options:
- AI Workstations - For QLoRA and LoRA fine-tuning
- H100 GPU Servers - For full fine-tuning at scale
- GPU Accelerators - Add fine-tuning capability to existing systems
---
Cost estimates based on published cloud pricing and hardware specifications as of December 2025. Actual costs vary based on training data size, number of epochs, and optimization settings. Engineering time estimates assume experienced ML practitioners.