Technical Analysis

The Real Cost of Fine-Tuning Llama 70B: Full vs LoRA vs QLoRA

January 3, 2026
7 min read
Tags: the-numbers, cost-analysis, fine-tuning, llama-70b, qlora, lora, training, gpu-requirements

TL;DR: Full fine-tuning Llama 70B requires 8x H100 GPUs (~$280K in hardware, or roughly $240-360 per cloud training run). LoRA drops that to 2-4x A100s (~$38-58 per run). QLoRA makes it possible on a single 48GB GPU for $10-24 per run. Quality differences are measurable but often acceptable: start with QLoRA, upgrade only if needed.

---

The Question

Fine-tuning is how you make a general-purpose LLM actually useful for your specific domain. But the costs vary by 100x depending on your approach.

I keep seeing vague advice: "full fine-tuning is expensive, use LoRA instead." How expensive? How much cheaper is LoRA? At what point does the quality trade-off matter?

I ran the numbers.

Methodology & Data Sources

Hardware requirements are taken from published model-memory estimates and vendor spec sheets. Cloud pricing reflects provider rate cards (Lambda Labs, RunPod, MonsterAPI) as of December 2025.

What I didn't do:

  • Run these fine-tuning jobs myself (using published benchmarks)
  • Test quality differences across all domains
  • Account for engineering time (significant for full fine-tuning)

The Three Approaches Explained

Full Fine-Tuning

Every parameter in the model gets updated. Maximum flexibility, maximum cost.

Memory requirement: ~1,120 GB for model states (parameters + gradients + optimizer states)

Why so much? A 70B parameter model needs:

  • 70B × 2 bytes = 140 GB for FP16 weights
  • 70B × 2 bytes = 140 GB for FP16 gradients
  • 70B × 4 bytes = 280 GB for FP32 master weights
  • 70B × 8 bytes = 560 GB for Adam optimizer states (momentum + variance, both FP32)
  • Plus activations, batch data, and overhead on top
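A quick sanity check on these numbers. This is a back-of-the-envelope sketch: the 16 bytes/parameter total matches the standard mixed-precision Adam accounting, and activations are excluded.

```python
# Back-of-the-envelope VRAM estimate for full fine-tuning a 70B model with
# mixed-precision Adam. Per-parameter byte counts follow the standard
# accounting (16 bytes/param for model states); activations are extra.

def full_finetune_gb(params_billion: float) -> dict:
    p = params_billion * 1e9
    gb = 1e9  # decimal GB, to match the figures above
    return {
        "fp16_weights": p * 2 / gb,
        "fp16_gradients": p * 2 / gb,
        "fp32_master_weights": p * 4 / gb,
        "adam_states": p * 8 / gb,  # momentum + variance, both FP32
    }

states = full_finetune_gb(70)
for name, size in states.items():
    print(f"{name}: {size:.0f} GB")
print(f"total model states: {sum(states.values()):.0f} GB")  # 1120 GB
```

The total lands exactly on the ~1,120 GB figure before any activation memory, which is why even 8x 80GB cards are a tight fit.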

LoRA (Low-Rank Adaptation)

Freeze the original model, train small "adapter" layers that learn the weight updates.

Memory requirement: ~160-200 GB (model weights + small trainable adapters)

How it works: Instead of updating 70B parameters, you train ~0.1-1% of parameters in low-rank matrices that modify the frozen weights.
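To make the "~0.1-1%" concrete, here is a rough parameter count for a Llama-70B-shaped model. The layer count and hidden size are approximations of the public config, and adapters are assumed on the four attention projections only, each treated as a square matrix.

```python
# Rough LoRA trainable-parameter count for a Llama-70B-like model.
# Assumptions (illustrative, not the exact config): 80 layers, hidden size
# 8192, adapters on the four attention projections (q, k, v, o).

def lora_trainable_params(rank: int, layers: int = 80, hidden: int = 8192) -> int:
    # Each adapted projection adds two low-rank matrices: A (rank x d_in)
    # and B (d_out x rank), i.e. rank * (d_in + d_out) parameters.
    per_projection = rank * (hidden + hidden)
    return layers * 4 * per_projection

base_params = 70e9
for r in (8, 16, 64):
    t = lora_trainable_params(r)
    print(f"rank {r:>2}: {t / 1e6:.0f}M trainable, {t / base_params:.3%} of 70B")
```

Even at rank 64 the adapters stay under one percent of the base model, which is why the optimizer state (the 560 GB line item above) collapses to almost nothing.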

QLoRA (Quantized LoRA)

Same as LoRA, but the frozen model is quantized to 4-bit precision.

Memory requirement: ~46 GB (4-bit weights are ~35 GB for 70B parameters; adapters, activations, and overhead make up the rest)

The breakthrough: cuts the ~1,120 GB of full fine-tuning model states down to ~46 GB, roughly a 24x reduction, making 70B models trainable on a single workstation GPU.
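In code, the whole setup is a few lines with the Hugging Face stack. A minimal sketch, assuming transformers, peft, and bitsandbytes are installed; the model ID and LoRA hyperparameters are illustrative, not a tuned recipe.

```python
# Sketch: 4-bit QLoRA setup with transformers + peft + bitsandbytes.
# Model ID and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NF4 quantization from the QLoRA paper
    bnb_4bit_use_double_quant=True,       # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B", quantization_config=bnb, device_map="auto"
)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a fraction of a percent of 70B
```

The frozen base stays in 4-bit; only the bfloat16 adapters receive gradients.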

Hardware Requirements Comparison

| Approach | VRAM Required | Minimum Hardware | Typical Setup |
| --- | --- | --- | --- |
| Full Fine-Tuning | ~1,120 GB | 8x H100 80GB | 8x H100 SXM with NVLink |
| LoRA | ~160-200 GB | 2-4x A100 80GB | 4x A100 80GB |
| QLoRA | ~46 GB | 1x 48GB GPU | 1x A100 40GB or RTX 6000 Ada |

The Cost Calculation

Scenario: Fine-tune Llama 70B on 10K examples for 3 epochs

Typical training time: 8-20 hours depending on approach and hardware.

Full Fine-Tuning Costs

Hardware Option (Buy):

  • 8x H100 SXM server: ~$280,000+
  • Amortized over 3 years: ~$7,800+/month

Cloud Option:

  • 8x H100 SXM cluster: ~$24/hour (Lambda Labs)
  • Training time: 10-15 hours typical
  • Cost per training run: $240-360

With failed experiments and hyperparameter tuning (5-10 runs):

  • Realistic project cost: $1,200-3,600

LoRA Costs

Hardware Option (Buy):

  • 4x A100 80GB workstation: ~$60,000-80,000
  • Amortized over 3 years: ~$2,000/month

Cloud Option:

  • 4x A100 80GB: ~$4.80/hour (RunPod)
  • Training time: 8-12 hours
  • Cost per training run: $38-58

With experimentation (5-10 runs):

  • Realistic project cost: $190-580

QLoRA Costs

Hardware Option (Buy):

  • 1x RTX 6000 Ada (48GB): ~$6,500
  • 1x A100 40GB: ~$10,000-15,000
  • Amortized over 3 years: ~$180-420/month

Cloud Option:

  • 1x A100 40GB: ~$1.00-1.20/hour
  • Training time: 10-20 hours (slower due to quantization overhead)
  • Cost per training run: $10-24

Managed Services:

  • MonsterAPI: ~$19-58 for Llama 70B fine-tuning
  • Includes infrastructure management, no DevOps needed

With experimentation (10-20 runs, since it's cheap):

  • Realistic project cost: $100-480

Cost Comparison Summary

| Approach | Per-Run Cost (Cloud) | Realistic Project Cost | Hardware Buy-In |
| --- | --- | --- | --- |
| Full Fine-Tuning | $240-360 | $1,200-3,600 | $280,000+ |
| LoRA | $38-58 | $190-580 | $60,000-80,000 |
| QLoRA | $10-24 | $100-480 | $6,500-15,000 |

The math is stark: QLoRA costs 10-30x less than full fine-tuning per run.
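The per-run figures are just hourly rate times training hours; a small script reproduces them from the rates quoted earlier (rates and durations are this post's published figures, not live quotes):

```python
# Reproduce the per-run cloud cost ranges: hourly rate x training hours.
# Rates and durations are the article's published figures, not live quotes.

def run_cost(rate_per_hour: float, hours: tuple) -> tuple:
    return (rate_per_hour * hours[0], rate_per_hour * hours[1])

scenarios = [
    ("full fine-tuning (8x H100 @ $24.00/hr)", 24.00, (10, 15)),
    ("lora (4x A100 @ $4.80/hr)",              4.80,  (8, 12)),
    ("qlora (1x A100 @ $1.20/hr)",             1.20,  (10, 20)),
]
for name, rate, hours in scenarios:
    lo, hi = run_cost(rate, hours)
    print(f"{name}: ${lo:.0f}-{hi:.0f} per run")
```

Plugging in the $1.00/hr lower bound for the A100 gives the $10 floor on QLoRA; everything else in the table falls out the same way.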

But What About Quality?

This is the critical question. Here's what the research shows:

Quality Hierarchy

  1. Full Fine-Tuning: Best possible results, full model adaptation
  2. LoRA: 95-99% of full fine-tuning quality for most tasks
  3. QLoRA: 90-97% of full fine-tuning quality, depends on task

When Quality Differences Matter

QLoRA is usually sufficient for:

  • Domain adaptation (legal, medical, technical writing)
  • Style transfer and tone adjustment
  • Instruction following improvements
  • Most chatbot and assistant applications

Consider LoRA or full fine-tuning for:

  • Tasks requiring precise numerical reasoning
  • Complex multi-step reasoning chains
  • When you have massive training data (100K+ examples)
  • Production systems where 2-5% quality difference has business impact

The Practical Recommendation

From Index.dev's analysis:

> "Start with QLoRA for rapid experimentation (20-50 iterations to find best hyperparameters). Once satisfied, run final LoRA training for 1-2% quality boost if needed."

This is the cost-effective approach: use cheap QLoRA runs to find your optimal configuration, then optionally upgrade.

The Decision Framework

Use QLoRA if:

  • Budget under $500 per project
  • Experimenting with fine-tuning for the first time
  • Your task is domain adaptation or style transfer
  • You have limited training data (<10K examples)
  • You need to iterate quickly on hyperparameters

Use LoRA if:

  • QLoRA quality isn't sufficient for your use case
  • Budget of $200-600 per project is acceptable
  • You need slightly better numerical precision
  • Production deployment where quality matters

Use Full Fine-Tuning if:

  • Maximum quality is required regardless of cost
  • You have 100K+ high-quality training examples
  • Budget of $1,500+ per project is acceptable
  • You have engineering resources for multi-GPU training
  • This is a core business capability worth the investment
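The framework above reduces to a small helper. The thresholds mirror the bullet points; they are this post's heuristics, not hard rules, and the function name is mine.

```python
# The decision framework as a rough helper. Thresholds mirror the bullets
# above; they are heuristics, not hard rules.

def pick_approach(budget_usd: float, n_examples: int,
                  need_max_quality: bool = False,
                  qlora_quality_ok: bool = True) -> str:
    if need_max_quality and budget_usd >= 1500 and n_examples >= 100_000:
        return "full fine-tuning"
    if not qlora_quality_ok and budget_usd >= 200:
        return "lora"
    return "qlora"  # the default starting point for most teams

print(pick_approach(300, 8_000))                             # qlora
print(pick_approach(500, 20_000, qlora_quality_ok=False))    # lora
print(pick_approach(5_000, 150_000, need_max_quality=True))  # full fine-tuning
```

Note the asymmetry: you only learn whether `qlora_quality_ok` is false by running the cheap QLoRA experiments first, which is exactly the workflow the next section recommends.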

Hardware Recommendations

For QLoRA (Best Value)

Cloud: A100 40GB instances on RunPod ($1.00-1.20/hr)

Buy: RTX 6000 Ada 48GB (~$6,500) fits in a standard workstation

For LoRA

Cloud: 2-4x A100 80GB on Lambda Labs or RunPod

Buy: Multi-GPU workstation with 2-4x A100 or equivalent

For Full Fine-Tuning

Cloud: 8x H100 cluster on Lambda Labs (~$24/hr)

Buy: Only makes sense if you're doing this repeatedly

Hidden Costs Nobody Mentions

Engineering Time

  • QLoRA: 1-2 days to get running with existing tools (Unsloth, PEFT)
  • LoRA: 2-5 days for multi-GPU setup and optimization
  • Full Fine-Tuning: 1-2 weeks for proper distributed training setup

If your engineering time costs $150/hour, a week of setup adds $6,000 to the project.

Failed Experiments

Expect 5-20 failed runs before finding optimal hyperparameters. QLoRA's low cost makes iteration cheap; full fine-tuning makes each failure expensive.

Data Preparation

Often underestimated. Cleaning, formatting, and validating training data can take longer than the actual fine-tuning.

My Recommendation

For most teams: Start with QLoRA on cloud GPUs.

  1. Run 10-20 quick experiments to find optimal hyperparameters (~$100-200)
  2. Evaluate results against your quality requirements
  3. If quality is insufficient, try LoRA (~$50-100 for final run)
  4. Only consider full fine-tuning if LoRA doesn't meet requirements

The 80/20 rule applies: QLoRA gets you 80% of the benefit at 5% of the cost. Most projects never need to go beyond it.

---


Cost estimates based on published cloud pricing and hardware specifications as of December 2025. Actual costs vary based on training data size, number of epochs, and optimization settings. Engineering time estimates assume experienced ML practitioners.
