Memory Requirements for Finetuning Large Language Models

Finetuning a 70B-parameter model comes with serious memory demands:

1️⃣ Base weights: 140 GB (BF16, 2 bytes per parameter)
2️⃣ Gradients: +140 GB
3️⃣ Optimizer states (Adam): +280 GB (momentum and variance, one copy each)
4️⃣ Activations: variable; can easily add hundreds of GB depending on batch size and sequence length
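The numbers above can be reproduced with a back-of-the-envelope calculation. This is a minimal sketch assuming BF16 weights and gradients (2 bytes per parameter) and two Adam states kept at the same precision; real mixed-precision setups often hold FP32 master weights and states, which costs even more.

```python
GB = 1e9  # using 1 GB = 10^9 bytes to match the round numbers above

def full_finetune_memory_gb(n_params: float, bytes_per_param: int = 2) -> dict:
    """Estimate memory (GB) for full finetuning with Adam, activations excluded."""
    weights = n_params * bytes_per_param / GB
    gradients = n_params * bytes_per_param / GB
    optimizer = 2 * n_params * bytes_per_param / GB  # Adam: momentum + variance
    return {
        "weights_gb": weights,
        "gradients_gb": gradients,
        "optimizer_gb": optimizer,
        "total_gb": weights + gradients + optimizer,
    }

est = full_finetune_memory_gb(70e9)
print(est)  # weights 140, gradients 140, optimizer 280, total 560
```

That is 560 GB before a single activation is stored.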

🧮Conclusion: full fine-tuning a 70B model needs roughly 560 GB before activations — far beyond a single GPU, so it requires a multi-GPU cluster of e.g. H100s (80 GB each).

Solution:

✅ LoRA: drastically reduces memory by freezing the base weights and training small low-rank adapter matrices — gradients and optimizer states are needed only for the adapters.
✅ QLoRA: combines LoRA with quantization of the frozen base weights (e.g. to 4-bit), shrinking weight memory enough to finetune on a single H100.
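The savings can be sketched with a similar calculation. This assumes a LoRA of rank r on a d_in × d_out weight adds two matrices A (d_in × r) and B (r × d_out); the hidden size, layer count, and set of adapted projections below are illustrative assumptions for a 70B-class model, not an exact config.

```python
GB = 1e9

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one rank-`rank` LoRA adapter."""
    return d_in * rank + rank * d_out

# Illustrative: rank-16 adapters on 4 projection matrices (q, k, v, o)
# of size 8192 x 8192 across 80 layers.
trainable = 80 * 4 * lora_params(8192, 8192, 16)

base_4bit_gb = 70e9 * 0.5 / GB         # frozen base weights, 4-bit quantized
adapter_gb = trainable * 2 / GB        # BF16 adapter weights
grad_gb = trainable * 2 / GB           # gradients for adapters only
optimizer_gb = 2 * trainable * 2 / GB  # Adam states for adapters only

total = base_4bit_gb + adapter_gb + grad_gb + optimizer_gb
print(trainable)  # ~84M trainable parameters, vs 70B for full finetuning
print(total)      # ~35.7 GB, which fits on a single 80 GB H100
```

Under these assumptions, training state shrinks from 560 GB to well under 40 GB, which is why QLoRA fits on one card.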

This article accompanies the LinkedIn post about memory requirements for finetuning large language models.