LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning method that inserts small trainable rank-decomposition matrices into model layers.
Definition
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that dramatically reduces the number of trainable parameters during adaptation. Instead of updating all model weights, LoRA freezes the original weights and inserts a pair of small rank-decomposition matrices into each target layer: a down-projection A and an up-projection B, whose product BA approximates the weight update ΔW at a rank r far smaller than the layer's dimensions.
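A minimal NumPy sketch of the idea, with hypothetical layer sizes (d_in, d_out, r) and the conventional zero-initialisation of B so training starts from the unmodified base model:

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r = 64, 64, 4             # hypothetical sizes; rank r << d
alpha = 8                              # scaling hyperparameter
W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection, zero at init

def lora_forward(x):
    # h = W x + (alpha / r) * B A x ; only A and B receive gradients
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B = 0 at initialisation, the adapted layer matches the frozen layer.
assert np.allclose(lora_forward(x), W @ x)
```

Because the low-rank branch is computed as two thin matrix products, the trainable parameter count is r·(d_in + d_out) rather than d_in·d_out.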
For a model with billions of parameters, LoRA might train only millions of parameters — a roughly 1000× reduction. This makes fine-tuning feasible on consumer GPUs and enables rapid iteration. LoRA weights can be merged back into the base model for inference with zero latency overhead, or kept separate so multiple task-specific adapters can be swapped over a single base model.
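The merge step can be sketched in a few lines (sizes hypothetical): folding the low-rank product into the base weight once, offline, yields a single matrix that produces identical outputs with no extra matmul at inference time.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r, alpha = 64, 64, 4, 8
W = rng.normal(size=(d_out, d_in))   # frozen base weight
A = rng.normal(size=(r, d_in))       # trained adapter matrices
B = rng.normal(size=(d_out, r))

# Merge: fold the scaled low-rank update into the base weight.
W_merged = W + (alpha / r) * (B @ A)

x = rng.normal(size=(d_in,))
adapter_out = W @ x + (alpha / r) * (B @ (A @ x))
merged_out = W_merged @ x
assert np.allclose(adapter_out, merged_out)  # same outputs, zero extra latency

# Trainable params: r*(d_in + d_out) vs d_in*d_out for full fine-tuning
print(r * (d_in + d_out), "vs", d_in * d_out)  # 512 vs 4096
```

At these toy sizes the reduction is only 8×; for real transformer layers with d in the thousands and r around 4–64, it reaches the orders of magnitude described above.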
QLoRA combines 4-bit quantisation of the frozen base model with LoRA adapters, enabling fine-tuning of a 65B-parameter model on a single 48GB GPU. LoRA has become the dominant fine-tuning method for open-source LLMs.
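Back-of-envelope arithmetic shows why the 4-bit base model is what makes this fit; the figures below are illustrative estimates for weights only, ignoring activations, adapters, and optimiser state:

```python
params = 65e9                  # 65B-parameter base model (illustrative)
fp16_gb = params * 2 / 1e9     # 16-bit weights: 2 bytes each
nf4_gb = params * 0.5 / 1e9    # 4-bit quantised weights: half a byte each

print(f"fp16: {fp16_gb:.0f} GB, 4-bit: {nf4_gb:.1f} GB")  # fp16: 130 GB, 4-bit: 32.5 GB
```

At 16-bit precision the weights alone overflow a 48GB card; at 4-bit they leave headroom for the small LoRA adapters and activations.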
Examples
- Hugging Face PEFT library
- QLoRA (Guanaco model)
- LoRA for Stable Diffusion (character styles)
- OpenAI fine-tuning API (reportedly uses similar low-rank techniques)
Related Terms
Fine-tuning
Continuing training of a pre-trained model on domain-specific data to specialise it for a particular task.
Quantisation
Reducing the numerical precision of model weights to decrease memory footprint and accelerate inference with minimal accuracy loss.
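As a concrete illustration of this definition, a minimal sketch of symmetric int8 quantisation (one common scheme among several; the tensor and scheme here are assumptions, not a specific library's implementation):

```python
import numpy as np

rng = np.random.default_rng(2)
w = rng.normal(size=1000).astype(np.float32)  # hypothetical weight tensor

# Symmetric int8: map [-max|w|, +max|w|] onto the integer range [-127, 127]
scale = np.abs(w).max() / 127
q = np.round(w / scale).astype(np.int8)       # 1 byte per weight instead of 4
w_hat = q.astype(np.float32) * scale          # dequantise for compute

# Rounding error is bounded by half a quantisation step
max_err = np.abs(w - w_hat).max()
assert max_err <= scale / 2 + 1e-6
```

The same idea extends to the 4-bit schemes used by QLoRA, which trade a coarser grid for a further 2× memory saving.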
Large Language Model (LLM)
A transformer-based AI system trained on billions of tokens of text, capable of generating, reasoning about, and transforming language.