llm_trainer

Various models: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
Integrated methods: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
Scalable resources: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
Advanced algorithms: GaLore, BAdam, Adam-mini, DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
Practical tricks: FlashAttention-2, Unsloth, Liger Kernel, RoPE scaling, NEFTune and rsLoRA.
Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.

Supported Models

Model	Model size	Template
Baichuan 2	7B/13B	baichuan2
BLOOM/BLOOMZ	560M/1.1B/1.7B/3B/7.1B/176B	-
ChatGLM3	6B	chatglm3
Command R	35B/104B	cohere
DeepSeek (Code/MoE)	7B/16B/67B/236B	deepseek
Falcon	7B/11B/40B/180B	falcon
Gemma/Gemma 2/CodeGemma	2B/7B/9B/27B	gemma
GLM-4	9B	glm4
Index	1.9B	index
InternLM2/InternLM2.5	7B/20B	intern2
Llama	7B/13B/33B/65B	-
Llama 2	7B/13B/70B	llama2
Llama 3-3.2	1B/3B/8B/70B	llama3
Llama 3.2 Vision	11B/90B	mllama
LLaVA-1.5	7B/13B	llava
LLaVA-NeXT	7B/8B/13B/34B/72B/110B	llava_next
LLaVA-NeXT-Video	7B/34B	llava_next_video
MiniCPM	1B/2B/4B	cpm/cpm3
Mistral/Mixtral	7B/8x7B/8x22B	mistral
OLMo	1B/7B	-
PaliGemma	3B	paligemma
Phi-1.5/Phi-2	1.3B/2.7B	-
Phi-3	4B/14B	phi
Phi-3-small	7B	phi_small
Pixtral	12B	pixtral
Qwen/QwQ (1-2.5) (Code/Math/MoE)	0.5B/1.5B/3B/7B/14B/32B/72B/110B	qwen
Qwen2-VL	2B/7B/72B	qwen2_vl
Skywork o1	8B	skywork_o1
StarCoder 2	3B/7B/15B	-
XVERSE	7B/13B/65B	xverse
Yi/Yi-1.5 (Code)	1.5B/6B/9B/34B	yi
Yi-VL	6B/34B	yi_vl
Yuan 2	2B/51B/102B	yuan

Note

For the "base" models, the template argument can be chosen from default, alpaca, vicuna etc. But make sure to use the corresponding template for the "instruct/chat" models.

Remember to use the SAME template in training and inference.

Please refer to constants.py for a full list of models we supported.

You also can add a custom chat template to template.py.

Supported Training Approaches

Approach	Full-tuning	Freeze-tuning	LoRA	QLoRA
Pre-Training	✅	✅	✅	✅
Supervised Fine-Tuning	✅	✅	✅	✅
Reward Modeling	✅	✅	✅	✅
PPO Training	✅	✅	✅	✅
DPO Training	✅	✅	✅	✅
KTO Training	✅	✅	✅	✅
ORPO Training	✅	✅	✅	✅
SimPO Training	✅	✅	✅	✅

Tip

The implementation details of PPO can be found in this blog.

Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.

pip install --upgrade huggingface_hub
huggingface-cli login

Requirement

Mandatory	Minimum	Recommend
python	3.8	3.11
torch	1.13.1	2.4.0
transformers	4.41.2	4.43.4
datasets	2.16.0	2.20.0
accelerate	0.30.1	0.32.0
peft	0.11.1	0.12.0
trl	0.8.6	0.9.6

Optional	Minimum	Recommend
CUDA	11.6	12.2
deepspeed	0.10.0	0.14.0
bitsandbytes	0.39.0	0.43.1
vllm	0.4.3	0.5.0
flash-attn	2.3.0	2.6.3

Hardware Requirement

* estimated

Method	Bits	7B	13B	30B	70B	110B	8x7B	8x22B
Full	AMP	120GB	240GB	600GB	1200GB	2000GB	900GB	2400GB
Full	16	60GB	120GB	300GB	600GB	900GB	400GB	1200GB
Freeze	16	20GB	40GB	80GB	200GB	360GB	160GB	400GB
LoRA/GaLore/BAdam	16	16GB	32GB	64GB	160GB	240GB	120GB	320GB
QLoRA	8	10GB	20GB	40GB	80GB	140GB	60GB	160GB
QLoRA	4	6GB	12GB	24GB	48GB	72GB	30GB	96GB
QLoRA	2	4GB	8GB	16GB	24GB	48GB	18GB	48GB

Getting Started

Build Docker

For CUDA users:

cd docker/docker-cuda/
docker compose up -d
docker compose exec llamafactory bash

Installation

Important

Installation is mandatory.

git clone --depth 1 http://172.16.10.175:2230/kyy/llm_trainer.git
cd llm_trainer
pip install -e ".[torch,metrics]"

Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, galore, badam, adam-mini, qwen, modelscope, openmind, quality

Data Preparation

You can either use datasets on HuggingFace / ModelScope / Modelers hub or load the dataset in local disk.

Note

Please update data/dataset_info.json to use your custom dataset.

SFT Start

sh run_train/run_sft.sh

PT Start

sh run_train/run_pt.sh

README.md

Table of Contents

Features