## Table of Contents

- [Supported Models](#supported-models)
- [Supported Training Approaches](#supported-training-approaches)
- [Requirement](#requirement)
- [Hardware Requirement](#hardware-requirement)
- [Getting Started](#getting-started)
## Supported Models
| Model | Model size | Template |
|---|---|---|
| Baichuan 2 | 7B/13B | baichuan2 |
| BLOOM/BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | - |
| ChatGLM3 | 6B | chatglm3 |
| Command R | 35B/104B | cohere |
| DeepSeek (Code/MoE) | 7B/16B/67B/236B | deepseek |
| Falcon | 7B/11B/40B/180B | falcon |
| Gemma/Gemma 2/CodeGemma | 2B/7B/9B/27B | gemma |
| GLM-4 | 9B | glm4 |
| Index | 1.9B | index |
| InternLM2/InternLM2.5 | 7B/20B | intern2 |
| Llama | 7B/13B/33B/65B | - |
| Llama 2 | 7B/13B/70B | llama2 |
| Llama 3-3.2 | 1B/3B/8B/70B | llama3 |
| Llama 3.2 Vision | 11B/90B | mllama |
| LLaVA-1.5 | 7B/13B | llava |
| LLaVA-NeXT | 7B/8B/13B/34B/72B/110B | llava_next |
| LLaVA-NeXT-Video | 7B/34B | llava_next_video |
| MiniCPM | 1B/2B/4B | cpm/cpm3 |
| Mistral/Mixtral | 7B/8x7B/8x22B | mistral |
| OLMo | 1B/7B | - |
| PaliGemma | 3B | paligemma |
| Phi-1.5/Phi-2 | 1.3B/2.7B | - |
| Phi-3 | 4B/14B | phi |
| Phi-3-small | 7B | phi_small |
| Pixtral | 12B | pixtral |
| Qwen/QwQ (1-2.5) (Code/Math/MoE) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
| Qwen2-VL | 2B/7B/72B | qwen2_vl |
| Skywork o1 | 8B | skywork_o1 |
| StarCoder 2 | 3B/7B/15B | - |
| XVERSE | 7B/13B/65B | xverse |
| Yi/Yi-1.5 (Code) | 1.5B/6B/9B/34B | yi |
| Yi-VL | 6B/34B | yi_vl |
| Yuan 2 | 2B/51B/102B | yuan |
## Supported Training Approaches
| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
|---|---|---|---|---|
| Pre-Training | ✅ | ✅ | ✅ | ✅ |
| Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ |
| Reward Modeling | ✅ | ✅ | ✅ | ✅ |
| PPO Training | ✅ | ✅ | ✅ | ✅ |
| DPO Training | ✅ | ✅ | ✅ | ✅ |
| KTO Training | ✅ | ✅ | ✅ | ✅ |
| ORPO Training | ✅ | ✅ | ✅ | ✅ |
| SimPO Training | ✅ | ✅ | ✅ | ✅ |
> [!TIP]
> Some models require approval before use, so we recommend logging in with your Hugging Face account.
## Requirement
| Mandatory | Minimum | Recommended |
|---|---|---|
| python | 3.8 | 3.11 |
| torch | 1.13.1 | 2.4.0 |
| transformers | 4.41.2 | 4.43.4 |
| datasets | 2.16.0 | 2.20.0 |
| accelerate | 0.30.1 | 0.32.0 |
| peft | 0.11.1 | 0.12.0 |
| trl | 0.8.6 | 0.9.6 |

| Optional | Minimum | Recommended |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.14.0 |
| bitsandbytes | 0.39.0 | 0.43.1 |
| vllm | 0.4.3 | 0.5.0 |
| flash-attn | 2.3.0 | 2.6.3 |
## Hardware Requirement
* estimated
| Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
|---|---|---|---|---|---|---|---|---|
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
| LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |
## Getting Started
### Build Docker
For CUDA users:
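A sketch of building and entering the CUDA container, assuming this fork keeps the upstream LLaMA-Factory `docker/docker-cuda` Compose layout and service name; adjust paths and names if this repository differs:

```bash
# Assumption: docker/docker-cuda/docker-compose.yml and the "llamafactory" service
# exist as in upstream LLaMA-Factory; verify against this repository before use.
cd docker/docker-cuda/
docker compose up -d                     # build the CUDA image and start the container
docker compose exec llamafactory bash    # open a shell inside the running container
```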
### Installation
> [!IMPORTANT]
> Installation is mandatory.
```bash
git clone --depth 1 http://172.16.10.175:2230/kyy/llm_trainer.git
cd llm_trainer
pip install -e ".[torch,metrics]"
```

Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, galore, badam, adam-mini, qwen, modelscope, openmind, quality
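For example, optional extras can be combined in the same editable install (pick only the ones you actually need):

```bash
# Example only: any combination of the extras listed above can be requested at install time.
pip install -e ".[torch,metrics,deepspeed,bitsandbytes,vllm]"
```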
### Data Preparation
You can use datasets hosted on the HuggingFace, ModelScope, or Modelers hubs, or load datasets from your local disk.
> [!NOTE]
> Please update `data/dataset_info.json` to use your custom dataset.
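As a minimal sketch, a custom alpaca-style entry could look like the following; `my_dataset` and `my_dataset.json` are placeholder names, and the exact schema should be checked against the shipped `data/dataset_info.json`:

```json
{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}
```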
### SFT Start
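A hedged example of launching LoRA supervised fine-tuning, assuming this fork keeps the upstream LLaMA-Factory CLI entry point (`llamafactory-cli`) and example configs; adjust the command and config path to this repository's actual layout:

```bash
# Assumption: llamafactory-cli and examples/train_lora/llama3_lora_sft.yaml exist as in upstream.
llamafactory-cli train examples/train_lora/llama3_lora_sft.yaml
```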
### PT Start
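A hedged example of launching pre-training under the same assumptions, using an upstream-style example config:

```bash
# Assumption: the pre-training example config mirrors upstream LLaMA-Factory's examples/ layout.
llamafactory-cli train examples/train_lora/llama3_lora_pretrain.yaml
```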