## Table of Contents

- [Features](#features)
- [Supported Models](#supported-models)
- [Supported Training Approaches](#supported-training-approaches)
- [Provided Datasets](#provided-datasets)
- [Requirement](#requirement)
- [Getting Started](#getting-started)
- [Projects using LLaMA Factory](#projects-using-llama-factory)
- [License](#license)
- [Citation](#citation)
- [Acknowledgement](#acknowledgement)
## Features

- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
- **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
- **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
- **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [Adam-mini](https://github.com/zyushun/Adam-mini), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ, PiSSA and Agent tuning.
- **Practical tricks**: [FlashAttention-2](https://github.com/Dao-AILab/flash-attention), [Unsloth](https://github.com/unslothai/unsloth), [Liger Kernel](https://github.com/linkedin/Liger-Kernel), RoPE scaling, NEFTune and rsLoRA.
- **Experiment monitors**: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
- **Faster inference**: OpenAI-style API, Gradio UI and CLI with vLLM worker.
## Supported Models

| Model                                                           | Model size                       | Template         |
| --------------------------------------------------------------- | -------------------------------- | ---------------- |
| [Baichuan 2](https://huggingface.co/baichuan-inc)               | 7B/13B                           | baichuan2        |
| [BLOOM/BLOOMZ](https://huggingface.co/bigscience)               | 560M/1.1B/1.7B/3B/7.1B/176B      | -                |
| [ChatGLM3](https://huggingface.co/THUDM)                        | 6B                               | chatglm3         |
| [Command R](https://huggingface.co/CohereForAI)                 | 35B/104B                         | cohere           |
| [DeepSeek (Code/MoE)](https://huggingface.co/deepseek-ai)       | 7B/16B/67B/236B                  | deepseek         |
| [Falcon](https://huggingface.co/tiiuae)                         | 7B/11B/40B/180B                  | falcon           |
| [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google)        | 2B/7B/9B/27B                     | gemma            |
| [GLM-4](https://huggingface.co/THUDM)                           | 9B                               | glm4             |
| [Index](https://huggingface.co/IndexTeam)                       | 1.9B                             | index            |
| [InternLM2/InternLM2.5](https://huggingface.co/internlm)        | 7B/20B                           | intern2          |
| [Llama](https://github.com/facebookresearch/llama)              | 7B/13B/33B/65B                   | -                |
| [Llama 2](https://huggingface.co/meta-llama)                    | 7B/13B/70B                       | llama2           |
| [Llama 3-3.2](https://huggingface.co/meta-llama)                | 1B/3B/8B/70B                     | llama3           |
| [Llama 3.2 Vision](https://huggingface.co/meta-llama)           | 11B/90B                          | mllama           |
| [LLaVA-1.5](https://huggingface.co/llava-hf)                    | 7B/13B                           | llava            |
| [LLaVA-NeXT](https://huggingface.co/llava-hf)                   | 7B/8B/13B/34B/72B/110B           | llava_next       |
| [LLaVA-NeXT-Video](https://huggingface.co/llava-hf)             | 7B/34B                           | llava_next_video |
| [MiniCPM](https://huggingface.co/openbmb)                       | 1B/2B/4B                         | cpm/cpm3         |
| [Mistral/Mixtral](https://huggingface.co/mistralai)             | 7B/8x7B/8x22B                    | mistral          |
| [OLMo](https://huggingface.co/allenai)                          | 1B/7B                            | -                |
| [PaliGemma](https://huggingface.co/google)                      | 3B                               | paligemma        |
| [Phi-1.5/Phi-2](https://huggingface.co/microsoft)               | 1.3B/2.7B                        | -                |
| [Phi-3](https://huggingface.co/microsoft)                       | 4B/14B                           | phi              |
| [Phi-3-small](https://huggingface.co/microsoft)                 | 7B                               | phi_small        |
| [Pixtral](https://huggingface.co/mistralai)                     | 12B                              | pixtral          |
| [Qwen/QwQ (1-2.5) (Code/Math/MoE)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen             |
| [Qwen2-VL](https://huggingface.co/Qwen)                         | 2B/7B/72B                        | qwen2_vl         |
| [Skywork o1](https://huggingface.co/Skywork)                    | 8B                               | skywork_o1       |
| [StarCoder 2](https://huggingface.co/bigcode)                   | 3B/7B/15B                        | -                |
| [XVERSE](https://huggingface.co/xverse)                         | 7B/13B/65B                       | xverse           |
| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai)                | 1.5B/6B/9B/34B                   | yi               |
| [Yi-VL](https://huggingface.co/01-ai)                           | 6B/34B                           | yi_vl            |
| [Yuan 2](https://huggingface.co/IEITYuan)                       | 2B/51B/102B                      | yuan             |
> [!NOTE]
> For the "base" models, the `template` argument can be chosen from `default`, `alpaca`, `vicuna`, etc. But make sure to use the **corresponding template** for the "instruct/chat" models.
>
> Remember to use the **SAME** template in training and inference.

Please refer to [constants.py](src/llamafactory/extras/constants.py) for a full list of the models we support.

You can also add a custom chat template to [template.py](src/llamafactory/data/template.py).
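For example, the template is selected the same way at training and inference time. Below is a minimal sketch, assuming the standard `llamafactory-cli` entry point from upstream LLaMA Factory is installed by this repo; the model name is only an example.

```bash
# Chat with an instruct model using its matching template (model name is an example).
llamafactory-cli chat \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --template llama3
```

Reuse the same `--template` value for any adapters trained on that model; mismatched templates typically degrade responses.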
## Supported Training Approaches

| Approach               | Full-tuning        | Freeze-tuning      | LoRA               | QLoRA              |
| ---------------------- | ------------------ | ------------------ | ------------------ | ------------------ |
| Pre-Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Supervised Fine-Tuning | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Reward Modeling        | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| PPO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| DPO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| KTO Training           | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| ORPO Training          | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| SimPO Training         | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
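Any approach in the table can be combined with any of the tuning methods. As a hedged sketch (assuming the standard `llamafactory-cli` entry point is available and a preference dataset is registered in `data/dataset_info.json`; all names below are placeholders), a LoRA-based DPO run could look like:

```bash
# Hypothetical DPO + LoRA run; model, dataset and output paths are placeholders.
llamafactory-cli train \
  --stage dpo \
  --do_train \
  --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct \
  --dataset my_preference_data \
  --template llama3 \
  --finetuning_type lora \
  --output_dir saves/llama3-8b/lora/dpo
```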
> [!TIP]
> The implementation details of PPO can be found in [this blog](https://newfacade.github.io/notes-on-reinforcement-learning/17-ppo-trl.html).
Some datasets require confirmation before use, so we recommend logging in to your Hugging Face account with the following commands:

```bash
pip install --upgrade huggingface_hub
huggingface-cli login
```
## Requirement

| Mandatory    | Minimum | Recommend |
| ------------ | ------- | --------- |
| python       | 3.8     | 3.11      |
| torch        | 1.13.1  | 2.4.0     |
| transformers | 4.41.2  | 4.43.4    |
| datasets     | 2.16.0  | 2.20.0    |
| accelerate   | 0.30.1  | 0.32.0    |
| peft         | 0.11.1  | 0.12.0    |
| trl          | 0.8.6   | 0.9.6     |

| Optional     | Minimum | Recommend |
| ------------ | ------- | --------- |
| CUDA         | 11.6    | 12.2      |
| deepspeed    | 0.10.0  | 0.14.0    |
| bitsandbytes | 0.39.0  | 0.43.1    |
| vllm         | 0.4.3   | 0.5.0     |
| flash-attn   | 2.3.0   | 2.6.3     |
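To quickly compare your environment against the mandatory versions above, you can print the installed versions (a simple check; only the packages listed in the table are imported):

```bash
# Print the installed versions of the mandatory dependencies for comparison with the table.
python -c "import sys, torch, transformers, datasets, accelerate, peft, trl; \
print('python      ', sys.version.split()[0]); \
print('torch       ', torch.__version__); \
print('transformers', transformers.__version__); \
print('datasets    ', datasets.__version__); \
print('accelerate  ', accelerate.__version__); \
print('peft        ', peft.__version__); \
print('trl         ', trl.__version__)"
```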
### Hardware Requirement

\* _estimated_

| Method            | Bits | 7B    | 13B   | 30B   | 70B    | 110B   | 8x7B  | 8x22B  |
| ----------------- | ---- | ----- | ----- | ----- | ------ | ------ | ----- | ------ |
| Full              | AMP  | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
| Full              | 16   | 60GB  | 120GB | 300GB | 600GB  | 900GB  | 400GB | 1200GB |
| Freeze            | 16   | 20GB  | 40GB  | 80GB  | 200GB  | 360GB  | 160GB | 400GB  |
| LoRA/GaLore/BAdam | 16   | 16GB  | 32GB  | 64GB  | 160GB  | 240GB  | 120GB | 320GB  |
| QLoRA             | 8    | 10GB  | 20GB  | 40GB  | 80GB   | 140GB  | 60GB  | 160GB  |
| QLoRA             | 4    | 6GB   | 12GB  | 24GB  | 48GB   | 72GB   | 30GB  | 96GB   |
| QLoRA             | 2    | 4GB   | 8GB   | 16GB  | 24GB   | 48GB   | 18GB  | 48GB   |
## Getting Started

### Build Docker

For CUDA users:

```bash
cd docker/docker-cuda/
docker compose up -d
docker compose exec llamafactory bash
```
### Installation

> [!IMPORTANT]
> Installation is mandatory.

```bash
git clone --depth 1 http://172.16.10.175:2230/kyy/llm_trainer.git
cd llm_trainer
pip install -e ".[torch,metrics]"
```

Extra dependencies available: torch, torch-npu, metrics, deepspeed, liger-kernel, bitsandbytes, hqq, eetq, gptq, awq, aqlm, vllm, galore, badam, adam-mini, qwen, modelscope, openmind, quality
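The extras can be combined in a single editable install. For example, to also pull in the DeepSpeed and vLLM backends (any names from the list above can be substituted):

```bash
# Combine optional extras from the list above in one editable install.
pip install -e ".[torch,metrics,deepspeed,vllm]"
```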
### Data Preparation

You can either use datasets on the HuggingFace / ModelScope / Modelers hub or load a dataset from local disk.

> [!NOTE]
> Please update `data/dataset_info.json` to use your custom dataset.
### SFT Start

```bash
sh run_train/run_sft.sh
```

### PT Start

```bash
sh run_train/run_pt.sh
```