Note that the flash-attn library is installed in this image and the qwen model will use it automatically. However, if the the host machine's GPU is not compatible with the library, an exception will be raised during the training process as follows: FlashAttention only supports Ampere GPUs or newer. So if the --flash_attn flag is not set, an additional patch for the qwen model's config is necessary to set the default value of use_flash_attn from "auto" to False.
👋 Join our WeChat.
[ English | 中文 ]
Fine-tuning a large language model can be easy as…
https://github.com/hiyouga/LLaMA-Factory/assets/16256802/9840a653-7e9c-41c8-ae89-7ace5698baf6
Choose your path:
- 🤗 Spaces: https://huggingface.co/spaces/hiyouga/LLaMA-Board
- ModelScope: https://modelscope.cn/studios/hiyouga/LLaMA-Board
- Colab: https://colab.research.google.com/drive/1eRTPn37ltBbYsISy9Aw2NuI2Aq5CQrD9?usp=sharing
- Local machine: Please refer to usage
Table of Contents
- Features
- Benchmark
- Changelog
- Supported Models
- Supported Training Approaches
- Provided Datasets
- Requirement
- Getting Started
- Projects using LLaMA Factory
- License
- Citation
- Acknowledgement
Features
- Various models: LLaMA, Mistral, Mixtral-MoE, Qwen, Yi, Gemma, Baichuan, ChatGLM, Phi, etc.
- Integrated methods: (Continuous) pre-training, supervised fine-tuning, reward modeling, PPO and DPO.
- Scalable resources: 32-bit full-tuning, 16-bit freeze-tuning, 16-bit LoRA and 2/4/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8.
- Advanced algorithms: GaLore, DoRA, LongLoRA, LLaMA Pro, LoRA+, LoftQ and Agent tuning.
- Practical tricks: FlashAttention-2, Unsloth, RoPE scaling, NEFTune and rsLoRA.
- Experiment monitors: LlamaBoard, TensorBoard, Wandb, MLflow, etc.
- Faster inference: OpenAI-style API, Gradio UI and CLI with vLLM worker.
Benchmark
Compared to ChatGLM’s P-Tuning, LLaMA-Factory’s LoRA tuning offers up to 3.7 times faster training speed with a better Rouge score on the advertising text generation task. By leveraging 4-bit quantization technique, LLaMA-Factory’s QLoRA further improves the efficiency regarding the GPU memory.
Definitions
- Training Speed: the number of training samples processed per second during the training. (bs=4, cutoff_len=1024)
- Rouge Score: Rouge-2 score on the development set of the advertising text generation task. (bs=4, cutoff_len=1024)
- GPU Memory: Peak GPU memory usage in 4-bit quantized training. (bs=1, cutoff_len=1024)
- We adopt
pre_seq_len=128for ChatGLM’s P-Tuning andlora_rank=32for LLaMA-Factory’s LoRA tuning.
Changelog
[24/03/13] We supported LoRA+. Try
loraplus_lr_ratio=16.0 to enable LoRA+ algorithm.
[24/03/07] We supported gradient low-rank projection (GaLore) algorithm.
Try --use_galore to use the memory-efficient optimizer.
[24/03/07] We integrated vLLM for faster
and concurrent inference. Try --infer_backend vllm to enjoy
270% inference speed. (LoRA is not yet supported, merge
it first.)
[24/02/28] We supported weight-decomposed LoRA (DoRA). Try
--use_dora to activate DoRA training.
[24/02/15] We supported block expansion proposed by
LLaMA Pro. See
scripts/llama_pro.py for usage.
Full Changelog
[24/02/05] Qwen1.5 (Qwen2 beta version) series models are supported in LLaMA-Factory. Check this blog post for details.
[24/01/18] We supported agent tuning for most
models, equipping model with tool using abilities by fine-tuning with
--dataset glaive_toolcall.
[23/12/23] We supported unsloth’s
implementation to boost LoRA tuning for the LLaMA, Mistral and Yi
models. Try --use_unsloth argument to activate unsloth
patch. It achieves 170% speed in our benchmark, check
this
page for details.
[23/12/12] We supported fine-tuning the latest MoE model Mixtral 8x7B in our framework. See hardware requirement here.
[23/12/01] We supported downloading pre-trained models and datasets from the ModelScope Hub for Chinese mainland users. See this tutorial for usage.
[23/10/21] We supported NEFTune trick for
fine-tuning. Try --neftune_noise_alpha argument to activate
NEFTune, e.g., --neftune_noise_alpha 5.
[23/09/27] We supported S^2-Attn proposed by LongLoRA for the
LLaMA models. Try --shift_attn argument to enable shift
short attention.
[23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See this example to evaluate your models.
[23/09/10] We supported FlashAttention-2.
Try --flash_attn argument to enable FlashAttention-2 if you
are using RTX4090, A100 or H100 GPUs.
[23/08/12] We supported RoPE scaling to extend the
context length of the LLaMA models. Try
--rope_scaling linear argument in training and
--rope_scaling dynamic argument at inference to extrapolate
the position embeddings.
[23/08/11] We supported DPO training for instruction-tuned models. See this example to train your models.
[23/07/31] We supported dataset streaming. Try
--streaming and --max_steps 10000 arguments to
load your dataset in streaming mode.
[23/07/29] We released two instruction-tuned 13B models at Hugging Face. See these Hugging Face Repos (LLaMA-2 / Baichuan) for details.
[23/07/18] We developed an all-in-one Web UI for
training, evaluation and inference. Try train_web.py to
fine-tune models in your Web browser. Thank @KanadeSiina and @codemayq for their efforts in
the development.
[23/07/09] We released FastEdit ⚡🩹, an easy-to-use package for editing the factual knowledge of large language models efficiently. Please follow FastEdit if you are interested.
[23/06/29] We provided a reproducible example of training a chat model using instruction-following datasets, see Baichuan-7B-sft for details.
[23/06/22] We aligned the demo API with the OpenAI’s format where you can insert the fine-tuned model in arbitrary ChatGPT-based applications.
[23/06/03] We supported quantized training and inference (aka
QLoRA).
Try --quantization_bit 4/8 argument to work with quantized
models.
Supported Models
| Model | Model size | Default module | Template |
|---|---|---|---|
| Baichuan2 | 7B/13B | W_pack | baichuan2 |
| BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
| ChatGLM3 | 6B | query_key_value | chatglm3 |
| DeepSeek (MoE) | 7B/16B/67B | q_proj,v_proj | deepseek |
| Falcon | 7B/40B/180B | query_key_value | falcon |
| Gemma | 2B/7B | q_proj,v_proj | gemma |
| InternLM2 | 7B/20B | wqkv | intern2 |
| LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
| LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
| Mistral | 7B | q_proj,v_proj | mistral |
| Mixtral | 8x7B | q_proj,v_proj | mistral |
| OLMo | 1B/7B | att_proj | olmo |
| Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
| Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
| Qwen1.5 | 0.5B/1.8B/4B/7B/14B/72B | q_proj,v_proj | qwen |
| StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
| XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
| Yi | 6B/9B/34B | q_proj,v_proj | yi |
| Yuan | 2B/51B/102B | q_proj,v_proj | yuan |
Note
Default module is used for the
--lora_target argument, you can use
--lora_target all to specify all the available modules.
For the “base” models, the --template argument can be
chosen from default, alpaca,
vicuna etc. But make sure to use the corresponding
template for the “chat” models.
Please refer to constants.py for a full list of models we supported.
You also can add a custom chat template to template.py.
Supported Training Approaches
| Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
|---|---|---|---|---|
| Pre-Training | ✅ | ✅ | ✅ | ✅ |
| Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ |
| Reward Modeling | ✅ | ✅ | ✅ | ✅ |
| PPO Training | ✅ | ✅ | ✅ | ✅ |
| DPO Training | ✅ | ✅ | ✅ | ✅ |
Note
Use --quantization_bit 4 argument to enable QLoRA.
Provided Datasets
Pre-training datasets
Supervised fine-tuning datasets
- Stanford Alpaca (en)
- Stanford Alpaca (zh)
- Alpaca GPT4 (en&zh)
- Self Cognition (zh)
- Open Assistant (multilingual)
- ShareGPT (zh)
- Guanaco Dataset (multilingual)
- BELLE 2M (zh)
- BELLE 1M (zh)
- BELLE 0.5M (zh)
- BELLE Dialogue 0.4M (zh)
- BELLE School Math 0.25M (zh)
- BELLE Multiturn Chat 0.8M (zh)
- UltraChat (en)
- LIMA (en)
- OpenPlatypus (en)
- CodeAlpaca 20k (en)
- Alpaca CoT (multilingual)
- OpenOrca (en)
- SlimOrca (en)
- MathInstruct (en)
- Firefly 1.1M (zh)
- Wiki QA (en)
- Web QA (zh)
- WebNovel (zh)
- Nectar (en)
- deepctrl (en&zh)
- Ad Gen (zh)
- ShareGPT Hyperfiltered (en)
- ShareGPT4 (en&zh)
- UltraChat 200k (en)
- AgentInstruct (en)
- LMSYS Chat 1M (en)
- Evol Instruct V2 (en)
- Glaive Function Calling V2 (en)
- Cosmopedia (en)
- Open Assistant (de)
- Dolly 15k (de)
- Alpaca GPT4 (de)
- OpenSchnabeltier (de)
- Evol Instruct (de)
- Dolphin (de)
- Booksum (de)
- Airoboros (de)
- Ultrachat (de)
Preference datasets
Please refer to data/README.md for details.
Some datasets require confirmation before using them, so we recommend logging in with your Hugging Face account using these commands.
Requirement
| Mandatory | Minimum | Recommend |
|---|---|---|
| python | 3.8 | 3.10 |
| torch | 1.13.1 | 2.2.0 |
| transformers | 4.37.2 | 4.38.2 |
| datasets | 2.14.3 | 2.17.1 |
| accelerate | 0.27.2 | 0.27.2 |
| peft | 0.9.0 | 0.9.0 |
| trl | 0.7.11 | 0.7.11 |
| Optional | Minimum | Recommend |
|---|---|---|
| CUDA | 11.6 | 12.2 |
| deepspeed | 0.10.0 | 0.13.1 |
| bitsandbytes | 0.39.0 | 0.41.3 |
| flash-attn | 2.3.0 | 2.5.5 |
Hardware Requirement
* estimated
| Method | Bits | 7B | 13B | 30B | 70B | 8x7B |
|---|---|---|---|---|---|---|
| Full | AMP | 120GB | 240GB | 600GB | 1200GB | 900GB |
| Full | 16 | 60GB | 120GB | 300GB | 600GB | 400GB |
| GaLore | 16 | 16GB | 32GB | 64GB | 160GB | 120GB |
| Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 160GB |
| LoRA | 16 | 16GB | 32GB | 64GB | 160GB | 120GB |
| QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 60GB |
| QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 30GB |
| QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 18GB |
Getting Started
Data Preparation (optional)
Please refer to data/README.md for
checking the details about the format of dataset files. You can either
use a single .json file or a dataset
loading script with multiple files to create a custom dataset.
Note
Please update data/dataset_info.json to use your custom
dataset. About the format of this file, please refer to
data/README.md.
Dependence Installation (optional)
git clone https://github.com/hiyouga/LLaMA-Factory.git
conda create -n llama_factory python=3.10
conda activate llama_factory
cd LLaMA-Factory
pip install -r requirements.txtIf you want to enable the quantized LoRA (QLoRA) on the Windows
platform, you will be required to install a pre-built version of
bitsandbytes library, which supports CUDA 11.1 to 12.2.
pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.40.0-py3-none-win_amd64.whlTo enable FlashAttention-2 on the Windows platform, you need to
install the precompiled flash-attn library, which supports
CUDA 12.1 to 12.2. Please download the corresponding version from flash-attention
based on your requirements.
Use ModelScope Hub (optional)
If you have trouble with downloading models and datasets from Hugging Face, you can use LLaMA-Factory together with ModelScope in the following manner.
Then you can train the corresponding model by specifying a model ID of the ModelScope Hub. (find a full list of model IDs at ModelScope Hub)
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--model_name_or_path modelscope/Llama-2-7b-ms \
... # arguments (same as below)LLaMA Board also supports using the models and datasets on the ModelScope Hub.
Train on a single GPU
Important
If you want to train models on multiple GPUs, please refer to Distributed Training.
LLaMA Board GUI
Pre-Training
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage pt \
--do_train \
--model_name_or_path path_to_llama_model \
--dataset wiki_demo \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_pt_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16Supervised Fine-Tuning
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--do_train \
--model_name_or_path path_to_llama_model \
--dataset alpaca_gpt4_en \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_sft_checkpoint \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16Reward Modeling
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage rm \
--do_train \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_sft_checkpoint \
--create_new_adapter \
--dataset comparison_gpt4_en \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_rm_checkpoint \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-6 \
--num_train_epochs 1.0 \
--plot_loss \
--fp16PPO Training
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage ppo \
--do_train \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_sft_checkpoint \
--create_new_adapter \
--dataset alpaca_gpt4_en \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--reward_model path_to_rm_checkpoint \
--output_dir path_to_ppo_checkpoint \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--top_k 0 \
--top_p 0.9 \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--plot_loss \
--fp16Tip
Use
--adapter_name_or_path path_to_sft_checkpoint,path_to_ppo_checkpoint
to infer the fine-tuned model.
Warning
Use --per_device_train_batch_size=1 for LLaMA-2 models
in fp16 PPO training.
DPO Training
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage dpo \
--do_train \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_sft_checkpoint \
--create_new_adapter \
--dataset comparison_gpt4_en \
--template default \
--finetuning_type lora \
--lora_target q_proj,v_proj \
--output_dir path_to_dpo_checkpoint \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 1e-5 \
--num_train_epochs 1.0 \
--plot_loss \
--fp16Tip
Use
--adapter_name_or_path path_to_sft_checkpoint,path_to_dpo_checkpoint
to infer the fine-tuned model.
Distributed Training
Use Huggingface Accelerate
Example config.yaml for LoRA training
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: fp16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: falseTip
We commend using Accelerate for LoRA tuning.
Use DeepSpeed
deepspeed --num_gpus 8 src/train_bash.py \
--deepspeed ds_config.json \
... # arguments (same as above)Example ds_config.json for full-parameter training with DeepSpeed ZeRO-2
{
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"zero_allow_untested_optimizer": true,
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 1000,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1
},
"bf16": {
"enabled": "auto"
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 5e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 5e8,
"contiguous_gradients": true,
"round_robin_gradients": true
}
}Tip
Refer to examples for more training scripts.
Merge LoRA weights and export model
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--template default \
--finetuning_type lora \
--export_dir path_to_export \
--export_size 2 \
--export_legacy_format FalseWarning
Merging LoRA weights into a quantized model is not supported.
Tip
Use --model_name_or_path path_to_export solely to use
the exported model.
Use --export_quantization_bit 4 and
--export_quantization_dataset data/c4_demo.json to quantize
the model with AutoGPTQ after merging the LoRA weights.
Inference with OpenAI-style API
CUDA_VISIBLE_DEVICES=0 API_PORT=8000 python src/api_demo.py \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--template default \
--finetuning_type loraTip
Visit http://localhost:8000/docs for API
documentation.
Inference with command line
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--template default \
--finetuning_type loraInference with web browser
CUDA_VISIBLE_DEVICES=0 python src/web_demo.py \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--template default \
--finetuning_type loraEvaluation
CUDA_VISIBLE_DEVICES=0 python src/evaluate.py \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--template vanilla \
--finetuning_type lora \
--task mmlu \
--split test \
--lang en \
--n_shot 5 \
--batch_size 4Predict
CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
--stage sft \
--do_predict \
--model_name_or_path path_to_llama_model \
--adapter_name_or_path path_to_checkpoint \
--dataset alpaca_gpt4_en \
--template default \
--finetuning_type lora \
--output_dir path_to_predict_result \
--per_device_eval_batch_size 1 \
--max_samples 100 \
--predict_with_generate \
--fp16Warning
Use --per_device_train_batch_size=1 for LLaMA-2 models
in fp16 predict.
Tip
We recommend using --per_device_eval_batch_size=1 and
--max_target_length 128 at 4/8-bit predict.
Dockerize Training
Get ready
Necessary dockerized environment is needed, such as Docker or Docker Compose.
Docker support
docker build -f ./Dockerfile -t llama-factory:latest .
docker run --gpus=all -v ./hf_cache:/root/.cache/huggingface/ -v ./data:/app/data -v ./output:/app/output -p 7860:7860 --shm-size 16G --name llama_factory -d llama-factory:latestDocker Compose support
Tip
Details about volume:
- hf_cache: Utilize Huggingface cache on the host machine. Reassignable if a cache already exists in a different directory.
- data: Place datasets on this dir of the host machine so that they can be selected on LLaMA Board GUI.
- output: Set export dir to this location so that the merged result can be accessed directly on the host machine.
Projects using LLaMA Factory
- Wang et al. ESRL: Efficient Sampling-based Reinforcement Learning for Sequence Generation. 2023. [arxiv]
- Yu et al. Open, Closed, or Small Language Models for Text Classification? 2023. [arxiv]
- Luceri et al. Leveraging Large Language Models to Detect Influence Campaigns in Social Media. 2023. [arxiv]
- Zhang et al. Alleviating Hallucinations of Large Language Models through Induced Hallucinations. 2023. [arxiv]
- Wang et al. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs. 2024. [arxiv]
- Wang et al. CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning. 2024. [arxiv]
- Choi et al. FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs. 2024. [arxiv]
- Zhang et al. AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. 2024. [arxiv]
- Lyu et al. KnowTuning: Knowledge-aware Fine-tuning for Large Language Models. 2024. [arxiv]
- Yang et al. LaCo: Large Language Model Pruning via Layer Collaps. 2024. [arxiv]
- Bhardwaj et al. Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic. 2024. [arxiv]
- Yang et al. Enhancing Empathetic Response Generation by Augmenting LLMs with Small-scale Empathetic Models. 2024. [arxiv]
- Yi et al. Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding. 2024. [arxiv]
- Cao et al. Head-wise Shareable Attention for Large Language Models. 2024. [arxiv]
- Zhang et al. Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages. 2024. [arxiv]
- Kim et al. Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models. 2024. [arxiv]
- StarWhisper: A large language model for Astronomy, based on ChatGLM2-6B and Qwen-14B.
- DISC-LawLLM: A large language model specialized in Chinese legal domain, based on Baichuan-13B, is capable of retrieving and reasoning on legal knowledge.
- Sunsimiao: A large language model specialized in Chinese medical domain, based on Baichuan-7B and ChatGLM-6B.
- CareGPT: A series of large language models for Chinese medical domain, based on LLaMA2-7B and Baichuan-13B.
- MachineMindset: A series of MBTI Personality large language models, capable of giving any LLM 16 different personality types based on different datasets and training methods.
Tip
If you have a project that should be incorporated, please contact via email or create a pull request.
License
This repository is licensed under the Apache-2.0 License.
Please follow the model licenses to use the corresponding model weights: Baichuan2 / BLOOM / ChatGLM3 / DeepSeek / Falcon / Gemma / InternLM2 / LLaMA / LLaMA-2 / Mistral / OLMo / Phi-1.5/2 / Qwen / StarCoder2 / XVERSE / Yi / Yuan
Citation
If this work is helpful, please kindly cite as:
@Misc{llama-factory,
title = {LLaMA Factory},
author = {hiyouga},
howpublished = {\url{https://github.com/hiyouga/LLaMA-Factory}},
year = {2023}
}Acknowledgement
This repo benefits from PEFT, QLoRA and FastChat. Thanks for their wonderful works.
