Warning
Merging LoRA weights into a quantized model is not supported.
Tip
Use only --model_name_or_path path_to_model to load the exported model or a model fine-tuned in full/freeze mode.
Use CUDA_VISIBLE_DEVICES=0, --export_quantization_bit 4 and --export_quantization_dataset data/c4_demo.json to quantize the model with AutoGPTQ after merging the LoRA weights.
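As a rough illustration, such a quantization call might look like the sketch below. Only CUDA_VISIBLE_DEVICES=0, --export_quantization_bit 4 and --export_quantization_dataset data/c4_demo.json come from the tip above; the entry point (src/export_model.py) and the remaining arguments are assumptions and may differ in your checkout.

```bash
# Sketch only: quantize an already-merged model with AutoGPTQ.
# The entry point and the --template/--export_dir arguments are assumptions;
# check quantize.sh in this folder for the exact invocation.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_merged_model \
    --template default \
    --export_dir path_to_quantized_model \
    --export_quantization_bit 4 \
    --export_quantization_dataset data/c4_demo.json
```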
Usage:
- merge.sh: merge the LoRA weights (see the sketch below)
- quantize.sh: quantize the model with AutoGPTQ (optional; must be run after merge.sh)
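For orientation, merge.sh might wrap a merge-and-export call along the lines of the sketch below, with quantize.sh then re-exporting the merged model using the quantization flags shown in the tip above. The adapter and export arguments here (--adapter_name_or_path, --finetuning_type, --template, --export_dir) are assumptions; refer to the scripts themselves for the exact flags.

```bash
# Sketch only: load the base model plus a LoRA adapter and export the
# merged weights. All arguments except --model_name_or_path are assumptions.
CUDA_VISIBLE_DEVICES=0 python src/export_model.py \
    --model_name_or_path path_to_base_model \
    --adapter_name_or_path path_to_lora_adapter \
    --finetuning_type lora \
    --template default \
    --export_dir path_to_merged_model
```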