LLM-Finetune-Guide

Methods and examples for fine-tuning large language models (LLMs).


This project provides a comprehensive guide to fine-tuning large language models (LLMs) with efficient methods like LoRA and P-tuning V2. It includes detailed instructions, code examples, and performance benchmarks for various LLMs and fine-tuning techniques. The guide also covers data preparation, evaluation, prediction, and running inference on CPU environments. By leveraging this guide, users can effectively fine-tune LLMs for specific tasks and applications.


LLM Instruction Fine-Tuning


This project compiles important concepts and programming frameworks for fine-tuning large language models, providing executable examples for training and inference of LLMs.

👋 You are welcome to join our LINE OpenChat community: fine-tuning large language models and OpenAI applications

Switch language version: [ English | 繁體中文 | 简体中文 ]

If you want to reduce trial and error, you are welcome to enroll in my personally recorded step-by-step tutorial course.


Parameter-Efficient Fine-Tuning Methods

Currently, the following efficient fine-tuning methods are supported:

  • LoRA (Low-Rank Adaptation): trains small low-rank adapter matrices while the base model weights stay frozen
  • P-tuning V2: trains continuous prompt (prefix) embeddings inserted at every layer of the frozen model

Training settings and benchmarks:

LLM        | Fine-Tuning Method | Quantization | Distributed Training Strategy    | Batch Size | GPU Memory (per card) | Speed
Bloom      | LoRA               | INT8         | None                             | 1          | 14 GB                 | 86.71 s/it
Bloom      | LoRA               | INT8         | Torch DDP on 2 GPUs              | 1          | 13 GB                 | 44.47 s/it
Bloom      | LoRA               | INT8         | DeepSpeed ZeRO stage 3 on 2 GPUs | 1          | 13 GB                 | 36.05 s/it
ChatGLM-6B | P-Tuning           | INT4         | DeepSpeed ZeRO stage 3 on 2 GPUs | 2          | 15 GB                 | 14.7 s/it

Getting Started

Data Preparation

You can fine-tune with open-source or academic datasets, but if those do not fit your application scenario, you will need to prepare a custom dataset for fine-tuning.

In this project, datasets are stored as .json files. Place the train, dev, and test splits in the instruction-datasets/ directory. You can also create a new folder for the files, as long as you pass the corresponding path in the commands below.
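
As a rough sketch of what a record looks like: each example pairs a prompt with the expected response, and the commands below map those fields via --prompt_column input and --response_column output. The dataset folder name (demo) and the exact file layout shown here are illustrative assumptions, not taken from the repository, so adapt them to its sample data.

# Hypothetical toy dataset: one JSON object per line with "input" and "output" fields.
mkdir -p instruction-datasets/demo
cat > instruction-datasets/demo/train.json <<'EOF'
{"input": "Classify the sentiment of this review: I love this product.", "output": "positive"}
{"input": "Summarize in one sentence: Large language models can be adapted to new tasks with instruction fine-tuning.", "output": "Instruction fine-tuning adapts LLMs to new tasks."}
EOF
# dev.json and test.json follow the same format (copied here only for illustration).
cp instruction-datasets/demo/train.json instruction-datasets/demo/dev.json
cp instruction-datasets/demo/train.json instruction-datasets/demo/test.json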

Requirements

Each fine-tuning method has its own set of required packages. To install them, navigate to the folder containing the corresponding requirements.txt and run:

git clone https://github.com/A-baoYang/LLM-Finetune-Guide.git
conda create -n llm_ift python=3.8
conda activate llm_ift
cd LLM-Finetune-Guide/efficient-finetune/ptuning/v2
pip install -r requirements.txt

Fine-Tuning

After the data is prepared, you can start fine-tuning. The training program is already provided; you only need to point it to your data and model paths and override parameters on the command line. The commands below use shell variables such as $DATATAG and $MODEL_PATH as placeholders.
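
The values below are purely illustrative (the demo dataset tag is hypothetical; 512 and 2e-2 mirror the checkpoint name used later in this guide). Substitute your own dataset tag, model path, and hyperparameters.

# Illustrative placeholder values only; adjust to your own setup.
DATATAG=demo                   # sub-folder under instruction-datasets/ with train/dev/test .json files
MODEL_PATH=THUDM/chatglm-6b    # Hugging Face model ID or local model path
MODEL_TYPE=chatglm-6b          # label used only to name the output directory
PRE_SEQ_LEN=512                # prefix length for P-tuning V2
LR=2e-2                        # learning rate
STEP=3000                      # checkpoint step referenced in evaluation and inference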

Fine-Tuning with single GPU

CUDA_VISIBLE_DEVICES=0 python finetune.py \
    --do_train \
    --train_file ../../../instruction-datasets/$DATATAG/train.json \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path $MODEL_PATH \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR

Please refer to the complete parameter and command settings at: finetune.sh

Fine-Tuning with multiple GPUs

  • Start with torchrun
torchrun --standalone --nnodes=1  --nproc_per_node=2 finetune.py --do_train \
    --train_file ../../../instruction-datasets/$DATATAG/train.json \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path $MODEL_PATH \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR

Please refer to the complete parameter and command settings at: finetune-ddp.sh

  • Start with accelerate
accelerate launch finetune.py --do_train \
    --train_file ../../../instruction-datasets/$DATATAG/train.json \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path $MODEL_PATH \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR
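
If accelerate has not been configured on the machine yet, you can first generate a default launch configuration interactively; this is the standard accelerate workflow rather than anything specific to this repository:

# One-time interactive setup; the answers are saved and reused by accelerate launch
accelerate config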

Use DeepSpeed ZeRO strategy for distributed training

  • Start with accelerate and the --config_file argument
accelerate launch --config_file ../../config/use_deepspeed.yaml finetune.py --do_train \
    --train_file ../../../instruction-datasets/$DATATAG/train.json \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path $MODEL_PATH \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR
  • Start with deepspeed
deepspeed --num_nodes 1 --num_gpus 2 finetune.py \
    --deepspeed ../../config/zero_stage3_offload_config.json \
    --do_train \
    --train_file ../../../instruction-datasets/$DATATAG/train.json \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --prompt_column input \
    --response_column output \
    --overwrite_cache \
    --model_name_or_path $MODEL_PATH \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR
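
For orientation, a minimal ZeRO stage 3 configuration with CPU offload might look like the sketch below, where the "auto" values defer to the Hugging Face Trainer integration. This is only an assumption about the general DeepSpeed config format; the repository ships its own zero_stage3_offload_config.json, which should be used as-is.

# Sketch of a ZeRO-3 + CPU offload DeepSpeed config (illustrative; prefer the repository's own file).
cat > my_zero3_offload_config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "fp16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu" },
    "offload_param": { "device": "cpu" }
  }
}
EOF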

Evaluation & Prediction

CUDA_VISIBLE_DEVICES=0 python finetune.py \
    --do_predict \
    --validation_file ../../../instruction-datasets/$DATATAG/dev.json \
    --test_file ../../../instruction-datasets/$DATATAG/test.json \
    --overwrite_cache \
    --prompt_column input \
    --response_column output \
    --model_name_or_path $MODEL_PATH \
    --ptuning_checkpoint finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR/checkpoint-$STEP \
    --output_dir finetuned/$DATATAG-$MODEL_TYPE-pt-$PRE_SEQ_LEN-$LR

Run Inference

  • Terminal
cd LLM-Finetune-Guide/efficient-finetune/ptuning/v2/serve/
CUDA_VISIBLE_DEVICES=0 python cli_demo.py \
    --pretrained_model_path THUDM/chatglm-6b \
    --ptuning_checkpoint ../finetuned/chatglm-6b-pt-512-2e-2/checkpoint-3000 \
    --is_cuda True
  • Web demo
cd LLM-Finetune-Guide/efficient-finetune/lora/serve/
python ui.py
  • Model API
cd LLM-Finetune-Guide/efficient-finetune/lora/serve/
python api.py
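
As a usage illustration only: assuming api.py exposes an HTTP endpoint that accepts a JSON body with the instruction text, a request could look like the following. The port, route, and field names are hypothetical and not taken from the repository, so check api.py for the actual interface.

# Hypothetical request; adjust the host, port, route, and JSON fields to match api.py.
curl -X POST http://127.0.0.1:8000/generate \
     -H "Content-Type: application/json" \
     -d '{"input": "Summarize: instruction fine-tuning adapts LLMs to new tasks."}'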

Running on CPU environment

Being able to run fine-tuned large language models in a CPU-only environment greatly lowers the barrier to deploying LLMs in real applications.

  • Use INT4 to run in CPU environment
cd LLM-Finetune-Guide/efficient-finetune/ptuning/v2/serve/
python cli_demo.py \
    --pretrained_model_path THUDM/chatglm-6b \
    --ptuning_checkpoint ../finetuned/chatglm-6b-pt-512-2e-2/checkpoint-3000 \
    --quantization_bit 4 \
    --is_cuda False

License

  • Repository License: Apache-2.0 License
  • Model License: Please refer to the license provided by each language model for details.

Citation

If this project is helpful to your work or research, please star & cite it as follows:

@Misc{LLM-Finetune-Guide,
  title = {LLM Finetune Guide},
  author = {A-baoYang},
  howpublished = {\url{https://github.com/A-baoYang/LLM-Finetune-Guide}},
  year = {2023}
}

Acknowledgement

This project was inspired by some amazing projects, which are listed below. Thanks for their great work.

  • THUDM/ChatGLM-6B
  • ymcui/Chinese-LLaMA-Alpaca
  • tloen/alpaca-lora

Contact

If you have any questions or suggestions, please feel free to email us for inquiries: [email protected]
