open-chatgpt

The open-source implementation of ChatGPT, Alpaca, Vicuna and the RLHF pipeline. Implement a ChatGPT from scratch.

Stars: 179


Open-ChatGPT is an open-source library that enables users to train a hyper-personalized ChatGPT-like AI model using their own data with minimal computational resources. It provides an end-to-end training framework for ChatGPT-like models, supporting distributed training and offloading for extremely large models. The project implements RLHF (Reinforcement Learning with Human Feedback) powered by the Hugging Face Transformers library and DeepSpeed, allowing users to create high-quality ChatGPT-style models. Open-ChatGPT is designed to be user-friendly and efficient, aiming to empower users to develop their own conversational AI models easily.

README:

 

中文 | English

Open-ChatGPT: An open-source implementation of ChatGPT

Code License Python 3.9+ Code style: black

Introduction

Open-ChatGPT is an open-source library that allows you to train a hyper-personalized ChatGPT-like AI model using your own data and the least amount of compute possible.

Open-ChatGPT is a general system framework for enabling an end-to-end training experience for ChatGPT-like models. It can automatically take your favorite pre-trained large language model through the three stages of OpenAI InstructGPT-style training to produce your very own high-quality ChatGPT-style model.

We have implemented RLHF (Reinforcement Learning with Human Feedback) powered by the Hugging Face Transformers library and DeepSpeed. It supports distributed training and offloading, which makes it possible to fit extremely large models.

If you like the project, please show your support by leaving a star ⭐.

News

  • [2023/05] 🔥 We implemented Stanford Alpaca LoRA.
  • [2023/05] 🔥 We implemented Stanford Alpaca.
  • [2023/04] We released the RLHF (Reinforcement Learning with Human Feedback) pipeline.
  • [2023/03] We released the code for OpenChatGPT: an open-source library to train chatbots like ChatGPT.

Install

git clone https://github.com/jianzhnie/open-chatgpt.git
cd open-chatgpt
pip install -r requirements.txt

PEFT

  • If you would like to use LoRA along with other parameter-efficient methods, please install peft (pip install peft) as an additional dependency.

DeepSpeed

  • If you want to accelerate LLM training using techniques such as pipeline parallelism, gradient checkpointing, and tensor fusion, please install DeepSpeed.

Instruction Fine-tuning

Fine-tuning Alpaca-7B

We fine-tune our models using standard Hugging Face training code. We fine-tune LLaMA-7B and LLaMA-13B with the following hyperparameters:

Hyperparameter    LLaMA-7B    LLaMA-13B
Batch size        128         128
Learning rate     2e-5        1e-5
Epochs            3           5
Max length        512         512
Weight decay      0           0

You can use the following command to train Alpaca-7B with 4 x A100 (40GB).

cd examples/alpaca/
python train_alpaca.py \
    --model_name_or_path  'decapoda-research/llama-7b-hf' \
    --data_path tatsu-lab/alpaca  \
    --output_dir work_dir/ \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1
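
For reference, the script above builds on the standard Hugging Face Trainer API. The following is a minimal, hedged sketch of what such a supervised fine-tuning loop looks like; the prompt template, helper names, and label handling here are illustrative assumptions, not necessarily the exact code in train_alpaca.py.

# A minimal sketch of standard Hugging Face supervised fine-tuning on the
# Alpaca dataset; details in train_alpaca.py may differ.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "decapoda-research/llama-7b-hf"   # same checkpoint as the command above
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token      # LLaMA has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

def to_features(example):
    # Build an Alpaca-style instruction prompt and tokenize it.
    prompt = f"### Instruction:\n{example['instruction']}\n\n"
    if example["input"]:
        prompt += f"### Input:\n{example['input']}\n\n"
    prompt += f"### Response:\n{example['output']}"
    return tokenizer(prompt, truncation=True, max_length=512)

dataset = load_dataset("tatsu-lab/alpaca", split="train")
dataset = dataset.map(to_features, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="work_dir/",
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        warmup_ratio=0.03,
        lr_scheduler_type="cosine",
        logging_steps=1,
    ),
    train_dataset=dataset,
    # Causal-LM collation: labels are the input ids, padding is masked out.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()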

Using DeepSpeed

If you run into an out-of-memory (OOM) error, consider the options below.

Naively, fine-tuning a 7B model requires about 7 x 4 x 4 = 112 GB of VRAM (7B parameters, 4 bytes each, with roughly four copies held for the weights, gradients, and the two Adam optimizer states). The commands given above enable parameter sharding, so no redundant model copy is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:

  • Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.
  • In our experience, DeepSpeed stage-3 (with offload) can at times be more memory efficient than FSDP with offload. Here's an example that uses DeepSpeed stage-3 with both parameter and optimizer offload (set --nproc_per_node to the number of GPUs on your machine; a sketch of such a config is shown after this list):
pip install deepspeed
cd examples/alpaca/
torchrun --nproc_per_node=8 train_alpaca.py \
    --model_name_or_path  'decapoda-research/llama-7b-hf' \
    --data_path tatsu-lab/alpaca  \
    --output_dir work_dir/  \
    --num_train_epochs 3 \
    --per_device_train_batch_size 2 \
    --per_device_eval_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --deepspeed "scripts/ds_config_zero3_auto.json"
  • LoRA fine-tunes low-rank adapters on the query, key, and value projection matrices while keeping the base weights frozen. This can reduce the total memory footprint from 112 GB to about 7 x 4 = 28 GB.
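
For reference, a ZeRO stage-3 configuration with parameter and optimizer offload, in the spirit of the scripts/ds_config_zero3_auto.json file referenced above, typically looks something like the sketch below. It is shown as a Python dict for illustration (the Hugging Face Trainer accepts either a JSON path or a dict via TrainingArguments(deepspeed=...)); the actual file shipped in the repo may differ.

# A hedged sketch of a ZeRO stage-3 config with CPU offload; "auto" values are
# filled in by the Hugging Face Trainer to match its own arguments.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}
# Programmatically, this dict could be passed as TrainingArguments(..., deepspeed=ds_config);
# the command above passes the equivalent JSON file via --deepspeed instead.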

Fine-tuning Alpaca-7B with LoRA

This part reproduces the Stanford Alpaca results using low-rank adaptation (LoRA).

To fine-tune cheaply and efficiently, we use Hugging Face's PEFT as well as Tim Dettmers' bitsandbytes.

This file contains a straightforward application of PEFT to the LLaMA model, as well as some code related to prompt construction and tokenization.
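
As a rough guide, that PEFT setup amounts to wrapping the base model with a LoRA configuration, as in the hedged sketch below; the rank, target modules, and quantization settings shown are illustrative assumptions and may differ from train_alpaca_lora.py.

# A hedged sketch of applying PEFT/LoRA to LLaMA with 8-bit weights via bitsandbytes.
import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "decapoda-research/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # bitsandbytes 8-bit weights to save memory
    torch_dtype=torch.float16,
    device_map="auto",
)
model = prepare_model_for_int8_training(model)  # cast norms, enable input gradients

lora_config = LoraConfig(
    r=8,                                            # rank of the low-rank update
    lora_alpha=16,                                  # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA adapters are trainable

The full fine-tuning run can then be launched with: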

python train_alpaca_lora.py \
    --model_name_or_path  decapoda-research/llama-7b-hf  \
    --data_path tatsu-lab/alpaca  \
    --output_dir work_dir_lora/ \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 2000 \
    --save_total_limit 5 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1

Inference

This file reads the foundation model from the Hugging Face model hub and the LoRA weights from tloen/alpaca-lora-7b, and runs a Gradio interface for inference on a specified input. Users should treat this as example code for the use of the model, and modify it as needed.

Example usage:

python generate_server.py \
    --model_name_or_path decapoda-research/llama-7b-hf \
    --lora_model_name_or_path  tloen/alpaca-lora-7b
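
Under the hood, loading the base checkpoint together with the LoRA weights and generating a response roughly amounts to the following hedged sketch; generate_server.py additionally wraps this in a Gradio interface, and the prompt format shown here is an assumption.

# A hedged sketch of inference with the base model plus LoRA adapters.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "decapoda-research/llama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")  # attach LoRA adapters
model.eval()

prompt = "### Instruction:\nName three primary colors.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))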

Not Enough Memory

If you do not have enough memory, you can enable 8-bit compression by adding --load_8bit to the commands above. This can reduce memory usage by around half with slightly degraded model quality. It is compatible with CPU, GPU, and Metal backends. Alpaca-7B with 8-bit compression can run on a single NVIDIA 3090/4080/T4/V100 (16 GB) GPU.

python generate_server.py \
    --model_name_or_path decapoda-research/llama-7b-hf \
    --lora_model_name_or_path  tloen/alpaca-lora-7b \
    --load_8bit

Contributing

Our purpose is to make this repo even better. If you are interested in contributing, please refer to HERE for contribution instructions.

License

Open-ChatGPT is released under the Apache 2.0 license.

Acknowledgements

We appreciate the work by many open-source contributors, especially:

Citation

Please cite this repo if you use its data or code.

@misc{open-chatgpt,
  author = {jianzhnie},
  title = {Open-ChatGPT, a chatbot based on Llama model},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/jianzhnie/open-chatgpt}},
}
