llm-export

llm-export can export LLMs to ONNX.

Stars: 255

llm-export is a tool for exporting LLMs to the ONNX and MNN formats. Its features include passing onnxruntime correctness tests, optimizing the original model code to support dynamic shapes and to reduce constant parts, optimizing ONNX models with OnnxSlim for better performance, and exporting LoRA weights to ONNX and MNN. To use it, clone the project locally, clone the desired LLM repository locally, and export the model with LLMExporter. The tool offers a range of export options: exporting the entire model as a single ONNX model or as multiple segment models, exporting the model vocabulary to a text file, exporting specific layers such as Embedding and lm_head, testing the model with queries, validating ONNX model consistency with onnxruntime, converting ONNX models to MNN models, and more. Users can also specify export paths, skip optimization steps, and merge LoRA weights before exporting.

README:

llm-export

llm-export is an LLM export tool that can export LLMs to ONNX and MNN models.

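  • 🚀 Optimized the original model code to support dynamic shapes
  • 🚀 Optimized the original model code to reduce constant parts
  • 🚀 Uses OnnxSlim to optimize ONNX models, improving performance by about 5%; by @inisis
  • 🚀 Supports exporting LoRA weights to ONNX and MNN
  • 🚀 MNN inference code: mnn-llm
  • 🚀 ONNX inference code: onnx-llm, OnnxLLM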

Installation

# pip install
pip install llmexport

# git install
pip install git+https://github.com/wangzhaode/llm-export@master

# local install
git clone https://github.com/wangzhaode/llm-export && cd llm-export/
pip install .

Usage

  1. Download the model
git clone https://huggingface.co/Qwen/Qwen2-1.5B-Instruct
# If downloading from Hugging Face is slow, you can use ModelScope instead
git clone https://modelscope.cn/qwen/Qwen2-1.5B-Instruct.git
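  2. Test the model
# Test with text input
llmexport --path Qwen2-1.5B-Instruct --test "Hello"
# Test with image + text input (requires a vision-language model such as Qwen2-VL-2B-Instruct)
llmexport --path Qwen2-VL-2B-Instruct --test "<img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>Describe the content of this image"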
  3. Export the model
# Export Qwen2-1.5B-Instruct to an ONNX model
llmexport --path Qwen2-1.5B-Instruct --export onnx
# Export Qwen2-1.5B-Instruct to an MNN model with 4-bit quantization and a block size of 128
llmexport --path Qwen2-1.5B-Instruct --export mnn --quant_bit 4 --quant_block 128
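To make --quant_bit and --quant_block more concrete, here is a minimal pure-Python sketch of block-wise asymmetric (min/max) quantization: every run of quant_block consecutive weights gets its own scale and minimum, and a block size of 0 falls back to a single block per row (channel-wise), as the --quant_block help text describes. The function names are hypothetical, and this generic scheme is an assumption for illustration, not necessarily the exact algorithm MNN applies.

```python
def quantize_blockwise(row, bits=4, block=128):
    """Hypothetical sketch: asymmetric min/max quantization of one weight row.

    Each block of `block` consecutive values gets its own (scale, min) pair.
    block == 0 means a single block per row, i.e. channel-wise quantization.
    """
    qmax = (1 << bits) - 1  # largest code; 4-bit codes are 0..15
    size = len(row) if block == 0 else block
    codes, params = [], []
    for start in range(0, len(row), size):
        chunk = row[start:start + size]
        lo, hi = min(chunk), max(chunk)
        # Avoid division by zero when every value in the block is equal.
        scale = (hi - lo) / qmax if hi > lo else 1.0
        params.append((scale, lo))
        # Clamp to the valid code range in case of floating-point overshoot.
        codes.extend(min(qmax, max(0, round((x - lo) / scale))) for x in chunk)
    return codes, params


def dequantize_blockwise(codes, params, block=128):
    """Map integer codes back to floats using each block's (scale, min)."""
    size = len(codes) if block == 0 else block
    return [q * params[i // size][0] + params[i // size][1]
            for i, q in enumerate(codes)]


if __name__ == "__main__":
    row = [0.0, 1.0, 2.0, 3.0, -1.0, 5.0, 0.5, 2.5]
    codes, params = quantize_blockwise(row, bits=4, block=4)
    print(codes)
    print(dequantize_blockwise(codes, params, block=4))
```

With --quant_bit 4 --quant_block 128, a 4096-wide weight row would be split into 32 blocks, each storing 4-bit codes plus one (scale, min) pair. Smaller blocks follow the local weight range more closely, at the cost of storing more per-block parameters.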

Features

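  • Export the model to ONNX or MNN with --export onnx or --export mnn
  • Chat-test the model with --test $query, which returns the LLM's reply
  • ONNX models are optimized with onnx-slim by default; use --skip_slim to skip this step
  • Set the number of quantization bits with --quant_bit and the quantization block size with --quant_block
  • Use --lm_quant_bit to set the quantization bits of the lm_head layer weights; if it is not specified, the --quant_bit value is used
  • Use a self-compiled MNNConvert with --mnnconvert
  • Supports AWQ quantization
  • Supports GPTQ-quantized model weights
  • Supports exporting LoRA weights either merged into the model or split out, using --lora_path and --lora_split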
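The merged versus split LoRA export can be illustrated with the standard LoRA formulation, in which an adapter adds a low-rank update scaled by alpha / r to a frozen weight W: W' = W + (alpha / r) · B · A. Merging folds the update into W once; keeping it split evaluates the base path and the low-rank path separately at inference time, and both give the same output. This is a pure-Python sketch with hypothetical function names; that llm-export applies exactly this scaling internally is an assumption.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def merge_lora(weight, lora_a, lora_b, alpha):
    """Merged export (hypothetical): W' = W + (alpha / r) * B @ A.

    weight: out x in, lora_a: r x in, lora_b: out x r.
    """
    scale = alpha / len(lora_a)  # r = number of rows in A
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


def forward_split(weight, lora_a, lora_b, alpha, x):
    """Split export (hypothetical): base and low-rank paths kept separate."""
    scale = alpha / len(lora_a)
    base = matvec(weight, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * lr for b, lr in zip(base, low_rank)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 2.0]]      # rank r = 1
    B = [[1.0], [0.0]]
    x = [1.0, 1.0]
    merged = merge_lora(W, A, B, alpha=2.0)
    print(matvec(merged, x), forward_split(W, A, B, alpha=2.0, x=x))
```

Both paths print [7.0, 1.0]. Merging adds no inference cost but bakes one adapter into the weights; keeping the adapter split lets one base model be combined with different adapters.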

Arguments

usage: llmexport.py [-h] --path PATH [--type TYPE] [--tokenizer_path TOKENIZER_PATH] [--lora_path LORA_PATH] [--gptq_path GPTQ_PATH] [--dst_path DST_PATH]
                    [--verbose] [--test TEST] [--export EXPORT] [--onnx_slim] [--quant_bit QUANT_BIT] [--quant_block QUANT_BLOCK] [--lm_quant_bit LM_QUANT_BIT]
                    [--mnnconvert MNNCONVERT] [--ppl] [--awq] [--sym] [--tie_embed] [--lora_split]

llm_exporter

options:
  -h, --help            show this help message and exit
  --path PATH           path(`str` or `os.PathLike`):
                        Can be either:
                        	- A string, the *model id* of a pretrained model like `THUDM/chatglm-6b`. [TODO]
                        	- A path to a *directory* cloned from a repo, like `../chatglm-6b`.
  --type TYPE           type(`str`, *optional*):
                        	The pretrain llm model type.
  --tokenizer_path TOKENIZER_PATH
                        tokenizer path, default is `None`, meaning the `--path` value is used.
  --lora_path LORA_PATH
                        lora path, default is `None`, meaning no LoRA is applied.
  --gptq_path GPTQ_PATH
                        gptq path, default is `None`, meaning no GPTQ is applied.
  --dst_path DST_PATH   export onnx/mnn model to path, default is `./model`.
  --verbose             Whether or not to print verbose output.
  --test TEST           test model inference with query `TEST`.
  --export EXPORT       export model to an onnx/mnn model.
  --onnx_slim           Whether or not to use onnx-slim.
  --quant_bit QUANT_BIT
                        mnn quant bit, 4 or 8, default is 4.
  --quant_block QUANT_BLOCK
                        mnn quant block, default is 0, meaning channel-wise.
  --lm_quant_bit LM_QUANT_BIT
                        mnn lm_head quant bit, 4 or 8, default is `quant_bit`.
  --mnnconvert MNNCONVERT
                        local mnnconvert path; if invalid, pymnn is used.
  --ppl                 Whether or not to get all logits of input tokens.
  --awq                 Whether or not to use awq quant.
  --sym                 Whether or not to use symmetric quant (without zero point), default is False.
  --tie_embed           Whether or not to use tie_embedding, default is False.
  --lora_split          Whether or not to export lora split, default is False.
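The --sym flag selects symmetric quantization, which stores only a scale per block and no zero point. Below is a minimal sketch of a generic symmetric scheme: codes are signed integers in [-(2^(bits-1) - 1), 2^(bits-1) - 1], so [-7, 7] for 4-bit, and 0.0 always maps back to exactly 0.0. The function names are hypothetical, and the exact code range MNN uses (for example, whether -8 is also used at 4 bits) is an assumption.

```python
def quantize_symmetric(block, bits=4):
    """Hypothetical sketch: symmetric quantization of one block, no zero point."""
    qmax = (1 << (bits - 1)) - 1  # 7 for 4-bit, 127 for 8-bit
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax > 0 else 1.0  # guard against an all-zero block
    # Clamp to the symmetric range in case of floating-point overshoot.
    codes = [max(-qmax, min(qmax, round(x / scale))) for x in block]
    return codes, scale


def dequantize_symmetric(codes, scale):
    """Reconstruct floats; unlike asymmetric quantization there is no offset term."""
    return [q * scale for q in codes]


if __name__ == "__main__":
    block = [-1.4, 0.7, 0.0, 1.4]
    codes, scale = quantize_symmetric(block, bits=4)
    print(codes, dequantize_symmetric(codes, scale))
```

Compared with asymmetric min/max quantization, the symmetric form is cheaper to dequantize (one multiply, no offset) but can waste part of the code range when a block's values are not centered on zero.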

Model Downloads

Model ModelScope Hugging Face
Qwen-VL-Chat Q4_1 Q4_1
Baichuan2-7B-Chat Q4_1 Q4_1
bge-large-zh Q4_1 Q4_1
chatglm-6b Q4_1 Q4_1
chatglm2-6b Q4_1 Q4_1
chatglm3-6b Q4_1 Q4_1
codegeex2-6b Q4_1 Q4_1
deepseek-llm-7b-chat Q4_1 Q4_1
gemma-2-2b-it Q4_1 Q4_1
glm-4-9b-chat Q4_1 Q4_1
gte_sentence-embedding_multilingual-base Q4_1 Q4_1
internlm-chat-7b Q4_1 Q4_1
Llama-2-7b-chat Q4_1 Q4_1
Llama-3-8B-Instruct Q4_1 Q4_1
Llama-3.2-1B-Instruct Q4_1 Q4_1
Llama-3.2-3B-Instruct Q4_1 Q4_1
OpenELM-1_1B-Instruct Q4_1 Q4_1
OpenELM-270M-Instruct Q4_1 Q4_1
OpenELM-3B-Instruct Q8_1 Q8_1
OpenELM-450M-Instruct Q4_1 Q4_1
phi-2 Q4_1 Q4_1
qwen/Qwen-1_8B-Chat Q4_1 Q4_1
Qwen-7B-Chat Q4_1 Q4_1
Qwen1.5-0.5B-Chat Q4_1 Q4_1
Qwen1.5-1.8B-Chat Q4_1 Q4_1
Qwen1.5-4B-Chat Q4_1 Q4_1
Qwen1.5-7B-Chat Q4_1 Q4_1
Qwen2-0.5B-Instruct Q4_1 Q4_1
Qwen2-1.5B-Instruct Q4_1 Q4_1
Qwen2-7B-Instruct Q4_1 Q4_1
Qwen2-Audio-7B-Instruct Q4_1 Q4_1
Qwen2-VL-2B-Instruct Q4_1 Q4_1
Qwen2-VL-7B-Instruct Q4_1 Q4_1
Qwen2.5-0.5B-Instruct Q4_1 Q4_1
Qwen2.5-1.5B-Instruct Q4_1 Q4_1
Qwen2.5-3B-Instruct Q4_1 Q4_1
Qwen2.5-7B-Instruct Q4_1 Q4_1
Qwen2.5-Coder-1.5B-Instruct Q4_1 Q4_1
Qwen2.5-Coder-7B-Instruct Q4_1 Q4_1
Qwen2.5-Math-1.5B-Instruct Q4_1 Q4_1
Qwen2.5-Math-7B-Instruct Q4_1 Q4_1
QwQ-32B-Preview Q4_1 Q4_1
reader-lm-0.5b Q4_1 Q4_1
reader-lm-1.5b Q4_1 Q4_1
TinyLlama-1.1B-Chat-v1.0 Q4_1 Q4_1
Yi-6B-Chat Q4_1 Q4_1
MobileLLM-125M Q4_1 Q4_1
MobileLLM-350M Q4_1 Q4_1
MobileLLM-600M Q4_1 Q4_1
MobileLLM-1B Q4_1 Q4_1
SmolLM2-135M-Instruct Q4_1 Q4_1
SmolLM2-360M-Instruct Q4_1 Q4_1
SmolLM2-1.7B-Instruct Q4_1 Q4_1
