bert4torch

An elegent pytorch implement of transformers

Stars: 1284

Visit

**bert4torch** is a high-level framework for training and deploying transformer models in PyTorch. It provides a simple and efficient API for building, training, and evaluating transformer models, and supports a wide range of pre-trained models, including BERT, RoBERTa, ALBERT, XLNet, and GPT-2. bert4torch also includes a number of useful features, such as data loading, tokenization, and model evaluation. It is a powerful and versatile tool for natural language processing tasks.

README:

Documentation | Torch4keras | Examples | build_MiniLLM_from_scratch | bert4vector

1. 下载安装

安装稳定版

pip install bert4torch

安装最新版

pip install git+https://github.com/Tongjilibo/bert4torch

注意事项：pip包的发布慢于git上的开发版本，git clone注意引用路径，注意权重是否需要转换
测试用例：git clone https://github.com/Tongjilibo/bert4torch，修改example中的预训练模型文件路径和数据路径即可启动脚本
自行训练：针对自己的数据，修改相应的数据处理代码块
开发环境：原使用torch==1.10版本进行开发，现已切换到torch2.0开发，如其他版本遇到不适配，欢迎反馈

2. 功能

LLM模型: 加载chatglm、llama、 baichuan、ziya、bloom等开源大模型权重进行推理和微调，命令行一行部署大模型
核心功能：加载bert、roberta、albert、xlnet、nezha、bart、RoFormer、RoFormer_V2、ELECTRA、GPT、GPT2、T5、GAU-alpha、ERNIE等预训练权重继续进行finetune、并支持在bert基础上灵活定义自己模型
丰富示例：包含llm、pretrain、sentence_classfication、sentence_embedding、sequence_labeling、relation_extraction、seq2seq、serving等多种解决方案
实验验证：已在公开数据集实验验证，使用如下examples数据集和实验指标
易用trick：集成了常见的trick，即插即用
其他特性：加载transformers库模型一起使用；调用方式简洁高效；有训练进度条动态展示；配合torchinfo打印参数量；默认Logger和Tensorboard简便记录训练过程；自定义fit过程，满足高阶需求
训练过程：

功能	bert4torch	transformers	备注
训练进度条	✅	✅	进度条打印loss和定义的metrics
分布式训练dp/ddp	✅	✅	torch自带dp/ddp
各类callbacks	✅	✅	日志/tensorboard/earlystop/wandb等
大模型推理，stream/batch输出	✅	✅	各个模型是通用的，无需单独维护脚本
大模型微调	✅	✅	lora依赖peft库，pv2自带
丰富tricks	✅	❌	对抗训练等tricks即插即用
代码简洁易懂，自定义空间大	✅	❌	代码复用度高, keras代码训练风格
仓库的维护能力/影响力/使用量/兼容性	❌	✅	目前仓库个人维护
一键部署大模型

3. 快速上手

3.1 上手教程

3.2 命令行快速部署大模型服务

本地 / 联网加载

# 联网下载全部文件
bert4torch-llm-server --checkpoint_path Qwen2-0.5B-Instruct

# 加载本地大模型，联网下载bert4torch_config.json
bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --config_path Qwen/Qwen2-0.5B-Instruct

# 加载本地大模型，且bert4torch_config.json已经下载并放于同名目录下
bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct

命令行 / gradio网页 / openai_api

# 命令行
bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode cli

# gradio网页
bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode gradio

# openai_api
bert4torch-llm-server --checkpoint_path /data/pretrain_ckpt/Qwen/Qwen2-0.5B-Instruct --mode openai

命令行聊天示例

4. 版本和更新历史

4.1 版本历史

更新日期	bert4torch	torch4keras	版本说明
20250401	0.5.6	0.2.9	命令行支持图片输入, 修复rope在batch推理和超长时候的bug
20250215	0.5.5	0.2.8	增加deepseek-r1, internvl, internlm3, glm4v, modernbert, mllama, qwen2vl, qwenvl
20240928	0.5.4	0.2.7	【新功能】增加deepseek系列、MiniCPM、MiniCPMV、llama3.2、Qwen2.5；支持device_map=auto;【修复】修复batch_generate和n>1的bug
20240814	0.5.3	0.2.6	【新功能】增加llama3.1/Yi1.5；自动选择从hfmirror下载；支持命令行参数`bert4torch-llm-server`

更多版本

4.2 更新历史

更多历史

5. 预训练权重

预训练模型支持多种代码加载方式

from bert4torch.models import build_transformer_model

# 1. 仅指定config_path: 从头初始化模型结构, 不加载预训练模型
model = build_transformer_model('./model/bert4torch_config.json')

# 2. 仅指定checkpoint_path: 
## 2.1 文件夹路径: 自动寻找路径下的*.bin/*.safetensors权重文件 + 需把bert4torch_config.json下载并放于该目录下
model = build_transformer_model(checkpoint_path='./model')

## 2.2 文件路径/列表: 文件路径即权重路径/列表, bert4torch_config.json会从同级目录下寻找
model = build_transformer_model(checkpoint_path='./pytorch_model.bin')

## 2.3 model_name: hf上预训练权重名称, 会自动下载hf权重以及bert4torch_config.json文件
model = build_transformer_model(checkpoint_path='bert-base-chinese')

# 3. 同时指定config_path和checkpoint_path(本地路径名或model_name排列组合): 
#    本地路径从本地加载，pretrained_model_name会联网下载
config_path = './model/bert4torch_config.json'  # 或'bert-base-chinese'
checkpoint_path = './model/pytorch_model.bin'  # 或'bert-base-chinese'
model = build_transformer_model(config_path, checkpoint_path)

预训练权重链接和bert4torch_config.json

*注：

高亮格式(如bert-base-chinese)的表示可直接build_transformer_model()联网下载
国内镜像网站加速下载
- HF_ENDPOINT=https://hf-mirror.com python your_script.py
- export HF_ENDPOINT=https://hf-mirror.com后再执行python代码
- 在python代码开头如下设置
```
import os
os.environ['HF_ENDPOINT'] = "https://hf-mirror.com"
```

6. 鸣谢

感谢苏神实现的bert4keras，本实现有不少地方参考了bert4keras的源码，在此衷心感谢大佬的无私奉献;
其次感谢项目bert4pytorch，也是在该项目的指引下给了我用pytorch来复现bert4keras的想法和思路。

7. 引用

@misc{bert4torch,
  title={bert4torch},
  author={Bo Li},
  year={2022},
  howpublished={\url{https://github.com/Tongjilibo/bert4torch}},
}

8. 其他

Wechat & Star History Chart
微信群人数超过200个（有邀请限制），可添加个人微信拉群

微信号

微信群

Star History Chart

For Tasks:

Click tags to check more tools for each tasks

classify text label sequences answer questions translate text summarize text

For Jobs:

text classification sequence labeling question answering machine translation text summarization

Alternative AI tools for bert4torch

Similar Open Source Tools

bert4torch

github

: 1.3k

Qwen-TensorRT-LLM

Qwen-TensorRT-LLM is a project developed for the NVIDIA TensorRT Hackathon 2023, focusing on accelerating inference for the Qwen-7B-Chat model using TRT-LLM. The project offers various functionalities such as FP16/BF16 support, INT8 and INT4 quantization options, Tensor Parallel for multi-GPU parallelism, web demo setup with gradio, Triton API deployment for maximum throughput/concurrency, fastapi integration for openai requests, CLI interaction, and langchain support. It supports models like qwen2, qwen, and qwen-vl for both base and chat models. The project also provides tutorials on Bilibili and blogs for adapting Qwen models in NVIDIA TensorRT-LLM, along with hardware requirements and quick start guides for different model types and quantization methods.

github

: 484

Chinese-Mixtral-8x7B

Chinese-Mixtral-8x7B is an open-source project based on Mistral's Mixtral-8x7B model for incremental pre-training of Chinese vocabulary, aiming to advance research on MoE models in the Chinese natural language processing community. The expanded vocabulary significantly improves the model's encoding and decoding efficiency for Chinese, and the model is pre-trained incrementally on a large-scale open-source corpus, enabling it with powerful Chinese generation and comprehension capabilities. The project includes a large model with expanded Chinese vocabulary and incremental pre-training code.

github

: 635

Muice-Chatbot

Muice-Chatbot is an AI chatbot designed to proactively engage in conversations with users. It is based on the ChatGLM2-6B and Qwen-7B models, with a training dataset of 1.8K+ dialogues. The chatbot has a speaking style similar to a 2D girl, being somewhat tsundere but willing to share daily life details and greet users differently every day. It provides various functionalities, including initiating chats and offering 5 available commands. The project supports model loading through different methods and provides onebot service support for QQ users. Users can interact with the chatbot by running the main.py file in the project directory.

github

: 314

Langchain-Chatchat

LangChain-Chatchat is an open-source, offline-deployable retrieval-enhanced generation (RAG) large model knowledge base project based on large language models such as ChatGLM and application frameworks such as Langchain. It aims to establish a knowledge base Q&A solution that is friendly to Chinese scenarios, supports open-source models, and can run offline.

github

: 34.4k

ms-copilot-play

Microsoft Copilot Play is a Cloudflare Worker service that accelerates Microsoft Copilot functionalities in China. It allows high-speed access to Microsoft Copilot features like chatting, notebook, plugins, image generation, and sharing. The service filters out meaningless requests used for statistics, saving up to 80% of Cloudflare Worker requests. Users can deploy the service easily with Cloudflare Worker, ensuring fast and unlimited access with no additional operations. The service leverages the power of Microsoft Copilot, based on OpenAI GPT-4, and utilizes Bing search to answer questions.

github

: 221

build_MiniLLM_from_scratch

This repository aims to build a low-parameter LLM model through pretraining, fine-tuning, model rewarding, and reinforcement learning stages to create a chat model capable of simple conversation tasks. It features using the bert4torch training framework, seamless integration with transformers package for inference, optimized file reading during training to reduce memory usage, providing complete training logs for reproducibility, and the ability to customize robot attributes. The chat model supports multi-turn conversations. The trained model currently only supports basic chat functionality due to limitations in corpus size, model scale, SFT corpus size, and quality.

github

: 397

Element-Plus-X

github

: 289

swift

SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) supports training, inference, evaluation and deployment of nearly **200 LLMs and MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts. To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

github

: 2.7k

auto-round

AutoRound is an advanced weight-only quantization algorithm for low-bits LLM inference. It competes impressively against recent methods without introducing any additional inference overhead. The method adopts sign gradient descent to fine-tune rounding values and minmax values of weights in just 200 steps, often significantly outperforming SignRound with the cost of more tuning time for quantization. AutoRound is tailored for a wide range of models and consistently delivers noticeable improvements.

github

: 414

VideoLLaMA2

VideoLLaMA 2 is a project focused on advancing spatial-temporal modeling and audio understanding in video-LLMs. It provides tools for multi-choice video QA, open-ended video QA, and video captioning. The project offers model zoo with different configurations for visual encoder and language decoder. It includes training and evaluation guides, as well as inference capabilities for video and image processing. The project also features a demo setup for running a video-based Large Language Model web demonstration.

github

: 630

ipex-llm

IPEX-LLM is a PyTorch library for running Large Language Models (LLMs) on Intel CPUs and GPUs with very low latency. It provides seamless integration with various LLM frameworks and tools, including llama.cpp, ollama, Text-Generation-WebUI, HuggingFace transformers, and more. IPEX-LLM has been optimized and verified on over 50 LLM models, including LLaMA, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, and RWKV. It supports a range of low-bit inference formats, including INT4, FP8, FP4, INT8, INT2, FP16, and BF16, as well as finetuning capabilities for LoRA, QLoRA, DPO, QA-LoRA, and ReLoRA. IPEX-LLM is actively maintained and updated with new features and optimizations, making it a valuable tool for researchers, developers, and anyone interested in exploring and utilizing LLMs.

github

: 6.9k

UMOE-Scaling-Unified-Multimodal-LLMs

Uni-MoE is a MoE-based unified multimodal model that can handle diverse modalities including audio, speech, image, text, and video. The project focuses on scaling Unified Multimodal LLMs with a Mixture of Experts framework. It offers enhanced functionality for training across multiple nodes and GPUs, as well as parallel processing at both the expert and modality levels. The model architecture involves three training stages: building connectors for multimodal understanding, developing modality-specific experts, and incorporating multiple trained experts into LLMs using the LoRA technique on mixed multimodal data. The tool provides instructions for installation, weights organization, inference, training, and evaluation on various datasets.

github

: 682

phoenix

Phoenix is a tool that provides MLOps and LLMOps insights at lightning speed with zero-config observability. It offers a notebook-first experience for monitoring models and LLM Applications by providing LLM Traces, LLM Evals, Embedding Analysis, RAG Analysis, and Structured Data Analysis. Users can trace through the execution of LLM Applications, evaluate generative models, explore embedding point-clouds, visualize generative application's search and retrieval process, and statistically analyze structured data. Phoenix is designed to help users troubleshoot problems related to retrieval, tool execution, relevance, toxicity, drift, and performance degradation.

github

: 5.3k

DownEdit

github

: 323

xiaomi_airpurifier

This repository contains a custom component for Home Assistant that integrates various Xiaomi Mi Air Purifier and Xiaomi Mi Air Humidifier models. It provides detailed support for different devices, including power control, preset modes, child lock, LED control, favorite level adjustment, and various attributes monitoring. The custom component offers a more extensive range of supported devices compared to the official Home Assistant component, with additional features and device compatibility. Users can easily set up and configure their Xiaomi air purifiers and humidifiers within Home Assistant for enhanced control and monitoring.

github

: 446

For similar tasks

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

onnxruntime-genai

ONNX Runtime Generative AI is a library that provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. Users can call a high level `generate()` method, or run each iteration of the model in a loop. It supports greedy/beam search and TopP, TopK sampling to generate token sequences, has built in logits processing like repetition penalties, and allows for easy custom scoring.

github

: 442

jupyter-ai

Jupyter AI connects generative AI with Jupyter notebooks. It provides a user-friendly and powerful way to explore generative AI models in notebooks and improve your productivity in JupyterLab and the Jupyter Notebook. Specifically, Jupyter AI offers: * An `%%ai` magic that turns the Jupyter notebook into a reproducible generative AI playground. This works anywhere the IPython kernel runs (JupyterLab, Jupyter Notebook, Google Colab, Kaggle, VSCode, etc.). * A native chat UI in JupyterLab that enables you to work with generative AI as a conversational assistant. * Support for a wide range of generative model providers, including AI21, Anthropic, AWS, Cohere, Gemini, Hugging Face, NVIDIA, and OpenAI. * Local model support through GPT4All, enabling use of generative AI models on consumer grade machines with ease and privacy.

github

: 3.5k

khoj

Khoj is an open-source, personal AI assistant that extends your capabilities by creating always-available AI agents. You can share your notes and documents to extend your digital brain, and your AI agents have access to the internet, allowing you to incorporate real-time information. Khoj is accessible on Desktop, Emacs, Obsidian, Web, and Whatsapp, and you can share PDF, markdown, org-mode, notion files, and GitHub repositories. You'll get fast, accurate semantic search on top of your docs, and your agents can create deeply personal images and understand your speech. Khoj is self-hostable and always will be.

github

: 28.5k

langchain_dart

LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

github

: 497

danswer

Danswer is an open-source Gen-AI Chat and Unified Search tool that connects to your company's docs, apps, and people. It provides a Chat interface and plugs into any LLM of your choice. Danswer can be deployed anywhere and for any scale - on a laptop, on-premise, or to cloud. Since you own the deployment, your user data and chats are fully in your own control. Danswer is MIT licensed and designed to be modular and easily extensible. The system also comes fully ready for production usage with user authentication, role management (admin/basic users), chat persistence, and a UI for configuring Personas (AI Assistants) and their Prompts. Danswer also serves as a Unified Search across all common workplace tools such as Slack, Google Drive, Confluence, etc. By combining LLMs and team specific knowledge, Danswer becomes a subject matter expert for the team. Imagine ChatGPT if it had access to your team's unique knowledge! It enables questions such as "A customer wants feature X, is this already supported?" or "Where's the pull request for feature Y?"

github

: 10.5k

infinity

Infinity is an AI-native database designed for LLM applications, providing incredibly fast full-text and vector search capabilities. It supports a wide range of data types, including vectors, full-text, and structured data, and offers a fused search feature that combines multiple embeddings and full text. Infinity is easy to use, with an intuitive Python API and a single-binary architecture that simplifies deployment. It achieves high performance, with 0.1 milliseconds query latency on million-scale vector datasets and up to 15K QPS.

github

: 3.3k

For similar jobs

curated-transformers

Curated Transformers is a transformer library for PyTorch that provides state-of-the-art models composed of reusable components. It supports various transformer architectures, including encoders like ALBERT, BERT, and RoBERTa, and decoders like Falcon, Llama, and MPT. The library emphasizes consistent type annotations, minimal dependencies, and ease of use for education and research. It has been production-tested by Explosion and will be the default transformer implementation in spaCy 3.7.

github

: 833

bert4torch

github

: 1.3k

ring-attention-pytorch

This repository contains an implementation of Ring Attention, a technique for processing large sequences in transformers. Ring Attention splits the data across the sequence dimension and applies ring reduce to the processing of the tiles of the attention matrix, similar to flash attention. It also includes support for Striped Attention, a follow-up paper that permutes the sequence for better workload balancing for autoregressive transformers, and grouped query attention, which saves on communication costs during the ring reduce. The repository includes a CUDA version of the flash attention kernel, which is used for the forward and backward passes of the ring attention. It also includes logic for splitting the sequence evenly among ranks, either within the attention function or in the external ring transformer wrapper, and basic test cases with two processes to check for equivalent output and gradients.

github

: 405

FlagEmbedding

FlagEmbedding focuses on retrieval-augmented LLMs, consisting of the following projects currently: * **Long-Context LLM** : Activation Beacon * **Fine-tuning of LM** : LM-Cocktail * **Embedding Model** : Visualized-BGE, BGE-M3, LLM Embedder, BGE Embedding * **Reranker Model** : llm rerankers, BGE Reranker * **Benchmark** : C-MTEB

github

: 8.8k

edenai-apis

Eden AI aims to simplify the use and deployment of AI technologies by providing a unique API that connects to all the best AI engines. With the rise of **AI as a Service** , a lot of companies provide off-the-shelf trained models that you can access directly through an API. These companies are either the tech giants (Google, Microsoft , Amazon) or other smaller, more specialized companies, and there are hundreds of them. Some of the most known are : DeepL (translation), OpenAI (text and image analysis), AssemblyAI (speech analysis). There are **hundreds of companies** doing that. We're regrouping the best ones **in one place** !

github

: 441

mistral.rs

Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.

github

: 5.4k

home-llm

Home LLM is a project that provides the necessary components to control your Home Assistant installation with a completely local Large Language Model acting as a personal assistant. The goal is to provide a drop-in solution to be used as a "conversation agent" component by Home Assistant. The 2 main pieces of this solution are Home LLM and Llama Conversation. Home LLM is a fine-tuning of the Phi model series from Microsoft and the StableLM model series from StabilityAI. The model is able to control devices in the user's house as well as perform basic question and answering. The fine-tuning dataset is a custom synthetic dataset designed to teach the model function calling based on the device information in the context. Llama Conversation is a custom component that exposes the locally running LLM as a "conversation agent" in Home Assistant. This component can be interacted with in a few ways: using a chat interface, integrating with Speech-to-Text and Text-to-Speech addons, or running the oobabooga/text-generation-webui project to provide access to the LLM via an API interface.

github

: 603

llama_ros

This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. By using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs.

github

: 195

模型分类	模型名称	权重来源	权重链接/checkpoint_path	config_path
bert	bert-base-chinese	google-bert	`google-bert/bert-base-chinese`	`google-bert/bert-base-chinese`
	chinese_L-12_H-768_A-12	谷歌	tf权重 `Tongjilibo/bert-chinese_L-12_H-768_A-12`
	chinese-bert-wwm-ext	HFL	`hfl/chinese-bert-wwm-ext`	`hfl/chinese-bert-wwm-ext`
	bert-base-multilingual-cased	google-bert	`google-bert/bert-base-multilingual-cased`	`google-bert/bert-base-multilingual-cased`
	bert-base-cased	google-bert	`google-bert/bert-base-cased`	`google-bert/bert-base-cased`
	bert-base-uncased	google-bert	`google-bert/bert-base-uncased`	`google-bert/bert-base-uncased`
	MacBERT	HFL	`hfl/chinese-macbert-base` `hfl/chinese-macbert-large`	`hfl/chinese-macbert-base` `hfl/chinese-macbert-large`
	WoBERT	追一科技	`junnyu/wobert_chinese_base`，`junnyu/wobert_chinese_plus_base`	`junnyu/wobert_chinese_base` `junnyu/wobert_chinese_plus_base`
roberta	chinese-roberta-wwm-ext	HFL	`hfl/chinese-roberta-wwm-ext` `hfl/chinese-roberta-wwm-ext-large` (large的mlm权重是随机初始化)	`hfl/chinese-roberta-wwm-ext` `hfl/chinese-roberta-wwm-ext-large`
	roberta-small/tiny	追一科技	`Tongjilibo/chinese_roberta_L-4_H-312_A-12` `Tongjilibo/chinese_roberta_L-6_H-384_A-12`
	roberta-base	FacebookAI	`FacebookAI/roberta-base`	`FacebookAI/roberta-base`
	guwenbert	ethanyt	`ethanyt/guwenbert-base`	`ethanyt/guwenbert-base`
albert	albert_zh albert_pytorch	brightmart	`voidful/albert_chinese_tiny` `voidful/albert_chinese_small` `voidful/albert_chinese_base` `voidful/albert_chinese_large` `voidful/albert_chinese_xlarge` `voidful/albert_chinese_xxlarge`	`voidful/albert_chinese_tiny` `voidful/albert_chinese_small` `voidful/albert_chinese_base` `voidful/albert_chinese_large` `voidful/albert_chinese_xlarge` `voidful/albert_chinese_xxlarge`
nezha	NEZHA NeZha_Chinese_PyTorch	huawei_noah	`sijunhe/nezha-cn-base` `sijunhe/nezha-cn-large` `sijunhe/nezha-base-wwm` `sijunhe/nezha-large-wwm`	`sijunhe/nezha-cn-base` `sijunhe/nezha-cn-large` `sijunhe/nezha-base-wwm` `sijunhe/nezha-large-wwm`
	nezha_gpt_dialog	bojone	`Tongjilibo/nezha_gpt_dialog`
xlnet	Chinese-XLNet	HFL	`hfl/chinese-xlnet-base`	`hfl/chinese-xlnet-base`
	tranformer_xl	huggingface	`transfo-xl/transfo-xl-wt103`	`transfo-xl/transfo-xl-wt103`
deberta	Erlangshen-DeBERTa-v2	IDEA	`IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese` `IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese` `IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese`	`IDEA-CCNL/Erlangshen-DeBERTa-v2-97M-Chinese` `IDEA-CCNL/Erlangshen-DeBERTa-v2-320M-Chinese` `IDEA-CCNL/Erlangshen-DeBERTa-v2-710M-Chinese`
electra	Chinese-ELECTRA	HFL	`hfl/chinese-electra-base-discriminator`	`hfl/chinese-electra-base-discriminator`
ernie	ernie	百度文心	`nghuyong/ernie-1.0-base-zh` `nghuyong/ernie-3.0-base-zh`	`nghuyong/ernie-1.0-base-zh` `nghuyong/ernie-3.0-base-zh`
roformer	roformer	追一科技	`junnyu/roformer_chinese_base`	`junnyu/roformer_chinese_base`
	roformer_v2	追一科技	`junnyu/roformer_v2_chinese_char_base`	`junnyu/roformer_v2_chinese_char_base`
simbert	simbert	追一科技	`Tongjilibo/simbert-chinese-base` `Tongjilibo/simbert-chinese-small` `Tongjilibo/simbert-chinese-tiny`
	simbert_v2/roformer-sim	追一科技	`junnyu/roformer_chinese_sim_char_base`，`junnyu/roformer_chinese_sim_char_ft_base`，`junnyu/roformer_chinese_sim_char_small`，`junnyu/roformer_chinese_sim_char_ft_small`	`junnyu/roformer_chinese_sim_char_base` `junnyu/roformer_chinese_sim_char_ft_base` `junnyu/roformer_chinese_sim_char_small` `junnyu/roformer_chinese_sim_char_ft_small`
gau	GAU-alpha	追一科技	`Tongjilibo/chinese_GAU-alpha-char_L-24_H-768`
ModernBERT	ModernBERT	answerdotai	`answerdotai/ModernBERT-base` `answerdotai/ModernBERT-large`	`answerdotai/ModernBERT-base` `answerdotai/ModernBERT-large`
uie	uie uie_pytorch	百度	`Tongjilibo/uie-base`
gpt	CDial-GPT	thu-coai	`thu-coai/CDial-GPT_LCCC-base` `thu-coai/CDial-GPT_LCCC-large`	`thu-coai/CDial-GPT_LCCC-base` `thu-coai/CDial-GPT_LCCC-large`
	cmp_lm(26亿)	清华	`TsinghuaAI/CPM-Generate`	`TsinghuaAI/CPM-Generate`
	nezha_gen	huawei_noah	`Tongjilibo/chinese_nezha_gpt_L-12_H-768_A-12`
	gpt2-chinese-cluecorpussmall	UER	`uer/gpt2-chinese-cluecorpussmall`	`uer/gpt2-chinese-cluecorpussmall`
	gpt2-ml	imcaspar	torch BaiduYun(84dh)	`gpt2-ml_15g_corpus` `gpt2-ml_30g_corpus`
bart	bart_base_chinese	复旦fnlp	`fnlp/bart-base-chinese` v1.0	`fnlp/bart-base-chinese` `fnlp/bart-base-chinese-v1.0`
t5	t5	UER	`uer/t5-small-chinese-cluecorpussmall` `uer/t5-base-chinese-cluecorpussmall`	`uer/t5-base-chinese-cluecorpussmall` `uer/t5-small-chinese-cluecorpussmall`
	mt5	谷歌	`google/mt5-base`	`google/mt5-base`
	t5_pegasus	追一科技	`Tongjilibo/chinese_t5_pegasus_small` `Tongjilibo/chinese_t5_pegasus_base`
	chatyuan	clue-ai	`ClueAI/ChatYuan-large-v1` `ClueAI/ChatYuan-large-v2`	`ClueAI/ChatYuan-large-v1` `ClueAI/ChatYuan-large-v2`
	PromptCLUE	clue-ai	`ClueAI/PromptCLUE-base`	`ClueAI/PromptCLUE-base`
chatglm	chatglm-6b	THUDM	`THUDM/chatglm-6b` `THUDM/chatglm-6b-int8` `THUDM/chatglm-6b-int4` v0.1.0	`THUDM/chatglm-6b` `THUDM/chatglm-6b-int8` `THUDM/chatglm-6b-int4` `THUDM/chatglm-6b-v0.1.0`
	chatglm2-6b	THUDM	`THUDM/chatglm2-6b` `THUDM/chatglm2-6b-int4` `THUDM/chatglm2-6b-32k`	`THUDM/chatglm2-6b` `THUDM/chatglm2-6b-int4` `THUDM/chatglm2-6b-32k`
	chatglm3-6b	THUDM	`THUDM/chatglm3-6b` `THUDM/chatglm3-6b-32k`	`THUDM/chatglm3-6b` `THUDM/chatglm3-6b-32k`
	glm4-9b	THUDM	`THUDM/glm-4-9b` `THUDM/glm-4-9b-chat` `THUDM/glm-4-9b-chat-1m`	`THUDM/glm-4-9b` `THUDM/glm-4-9b-chat` `THUDM/glm-4-9b-chat-1m`
	glm4v-9b	THUDM	`THUDM/glm-4v-9b`	`THUDM/glm-4v-9b`
llama	llama	meta		`meta-llama/llama-7b` `meta-llama/llama-13b`
	llama-2	meta	meta-llama/Llama-2-7b-hf meta-llama/Llama-2-7b-chat-hf meta-llama/Llama-2-13b-hf meta-llama/Llama-2-13b-chat-hf	`meta-llama/Llama-2-7b-hf` `meta-llama/Llama-2-7b-chat-hf` `meta-llama/Llama-2-13b-hf` `meta-llama/Llama-2-13b-chat-hf`
	llama-3	meta	`meta-llama/Meta-Llama-3-8B` `meta-llama/Meta-Llama-3-8B-Instruct`	`meta-llama/Meta-Llama-3-8B` `meta-llama/Meta-Llama-3-8B-Instruct`
	llama-3.1	meta	`meta-llama/Meta-Llama-3.1-8B` `meta-llama/Meta-Llama-3.1-8B-Instruct`	`meta-llama/Meta-Llama-3.1-8B` `meta-llama/Meta-Llama-3.1-8B-Instruct`
	llama-3.2	meta	`meta-llama/Llama-3.2-1B` `meta-llama/Llama-3.2-1B-Instruct` `meta-llama/Llama-3.2-3B` `meta-llama/Llama-3.2-3B-Instruct`	`meta-llama/Llama-3.2-1B` `meta-llama/Llama-3.2-1B-Instruct` `meta-llama/Llama-3.2-3B` `meta-llama/Llama-3.2-3B-Instruct`
	llama-3.2-vision	meta	`meta-llama/Llama-3.2-11B-Vision` `meta-llama/Llama-3.2-11B-Vision-Instruct`	`meta-llama/Llama-3.2-11B-Vision` `meta-llama/Llama-3.2-11B-Vision-Instruct`
llama-series	Chinese-LLaMA-Alpaca	HFL	`hfl/chinese-alpaca-plus-lora-7b` `hfl/chinese-llama-plus-lora-7b` (使用前需要合并lora权重)	`hfl/chinese-alpaca-plus-7b` `hfl/chinese-llama-plus-7b`
	Chinese-LLaMA-Alpaca-2	HFL		待添加
	Chinese-LLaMA-Alpaca-3	HFL		待添加
	Belle_llama	LianjiaTech	BelleGroup/BELLE-LLaMA-7B-2M-enc	合成说明、`BelleGroup/BELLE-LLaMA-7B-2M-enc`
	Ziya	IDEA-CCNL	IDEA-CCNL/Ziya-LLaMA-13B-v1 IDEA-CCNL/Ziya-LLaMA-13B-v1.1 IDEA-CCNL/Ziya-LLaMA-13B-Pretrain-v1	`IDEA-CCNL/Ziya-LLaMA-13B-v1` `IDEA-CCNL/Ziya-LLaMA-13B-v1.1`
	vicuna	lmsys	`lmsys/vicuna-7b-v1.5`	`lmsys/vicuna-7b-v1.5`
Baichuan	Baichuan	baichuan-inc	`baichuan-inc/Baichuan-7B` `baichuan-inc/Baichuan-13B-Base` `baichuan-inc/Baichuan-13B-Chat`	`baichuan-inc/Baichuan-7B` `baichuan-inc/Baichuan-13B-Base` `baichuan-inc/Baichuan-13B-Chat`
	Baichuan2	baichuan-inc	`baichuan-inc/Baichuan2-7B-Base` `baichuan-inc/Baichuan2-7B-Chat` `baichuan-inc/Baichuan2-13B-Base` `baichuan-inc/Baichuan2-13B-Chat`	`baichuan-inc/Baichuan2-7B-Base` `baichuan-inc/Baichuan2-7B-Chat` `baichuan-inc/Baichuan2-13B-Base` `baichuan-inc/Baichuan2-13B-Chat`
Yi	Yi	01-ai	`01-ai/Yi-6B` `01-ai/Yi-6B-200K` `01-ai/Yi-9B` `01-ai/Yi-9B-200K`	`01-ai/Yi-6B` `01-ai/Yi-6B-200K` `01-ai/Yi-9B` `01-ai/Yi-9B-200K`
	Yi-1.5	01-ai	`01-ai/Yi-1.5-6B` `01-ai/Yi-1.5-6B-Chat` `01-ai/Yi-1.5-9B` `01-ai/Yi-1.5-9B-32K` `01-ai/Yi-1.5-9B-Chat` `01-ai/Yi-1.5-9B-Chat-16K`	`01-ai/Yi-1.5-6B` `01-ai/Yi-1.5-6B-Chat` `01-ai/Yi-1.5-9B` `01-ai/Yi-1.5-9B-32K` `01-ai/Yi-1.5-9B-Chat` `01-ai/Yi-1.5-9B-Chat-16K`
bloom	bloom	bigscience	`bigscience/bloom-560m` `bigscience/bloomz-560m`	`bigscience/bloom-560m` `bigscience/bloomz-560m`
Qwen	Qwen	阿里云	`Qwen/Qwen-1_8B` `Qwen/Qwen-1_8B-Chat` `Qwen/Qwen-7B` `Qwen/Qwen-7B-Chat` `Qwen/Qwen-14B` `Qwen/Qwen-14B-Chat`	`Qwen/Qwen-1_8B` `Qwen/Qwen-1_8B-Chat` `Qwen/Qwen-7B` `Qwen/Qwen-7B-Chat` `Qwen/Qwen-14B` `Qwen/Qwen-14B-Chat`
	Qwen1.5	阿里云	`Qwen/Qwen1.5-0.5B` `Qwen/Qwen1.5-0.5B-Chat` `Qwen/Qwen1.5-1.8B` `Qwen/Qwen1.5-1.8B-Chat` `Qwen/Qwen1.5-7B` `Qwen/Qwen1.5-7B-Chat` `Qwen/Qwen1.5-14B` `Qwen/Qwen1.5-14B-Chat`	`Qwen/Qwen1.5-0.5B` `Qwen/Qwen1.5-0.5B-Chat` `Qwen/Qwen1.5-1.8B` `Qwen/Qwen1.5-1.8B-Chat` `Qwen/Qwen1.5-7B` `Qwen/Qwen1.5-7B-Chat` `Qwen/Qwen1.5-14B` `Qwen/Qwen1.5-14B-Chat`
	Qwen2	阿里云	`Qwen/Qwen2-0.5B` `Qwen/Qwen2-0.5B-Instruct` `Qwen/Qwen2-1.5B` `Qwen/Qwen2-1.5B-Instruct` `Qwen/Qwen2-7B` `Qwen/Qwen2-7B-Instruct`	`Qwen/Qwen2-0.5B` `Qwen/Qwen2-0.5B-Instruct` `Qwen/Qwen2-1.5B` `Qwen/Qwen2-1.5B-Instruct` `Qwen/Qwen2-7B` `Qwen/Qwen2-7B-Instruct`
	Qwen2-VL	阿里云	`Qwen/Qwen2-VL-2B-Instruct` `Qwen/Qwen2-VL-7B-Instruct`	`Qwen/Qwen2-VL-2B-Instruct` `Qwen/Qwen2-VL-7B-Instruct`
	Qwen2.5	阿里云	`Qwen/Qwen2.5-0.5B` `Qwen/Qwen2.5-0.5B-Instruct` `Qwen/Qwen2.5-1.5B` `Qwen/Qwen2.5-1.5B-Instruct` `Qwen/Qwen2.5-3B` `Qwen/Qwen2.5-3B-Instruct` `Qwen/Qwen2.5-7B` `Qwen/Qwen2.5-7B-Instruct` `Qwen/Qwen2.5-14B` `Qwen/Qwen2.5-14B-Instruct`	`Qwen/Qwen2.5-0.5B` `Qwen/Qwen2.5-0.5B-Instruct` `Qwen/Qwen2.5-1.5B` `Qwen/Qwen2.5-1.5B-Instruct` `Qwen/Qwen2.5-3B` `Qwen/Qwen2.5-3B-Instruct` `Qwen/Qwen2.5-7B` `Qwen/Qwen2.5-7B-Instruct` `Qwen/Qwen2.5-14B` `Qwen/Qwen2.5-14B-Instruct`
	Qwen2.5-VL	阿里云	`Qwen/Qwen2.5-VL-3B-Instruct` `Qwen/Qwen2.5-VL-7B-Instruct`	`Qwen/Qwen2.5-VL-3B-Instruct` `Qwen/Qwen2.5-VL-7B-Instruct`
InternLM	InternLM	上海人工智能实验室	`internlm/internlm-7b` `internlm/internlm-chat-7b`	`internlm/internlm-7b` `internlm/internlm-chat-7b`
	InternLM2	上海人工智能实验室	`internlm/internlm2-1_8b` `internlm/internlm2-chat-1_8b` `internlm/internlm2-7b` `internlm/internlm2-chat-7b` `internlm/internlm2-20b` `internlm/internlm2-chat-20b`	`internlm/internlm2-1_8b` `internlm/internlm2-chat-1_8b` `internlm/internlm2-7b` `internlm/internlm2-chat-7b`
	InternLM2.5	上海人工智能实验室	`internlm/internlm2_5-7b` `internlm/internlm2_5-7b-chat` `internlm/internlm2_5-7b-chat-1m`	`internlm/internlm2_5-7b` `internlm/internlm2_5-7b-chat` `internlm/internlm2_5-7b-chat-1m`
	InternLM3	上海人工智能实验室	`internlm/internlm3-8b-instruct`	`internlm/internlm3-8b-instruct`
InternVL	InternVL 1.0-1.5	上海人工智能实验室	`OpenGVLab/Mini-InternVL-Chat-4B-V1-5` `OpenGVLab/Mini-InternVL-Chat-2B-V1-5`	待添加
	InternVL 2.0	上海人工智能实验室	`OpenGVLab/InternVL2-1B` `OpenGVLab/InternVL2-2B` `OpenGVLab/InternVL2-4B` `OpenGVLab/InternVL2-8B`	待添加
	InternVL 2.5	上海人工智能实验室	`OpenGVLab/InternVL2_5-1B` `OpenGVLab/InternVL2_5-2B` `OpenGVLab/InternVL2_5-4B` `OpenGVLab/InternVL2_5-8B`	`OpenGVLab/InternVL2_5-1B` 待添加待添加待添加
Falcon	Falcon	tiiuae	`tiiuae/falcon-rw-1b` `tiiuae/falcon-7b` `tiiuae/falcon-7b-instruct`	`tiiuae/falcon-rw-1b` `tiiuae/falcon-7b` `tiiuae/falcon-7b-instruct`
DeepSeek	DeepSeek-MoE	深度求索	`deepseek-ai/deepseek-moe-16b-base` `deepseek-ai/deepseek-moe-16b-chat`	`deepseek-ai/deepseek-moe-16b-base` `deepseek-ai/deepseek-moe-16b-chat`
	DeepSeek-LLM	深度求索	`deepseek-ai/deepseek-llm-7b-base` `deepseek-ai/deepseek-llm-7b-chat`	`deepseek-ai/deepseek-llm-7b-base` `deepseek-ai/deepseek-llm-7b-chat`
	DeepSeek-V2	深度求索	`deepseek-ai/DeepSeek-V2-Lite` `deepseek-ai/DeepSeek-V2-Lite-Chat`	`deepseek-ai/DeepSeek-V2-Lite` `deepseek-ai/DeepSeek-V2-Lite-Chat`
	DeepSeek-Coder	深度求索	`deepseek-ai/deepseek-coder-1.3b-base` `deepseek-ai/deepseek-coder-1.3b-instruct` `deepseek-ai/deepseek-coder-6.7b-base` `deepseek-ai/deepseek-coder-6.7b-instruct` `deepseek-ai/deepseek-coder-7b-base-v1.5` `deepseek-ai/deepseek-coder-7b-instruct-v1.5`	`deepseek-ai/deepseek-coder-1.3b-base` `deepseek-ai/deepseek-coder-1.3b-instruct` `deepseek-ai/deepseek-coder-6.7b-base` `deepseek-ai/deepseek-coder-6.7b-instruct` `deepseek-ai/deepseek-coder-7b-base-v1.5` `deepseek-ai/deepseek-coder-7b-instruct-v1.5`
	DeepSeek-Coder-V2	深度求索	`deepseek-ai/DeepSeek-Coder-V2-Lite-Base` `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct`	`deepseek-ai/DeepSeek-Coder-V2-Lite-Base` `deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct`
	DeepSeek-Math	深度求索	`deepseek-ai/deepseek-math-7b-base` `deepseek-ai/deepseek-math-7b-instruct` `deepseek-ai/deepseek-math-7b-rl`	`deepseek-ai/deepseek-math-7b-base` `deepseek-ai/deepseek-math-7b-instruct` `deepseek-ai/deepseek-math-7b-rl`
	DeepSeek-R1	深度求索	`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`	`deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-7B` `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-14B` `deepseek-ai/DeepSeek-R1-Distill-Qwen-32B`
MiniCPM	MiniCPM	OpenBMB	`openbmb/MiniCPM-2B-sft-bf16` `openbmb/MiniCPM-2B-dpo-bf16` `openbmb/MiniCPM-2B-128k` `openbmb/MiniCPM-1B-sft-bf16`	`openbmb/MiniCPM-2B-sft-bf16` `openbmb/MiniCPM-2B-dpo-bf16` `openbmb/MiniCPM-2B-128k` `openbmb/MiniCPM-1B-sft-bf16`
	MiniCPM-o	OpenBMB	`openbmb/MiniCPM-Llama3-V-2_5` `openbmb/MiniCPM-V-2_6` `openbmb/MiniCPM-o-2_6`	`openbmb/MiniCPM-Llama3-V-2_5` `openbmb/MiniCPM-V-2_6` 待添加
embedding	text2vec-base-chinese	shibing624	`shibing624/text2vec-base-chinese`	`shibing624/text2vec-base-chinese`
	m3e	moka-ai	`moka-ai/m3e-base`	`moka-ai/m3e-base`
	bge	BAAI	`BAAI/bge-large-en-v1.5` `BAAI/bge-large-zh-v1.5` `BAAI/bge-base-en-v1.5` `BAAI/bge-base-zh-v1.5` `BAAI/bge-small-en-v1.5` `BAAI/bge-small-zh-v1.5`	`BAAI/bge-large-en-v1.5` `BAAI/bge-large-zh-v1.5` `BAAI/bge-base-en-v1.5` `BAAI/bge-base-zh-v1.5` `BAAI/bge-small-en-v1.5` `BAAI/bge-small-zh-v1.5`
	gte	thenlper	`thenlper/gte-large-zh` `thenlper/gte-base-zh`	`thenlper/gte-base-zh` `thenlper/gte-large-zh`