ai-hub
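AI Hub is a service designed to integrate a range of large language models, including ChatGPT, Baichuan, Zhipu, Hunyuan, MiniMax, and Moonshot. It aims to accumulate and manage effective model invocation prompts and to continuously test and evaluate these models.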
Stars: 54
The AI Hub project aims to continuously test and evaluate mainstream large language models while accumulating and managing effective model invocation prompts. It has integrated the mainstream large language models in China as well as OpenAI's, including OpenAI GPT-4 Turbo, Baidu ERNIE-Bot-4, Tencent ChatPro, and MiniMax abab5.5-chat. The project plans to keep tracking, integrating, and evaluating new models. Users can access the models through a REST service or through Java code integration. The project also provides a test suite for translation, coding, and benchmark testing.
README:
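AI Hub aims to continuously test and evaluate mainstream large language models while accumulating and managing effective model invocation prompts. AI Hub currently integrates all mainstream large language models in China, including Wenxin Yiyan (ERNIE Bot), Tencent Hunyuan, Zhipu AI, MiniMax, and Baichuan, and plans to keep tracking, integrating, and evaluating new models.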
Supported models:
- OpenAI / gpt-4-turbo
- OpenAI / gpt-3.5-turbo
- Baidu / ERNIE-Bot-4 (Wenxin Yiyan 4)
- Baidu / ERNIE-Bot-turbo (Wenxin Yiyan)
- Zhipu / glm-4 (Zhipu GLM-4)
- Zhipu / chatGLM_turbo (Zhipu chatGLM)
- Ali / qwen-plus (Tongyi Qianwen plus)
- Ali / qwen-turbo (Tongyi Qianwen)
- Tencent / ChatPro (Tencent Hunyuan)
- Tencent / ChatStd (Tencent Hunyuan)
- Tencent / hunyuan-lite (Tencent Hunyuan)
- Baichuan / Baichuan2-Turbo (Baichuan)
- Minimax / abab5.5-chat (MiniMax)
- Minimax / abab6-chat (MiniMax)
- Xunfei / Spark3.1 (iFlytek Spark)
- Moonshot / moonshot-v1-8k (Moonshot AI)
- Xunfei / Spark3.5 (iFlytek Spark 3.5)
- ByteDance / Skylark-chat (ByteDance Doubao)
- Lingyi / yi-34b-chat-0205 (01.AI)
- Lingyi / yi-34b-chat-200k (01.AI)
- Lingyi / yi-vl-plus (01.AI)
- Deepseek / DeepSeek-V2 (Deepseek)
- Baidu / ERNIE-Lite-8K (Wenxin Yiyan)
- Baidu / ERNIE-Speed-8K (Wenxin Yiyan)
- Xunfei / Spark-Lite (iFlytek Spark)
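The Model List section has a more complete list of large language models. Note that some of them have not been evaluated yet; I will evaluate them over time.
Before use, set the model credentials on the Settings page.
If you want to integrate with the models in the list yourself, you can do so in the following ways.
Start ai-hub-server and access: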
http://127.0.0.1:3000/api/v1/models/${provider}/${model}:chat
POST body:
{
  "input": "${input}"
}
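The REST call above can be sketched with the JDK's built-in `java.net.http.HttpClient` (available since JDK 11, which the server also requires). This is a minimal sketch, not the project's own client: the class name `ChatRequestExample`, the provider and model strings `"openai"` and `"gpt-4-turbo"`, the `Content-Type` header, and the hand-rolled JSON escaping are all assumptions for illustration. The response format is not documented here, so the body is printed as raw text.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ChatRequestExample {
    // Base URL taken from the README; adjust host and port for your deployment.
    static final String BASE_URL = "http://127.0.0.1:3000/api/v1/models";

    // Builds the endpoint URL in the form {base}/{provider}/{model}:chat
    static String chatUrl(String provider, String model) {
        return BASE_URL + "/" + provider + "/" + model + ":chat";
    }

    // Builds the JSON body {"input": "..."}; escapes only backslashes,
    // double quotes, and newlines (a simplification, not a full JSON encoder).
    static String chatBody(String input) {
        String escaped = input
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "{\"input\": \"" + escaped + "\"}";
    }

    // Assembles the POST request; the Content-Type header is an assumption.
    static HttpRequest buildRequest(String provider, String model, String input) {
        return HttpRequest.newBuilder(URI.create(chatUrl(provider, model)))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(input)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        // Provider and model values here are illustrative placeholders.
        HttpRequest request = buildRequest("openai", "gpt-4-turbo", "Hello");
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Reached when ai-hub-server is not running on the expected address.
            System.out.println("Request failed: " + e);
        }
    }
}
```

In real use a JSON library would be a safer way to build the body than manual string escaping.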
For reference, see:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.ApplicationContext;
import org.springframework.stereotype.Service;

// Resolves the invoker bean that handles a given model provider.
@Service
public class AIModelInvokerFactory {
    private final ApplicationContext context;

    @Autowired
    public AIModelInvokerFactory(ApplicationContext context) {
        this.context = context;
    }

    // Maps a provider name to its invoker bean from the Spring context.
    public AIModelInvoker getProviderAdapter(String providerName) {
        AIProvider provider = AIProvider.fromName(providerName);
        switch (provider) {
            case OPENAI:
                return context.getBean(OpenAIInvoker.class);
            case BAICHUAN:
                return context.getBean(BaichuanInvoker.class);
            case ALI:
                return context.getBean(AliInvoker.class);
            case BAIDU:
                return context.getBean(BaiduInvoker.class);
            case ZHIPU:
                return context.getBean(ZhipuInvoker.class);
            case TENCENT:
                return context.getBean(TencentInvoker.class);
            case XUNFEI:
                return context.getBean(XunfeiInvoker.class);
            case MINIMAX:
                return context.getBean(MiniMaxInvoker.class);
            default:
                throw new IllegalArgumentException("Unknown provider: " + provider);
        }
    }
}
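The factory depends on `AIProvider.fromName`, which is not shown. The following is a hypothetical sketch of how such a lookup might behave, assuming case-insensitive matching against the enum constant names. The wrapper class `AIProviderSketch` and the matching rules are assumptions, and only the eight providers that appear in the `switch` are listed; the project's real enum may differ.

```java
import java.util.Locale;

public class AIProviderSketch {
    // Hypothetical stand-in for the project's AIProvider enum.
    enum AIProvider {
        OPENAI, BAICHUAN, ALI, BAIDU, ZHIPU, TENCENT, XUNFEI, MINIMAX;

        // Resolves a provider from a name such as "openai" or "MiniMax".
        // Assumption: names match the constants case-insensitively.
        static AIProvider fromName(String name) {
            if (name == null || name.isBlank()) {
                throw new IllegalArgumentException("Provider name must not be empty");
            }
            try {
                return valueOf(name.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException("Unknown provider: " + name, e);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(AIProvider.fromName("MiniMax")); // prints MINIMAX
    }
}
```

With a lookup like this, an unrecognized name is rejected before the factory's `switch` runs, so the `default` branch would only guard against enum constants that have no invoker yet.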
Starting the services with docker-compose is recommended:
cd docker
docker-compose up -d
Reference script:
cd ai-hub-fe
npm run start
Requires JDK 11 or later:
cd ai-hub-server
mvn clean package
java -jar ai-hub-server-1.0.0-SNAPSHOT-exec.jar
Company | Model | Price (per 1M tokens) | Context Length |
---|---|---|---|
Baidu | ERNIE Speed | Free | 8k |
Baidu | ERNIE Lite | Free | 8k |
Tencent | hunyuan-lite | Free | 256k |
ByteDance | Doubao-lite | Input: 0.3 / Output: 0.6 | 32k |
Zhipu | GLM-3-Turbo | 1 | 128k |
Lingyi | yi-spark | 1 | 16k |
Ali | qwen-long | Input: 0.5 / Output: 2 | 10m |
ByteDance | Doubao-pro | Input: 0.8 / Output: 2 | 32k |
DeepSeek | deepseek-chat | Input: 1 / Output: 2 | 32k |
Lingyi | yi-medium | 2.5 | 16k |
Company | Model | Price (per 1M tokens) | Context Length |
---|---|---|---|
Ali | qwen-turbo | Input: 2 / Output: 6 | 8k |
Tencent | hunyuan-standard | Input: 4.5 / Output: 5 | 32k |
MiniMax | abab5.5s | 5 | 8k |
OpenAI | GPT-3.5 Turbo | Input: $0.50 / Output: $1.50 | 16k |
ByteDance | Doubao-pro-128k | Input: 5 / Output: 9 | 128k |
Baichuan | Baichuan2-Turbo | 8 | 32k |
MiniMax | abab6.5s | 10 | 245k |
Ali | qwen-plus | Input: 4 / Output: 12 | 32k |
Baidu | ERNIE 3.0 | 12 | 8k |
Baichuan | Baichuan3-Turbo | 12 | 32k |
Lingyi | yi-large-turbo | 12 | 16k |
Lingyi | yi-medium-200k | 12 | 200k |
Moonshot | moonshot-v1-8k | 12 | 8k |
Company | Model | Price (per 1M tokens) | Context Length |
---|---|---|---|
Moonshot | moonshot-v1-32k | 24 | 32k |
Baichuan | Baichuan3-Turbo-128k | 24 | 128k |
MiniMax | abab6.5 | 30 | 8k |
Tencent | hunyuan-standard-256k | Input: 15 / Output: 60 | 256k |
Moonshot | moonshot-v1-128k | 60 | 128k |
Company | Model | Price (per 1M tokens) | Context Length |
---|---|---|---|
OpenAI | GPT-4o | Input: $5 / Output: $15 | 128k |
Baidu | ERNIE-3.5-128k | Input: 48 / Output: 96 | 128k |
Tencent | hunyuan-pro | Input: 30 / Output: 100 | 32k |
Ali | qwen-max | Input: 40 / Output: 120 | 8k |
Zhipu | GLM-4 | 100 | 128k |
Baichuan | Baichuan4 | 100 | 32k |
Baidu | ERNIE 4.0 | 120 | 8k |
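
Because the prices in these tables are quoted per 1M tokens, the cost of a single call scales with its token counts. The arithmetic can be sketched as below. The class and method names are hypothetical, the result is in whatever currency the table row uses, and for rows that list a single price the sketch assumes that price applies to both input and output tokens.

```java
import java.util.Locale;

public class TokenCostExample {
    // Prices in the tables are quoted per this many tokens.
    static final double TOKENS_PER_PRICE_UNIT = 1_000_000.0;

    // Cost of one call: each token count is scaled by its per-1M-token price.
    static double cost(double inputPrice, double outputPrice,
                       long inputTokens, long outputTokens) {
        return inputPrice * inputTokens / TOKENS_PER_PRICE_UNIT
             + outputPrice * outputTokens / TOKENS_PER_PRICE_UNIT;
    }

    public static void main(String[] args) {
        // qwen-turbo row: Input 2, Output 6 per 1M tokens.
        // 500,000 input tokens cost 1.0 and 100,000 output tokens cost 0.6.
        double qwenTurbo = cost(2, 6, 500_000, 100_000);
        System.out.println(String.format(Locale.ROOT, "%.2f", qwenTurbo)); // prints 1.60
    }
}
```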
Alternative AI tools for ai-hub
Similar Open Source Tools
tt-metal
TT-NN is a python & C++ Neural Network OP library. It provides a low-level programming model, TT-Metalium, enabling kernel development for Tenstorrent hardware.
go-cyber
Cyber is a superintelligence protocol that aims to create a decentralized and censorship-resistant internet. It uses a novel consensus mechanism called CometBFT and a knowledge graph to store and process information. Cyber is designed to be scalable, secure, and efficient, and it has the potential to revolutionize the way we interact with the internet.
pmhub
PmHub is a smart project management system based on SpringCloud, SpringCloud Alibaba, and LLM. It aims to help students quickly grasp the architecture design and development process of microservices/distributed projects. PmHub provides a platform for students to experience the transformation from monolithic to microservices architecture, understand the pros and cons of both architectures, and prepare for job interviews. It offers popular technologies like SpringCloud-Gateway, Nacos, Sentinel, and provides high-quality code, continuous integration, product design documents, and an enterprise workflow system. PmHub is suitable for beginners and advanced learners who want to master core knowledge of microservices/distributed projects.
yudao-boot-mini
yudao-boot-mini is an open-source project focused on developing a rapid development platform for developers in China. It includes features like system functions, infrastructure, member center, data reports, workflow, mall system, WeChat official account, CRM, ERP, etc. The project is based on Spring Boot with Java backend and Vue for frontend. It offers various functionalities such as user management, role management, menu management, department management, workflow management, payment system, code generation, API documentation, database documentation, file service, WebSocket integration, message queue, Java monitoring, and more. The project is licensed under the MIT License, allowing both individuals and enterprises to use it freely without restrictions.
Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, benchmarks, demos, papers for Large Language Models (like ChatGPT, LLaMA, GLM, Baichuan, etc) Evaluation on Language capabilities, Knowledge, Reasoning, Fairness and Safety.
ruoyi-vue-pro
The ruoyi-vue-pro repository is an open-source project that provides a comprehensive development platform with various functionalities such as system features, infrastructure, member center, data reports, workflow, payment system, mall system, ERP system, CRM system, and AI big model. It is built using Java backend with Spring Boot framework and Vue frontend with different versions like Vue3 with element-plus, Vue3 with vben(ant-design-vue), and Vue2 with element-ui. The project aims to offer a fast development platform for developers and enterprises, supporting features like dynamic menu loading, button-level access control, SaaS multi-tenancy, code generator, real-time communication, integration with third-party services like WeChat, Alipay, and cloud services, and more.
yudao-cloud
Yudao-cloud is an open-source project designed to provide a fast development platform for developers in China. It includes various system functions, infrastructure, member center, data reports, workflow, mall system, WeChat public account, CRM, ERP, etc. The project is based on Java backend with Spring Boot and Spring Cloud Alibaba microservices architecture. It supports multiple databases, message queues, authentication systems, dynamic menu loading, SaaS multi-tenant system, code generator, real-time communication, integration with third-party services like WeChat, Alipay, and more. The project is well-documented and follows the Alibaba Java development guidelines, ensuring clean code and architecture.
KeepChatGPT
KeepChatGPT is a plugin designed to enhance the data security capabilities and efficiency of ChatGPT. It aims to make your chat experience incredibly smooth, eliminating dozens or even hundreds of unnecessary steps, and permanently getting rid of various errors and warnings. It offers innovative features such as automatic refresh, activity maintenance, data security, audit cancellation, conversation cloning, endless conversations, page purification, large screen display, full screen display, tracking interception, rapid changes, and detailed insights. The plugin ensures that your AI experience is secure, smooth, efficient, concise, and seamless.
LLM-TPU
LLM-TPU project aims to deploy various open-source generative AI models on the BM1684X chip, with a focus on LLM. Models are converted to bmodel using TPU-MLIR compiler and deployed to PCIe or SoC environments using C++ code. The project has deployed various open-source models such as Baichuan2-7B, ChatGLM3-6B, CodeFuse-7B, DeepSeek-6.7B, Falcon-40B, Phi-3-mini-4k, Qwen-7B, Qwen-14B, Qwen-72B, Qwen1.5-0.5B, Qwen1.5-1.8B, Llama2-7B, Llama2-13B, LWM-Text-Chat, Mistral-7B-Instruct, Stable Diffusion, Stable Diffusion XL, WizardCoder-15B, Yi-6B-chat, Yi-34B-chat. Detailed model deployment information can be found in the 'models' subdirectory of the project. For demonstrations, users can follow the 'Quick Start' section. For inquiries about the chip, users can contact SOPHGO via the official website.
ailia-models
The collection of pre-trained, state-of-the-art AI models. ailia SDK is a self-contained, cross-platform, high-speed inference SDK for AI. The ailia SDK provides a consistent C++ API across Windows, Mac, Linux, iOS, Android, Jetson, and Raspberry Pi platforms. It also supports Unity (C#), Python, Rust, Flutter (Dart), and JNI for efficient AI implementation. The ailia SDK makes extensive use of the GPU through Vulkan and Metal to enable accelerated computing. 323 models are supported as of April 8th, 2024.
adata
AData is a free and open-source A-share database that focuses on transaction-related data. It provides comprehensive data on stocks, including basic information, market data, and sentiment analysis. AData is designed to be easy to use and integrate with other applications, making it a valuable tool for quantitative trading and AI training.
aidea-server
AIdea Server is an open-source Golang-based server that integrates mainstream large language models and drawing models. It supports OpenAI's GPT-3.5 and GPT-4, Anthropic's Claude instant and Claude 2.1, Google's Gemini Pro, as well as Chinese models like Tongyi Qianwen, Wenxin Yiyan, and more. It also supports open-source large models like Yi 34B, Llama2, and AquilaChat 7B. Additionally, it provides features for text-to-image, super-resolution, coloring black and white images, and generating art fonts and QR codes, among others.
LLamaTuner
LLamaTuner is a repository for the Efficient Finetuning of Quantized LLMs project, focusing on building and sharing instruction-following Chinese baichuan-7b/LLaMA/Pythia/GLM model tuning methods. The project enables training on a single Nvidia RTX-2080TI and RTX-3090 for multi-round chatbot training. It utilizes bitsandbytes for quantization and is integrated with Huggingface's PEFT and transformers libraries. The repository supports various models, training approaches, and datasets for supervised fine-tuning, LoRA, QLoRA, and more. It also provides tools for data preprocessing and offers models in the Hugging Face model hub for inference and finetuning. The project is licensed under Apache 2.0 and acknowledges contributions from various open-source contributors.
gpt_server
The GPT Server project leverages the basic capabilities of FastChat to provide the capabilities of an openai server. It perfectly adapts more models, optimizes models with poor compatibility in FastChat, and supports loading vllm, LMDeploy, and hf in various ways. It also supports all sentence_transformers compatible semantic vector models, including Chat templates with function roles, Function Calling (Tools) capability, and multi-modal large models. The project aims to reduce the difficulty of model adaptation and project usage, making it easier to deploy the latest models with minimal code changes.
For similar tasks
arena-hard-auto
Arena-Hard-Auto-v0.1 is an automatic evaluation tool for instruction-tuned LLMs. It contains 500 challenging user queries. The tool prompts GPT-4-Turbo as a judge to compare models' responses against a baseline model (default: GPT-4-0314). Arena-Hard-Auto employs an automatic judge as a cheaper and faster approximator to human preference. It has the highest correlation and separability to Chatbot Arena among popular open-ended LLM benchmarks. Users can evaluate their models' performance on Chatbot Arena by using Arena-Hard-Auto.
max
The Modular Accelerated Xecution (MAX) platform is an integrated suite of AI libraries, tools, and technologies that unifies commonly fragmented AI deployment workflows. MAX accelerates time to market for the latest innovations by giving AI developers a single toolchain that unlocks full programmability, unparalleled performance, and seamless hardware portability.
long-context-attention
Long-Context-Attention (YunChang) is a unified sequence parallel approach that combines the strengths of DeepSpeed-Ulysses-Attention and Ring-Attention to provide a versatile and high-performance solution for long context LLM model training and inference. It addresses the limitations of both methods by offering no limitation on the number of heads, compatibility with advanced parallel strategies, and enhanced performance benchmarks. The tool is verified in Megatron-LM and offers best practices for 4D parallelism, making it suitable for various attention mechanisms and parallel computing advancements.
marlin
Marlin is a highly optimized FP16xINT4 matmul kernel designed for large language model (LLM) inference, offering close to ideal speedups up to batchsizes of 16-32 tokens. It is suitable for larger-scale serving, speculative decoding, and advanced multi-inference schemes like CoT-Majority. Marlin achieves optimal performance by utilizing various techniques and optimizations to fully leverage GPU resources, ensuring efficient computation and memory management.
MMC
This repository, MMC, focuses on advancing multimodal chart understanding through large-scale instruction tuning. It introduces a dataset supporting various tasks and chart types, a benchmark for evaluating reasoning capabilities over charts, and an assistant achieving state-of-the-art performance on chart QA benchmarks. The repository provides data for chart-text alignment, benchmarking, and instruction tuning, along with existing datasets used in experiments. Additionally, it offers a Gradio demo for the MMCA model.
Tiktoken
Tiktoken is a high-performance implementation focused on token count operations. It provides various encodings like o200k_base, cl100k_base, r50k_base, p50k_base, and p50k_edit. Users can easily encode and decode text using the provided API. The repository also includes a benchmark console app for performance tracking. Contributions in the form of PRs are welcome.
ppl.llm.serving
ppl.llm.serving is a serving component for Large Language Models (LLMs) within the PPL.LLM system. It provides a server based on gRPC and supports inference for LLaMA. The repository includes instructions for prerequisites, quick start guide, model exporting, server setup, client usage, benchmarking, and offline inference. Users can refer to the LLaMA Guide for more details on using this serving component.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. Key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface that is easy to integrate with existing infrastructure (e.g., a Cloud IDE); and it supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.