
llm_model_hub
Stars: 55

Model Hub V2 is a one-stop platform for model fine-tuning, deployment, and debugging without code, providing users with a visual interface to quickly validate the effects of fine-tuning various open-source models, facilitating rapid experimentation and decision-making, and lowering the threshold for users to fine-tune large models. For detailed instructions, please refer to the Feishu documentation.
README:
Model Hub V2 is a one-stop, no-code visual platform for model fine-tuning, deployment, and debugging. It helps users quickly validate the results of fine-tuning various open-source models, supports rapid experimentation and decision-making, and lowers the barrier to fine-tuning large models. For details, see the Feishu user guide.
- Go to CloudFormation, create a stack, and upload the deployment template cloudformation-template.yaml.
- Required fields: a stack name (e.g., modelhub) and an EC2 key pair (if you don't have one, create it in the EC2 console first).
- The remaining fields are optional; they can also be added later in backend/.env and applied by restarting with
pm2 restart all
- Keep clicking Next until you reach the acknowledgement checkbox, check it, and submit.
- When the stack has finished creating, find the PublicIP address in the stack Outputs tab, then open http://{ip}:3000 to access Model Hub. The default username is demo_user.
- Getting the password: open the AWS Systems Manager -> Parameter Store console, where a new parameter /modelhub/RandomPassword appears. Open it, turn on "Show decrypted value", and copy the login password. The default username is demo_user.
⚠️ Note: after the stack shows as deployed, the launched EC2 instance still needs 8-10 minutes to run its setup scripts. If the page does not load, wait 8-10 minutes and refresh.
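If you prefer the command line, the password can also be read with the AWS CLI; a minimal sketch, assuming the CLI is configured with permission to read the parameter:
aws ssm get-parameter --name /modelhub/RandomPassword --with-decryption --query Parameter.Value --output text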
- In China regions GitHub is not directly reachable, so deployment takes two steps: first use CloudFormation to create the EC2 server and role, then download the project code, upload it to the EC2 server, and run the one-click install script.
- First, go to CloudFormation, create a stack, and upload the deployment template cloudformation-template-cn.yaml.
- Enter a stack name (e.g., modelhub) and choose a key pair for SSH access to the EC2 instance (if you don't have one, create it in the EC2 console first).
- Deployment finishes in a few minutes; on the Outputs tab, note the EC2 PublicIP and SageMakerRoleArn values for later use.
- On a machine that can reach GitHub, run the following command to download the code, then package it as a zip file and upload it to /home/ubuntu/ on the EC2 server.
⚠️ Note: the code must be cloned with --recurse-submodules
git clone --recurse-submodules https://github.com/aws-samples/llm_model_hub.git
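Optionally, to confirm the submodules were actually fetched:
git -C llm_model_hub submodule status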
- Zip the project and upload it to the /home/ubuntu directory of the EC2 server created in step one:
zip -r llm_model_hub.zip llm_model_hub/
scp -i <path_to_your_keypair_file> llm_model_hub.zip ubuntu@<PublicIP from step one>:~/
- From a local terminal, SSH to the EC2 instance created in step one. Add -o ServerAliveInterval=60 so the terminal does not time out while the install script runs:
ssh -i <path_to_your_keypair_file> -o ServerAliveInterval=60 ubuntu@<PublicIP from step one>
- After logging in to the EC2 instance, unzip the uploaded package:
sudo apt update
sudo apt install unzip
unzip llm_model_hub.zip
- Set the environment variable:
export SageMakerRoleArn=<SageMakerRoleArn from the step-one CloudFormation outputs, e.g. arn:aws-cn:iam::1234567890:role/sagemaker_exection_role>
- (Optional) To use SwanLab or wandb as a metrics dashboard, set the following variables; they can also be added later in backend/.env, followed by pm2 restart all to apply them:
export SWANLAB_API_KEY=<SWANLAB_API_KEY>
export WANDB_API_KEY=<WANDB_API_KEY>
export WANDB_BASE_URL=<WANDB_BASE_URL>
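If you add them later instead, the same keys can be appended to backend/.env; a minimal sketch, assuming the .env file uses these exact variable names (verify against your backend/.env):
# append the monitoring keys to the backend config (key names assumed, check backend/.env)
cat >> backend/.env <<'EOF'
SWANLAB_API_KEY=<SWANLAB_API_KEY>
WANDB_API_KEY=<WANDB_API_KEY>
WANDB_BASE_URL=<WANDB_BASE_URL>
EOF
pm2 restart all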
- Run the one-click deployment script:
cd /home/ubuntu/llm_model_hub
bash cn-region-deploy.sh
The script finishes in roughly 40-60 minutes (depending on Docker registry mirror speed); the install log is written to /home/ubuntu/setup.log.
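To watch the installation while it runs, you can follow the log from another terminal:
tail -f /home/ubuntu/setup.log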
- Access
- Once everything above is deployed and the frontend has started, open http://{ec2 PublicIP}:3000 in a browser to reach the frontend; the username and random password are in /home/ubuntu/setup.log.
- For port forwarding, see the nginx configuration section in the backend setup; a lightweight SSH-tunnel alternative is sketched below.
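A minimal SSH tunnel sketch (an alternative to the documented nginx approach, reusing the key pair from the deployment steps):
ssh -i <path_to_your_keypair_file> -L 3000:localhost:3000 ubuntu@<ec2 PublicIP>
Then open http://localhost:3000 locally.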
- Hardware requirement: one EC2 instance, m5.xlarge, 200GB EBS storage
- OS requirement: Ubuntu 22.04
- Permissions setup:
- In IAM, create an EC2 role named adminrole-for-ec2:
- Select trusted entity type: AWS service, service: EC2
- Attach the managed policies for these two services: AmazonSageMakerFullAccess, CloudWatchLogsFullAccess
- Attach the EC2 instance to the role, and add the following inline policy to it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:CreateBucket"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ssmmessages:CreateControlChannel"
      ],
      "Resource": ["*"]
    }
  ]
}
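The two managed policies can equally be attached from the AWS CLI; a minimal sketch, assuming the role adminrole-for-ec2 already exists and your credentials may modify IAM (in China regions the policy partition is arn:aws-cn rather than arn:aws):
aws iam attach-role-policy --role-name adminrole-for-ec2 --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name adminrole-for-ec2 --policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess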
- SSH to the EC2 instance
- For China regions, the code must be downloaded manually, zipped, and transferred to the EC2 instance:
- On a machine that can reach GitHub, run the following command to download the code, then package it as a zip file and upload it to /home/ubuntu/ on the EC2 server.
- Clone with --recurse-submodules:
git clone --recurse-submodules https://github.com/aws-samples/llm_model_hub.git
unzip llm_model_hub.zip
export SageMakerRoleArn=<full ARN of the sagemaker_exection_role created above, e.g. arn:aws-cn:iam::1234567890:role/sagemaker_exection_role>
- (Optional) To use SwanLab or wandb as a metrics dashboard, set the following variables; they can also be added later in backend/.env, followed by pm2 restart all to apply them:
export SWANLAB_API_KEY=<SWANLAB_API_KEY>
export WANDB_API_KEY=<WANDB_API_KEY>
export WANDB_BASE_URL=<WANDB_BASE_URL>
cd /home/ubuntu/llm_model_hub
bash cn-region-deploy.sh
The script finishes in roughly 40-60 minutes (depending on Docker registry mirror speed); the install log is written to /home/ubuntu/setup.log.
- Once everything above is deployed and the frontend has started, open http://{ip}:3000 in a browser to reach the frontend; the username and random password are in /home/ubuntu/setup.log.
- For port forwarding, see the nginx configuration section in the backend setup.
- Method 1: download the new CloudFormation template and redeploy; a brand-new Model Hub is ready in about 12 minutes (previous job data is lost with this method).
- Method 2: use the one-click upgrade script (supported since 1.0.6):
cd /home/ubuntu/llm_model_hub/backend/
bash 03.upgrade.sh
- Method 3: manual update:
- Update the code and rebuild the BYOC image:
git pull
git submodule update --remote
cd /home/ubuntu/llm_model_hub/backend/byoc
bash build_and_push.sh
- Restart the services:
pm2 restart all
- The update is complete.
To inspect the backend database, open a MySQL CLI inside the container:
docker exec -it hub-mysql mysql -ullmdata -pllmdata
Once inside the MySQL CLI:
use llm;
show tables;
select * from USER_TABLE;
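The same check can be run non-interactively from the host shell, using the credentials above:
docker exec hub-mysql mysql -ullmdata -pllmdata -e "USE llm; SHOW TABLES; SELECT * FROM USER_TABLE;"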
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Similar Open Source Tools

llm_model_hub
Model Hub V2 is a one-stop platform for model fine-tuning, deployment, and debugging without code, providing users with a visual interface to quickly validate the effects of fine-tuning various open-source models, facilitating rapid experimentation and decision-making, and lowering the threshold for users to fine-tune large models. For detailed instructions, please refer to the Feishu documentation.

EduChat
EduChat is a large-scale language model-based chatbot system designed for intelligent education by the EduNLP team at East China Normal University. The project focuses on developing a dialogue-based language model for the education vertical domain, integrating diverse education vertical domain data, and providing functions such as automatic question generation, homework correction, emotional support, course guidance, and college entrance examination consultation. The tool aims to serve teachers, students, and parents to achieve personalized, fair, and warm intelligent education.

Y2A-Auto
Y2A-Auto is an automation tool that transfers YouTube videos to AcFun. It automates the entire process from downloading, translating subtitles, content moderation, intelligent tagging, to partition recommendation and upload. It also includes a web management interface and YouTube monitoring feature. The tool supports features such as downloading videos and covers using yt-dlp, AI translation and embedding of subtitles, AI generation of titles/descriptions/tags, content moderation using Aliyun Green, uploading to AcFun, task management, manual review, and forced upload. It also offers settings for automatic mode, concurrency, proxies, subtitles, login protection, brute force lock, YouTube monitoring, channel/trend capturing, scheduled tasks, history records, optional GPU/hardware acceleration, and Docker deployment or local execution.

DeepMCPAgent
DeepMCPAgent is a model-agnostic tool that enables the creation of LangChain/LangGraph agents powered by MCP tools over HTTP/SSE. It allows for dynamic discovery of tools, connection to remote MCP servers, and integration with any LangChain chat model instance. The tool provides a deep agent loop for enhanced functionality and supports typed tool arguments for validated calls. DeepMCPAgent emphasizes the importance of MCP-first approach, where agents dynamically discover and call tools rather than hardcoding them.

aiges
AIGES is a core component of the Athena Serving Framework, designed as a universal encapsulation tool that lets AI developers deploy AI algorithm models and engines quickly. By integrating AIGES, you can rapidly deploy models and engines, host them on the Athena Serving Framework, and use its auxiliary systems for networking, distribution strategies, data processing, and more. The framework aims to accelerate turning AI algorithm models and engines into cloud services, with a cloud-native architecture providing multiple guarantees of service stability. You can efficiently and securely deploy, upgrade, scale, operate, and monitor models and engines without handling the underlying infrastructure or service-level development, governance, and operations.

Avalonia-Assistant
Avalonia-Assistant is an open-source desktop intelligent assistant that aims to provide a user-friendly interactive experience based on the Avalonia UI framework and the integration of Semantic Kernel with OpenAI or other large LLM models. By utilizing Avalonia-Assistant, you can perform various desktop operations through text or voice commands, enhancing your productivity and daily office experience.

herc.ai
Herc.ai is a powerful library for interacting with the Herc.ai API. It offers free access to users and supports all languages. Users get unlimited access to Herc.ai's features with a one-time subscription and API key. The tool provides question answering and text-to-image generation, with support for various models and customization options. Herc.ai can be easily integrated into CLI, CommonJS, and TypeScript projects, and supports beta models for advanced usage. Developed by FiveSoBes and Luppux Development.

chatgpt-webui
ChatGPT WebUI is a user-friendly web graphical interface for various LLMs like ChatGPT, providing simplified features such as core ChatGPT conversation and document retrieval dialogues. It has been optimized for better RAG retrieval accuracy and supports various search engines. Users can deploy local language models easily and interact with different LLMs like GPT-4, Azure OpenAI, and more. The tool offers powerful functionalities like GPT4 API configuration, system prompt setup for role-playing, and basic conversation features. It also provides a history of conversations, customization options, and a seamless user experience with themes, dark mode, and PWA installation support.

metaso-free-api
The Metaso AI free-API service supports high-speed streaming output, Metaso AI's powerful web search (full-web or academic scope, each with concise, in-depth, and research modes), zero-configuration deployment, and multi-token support, and is fully compatible with the ChatGPT interface. Seven other free APIs are also available. The tool provides various deployment options such as Docker, Docker-compose, Render, Vercel, and native deployment. Users can access the tool for chat completions and token liveness checks. Note: the reverse-engineered API is unstable; it is recommended to use the official Metaso AI website to avoid the risk of being banned. This project is for research and learning purposes only, not for commercial use.

CareGPT
CareGPT is a medical large language model (LLM) that explores medical data, training, and deployment related research work. It integrates resources, open-source models, rich data, and efficient deployment methods. It supports various medical tasks, including patient diagnosis, medical dialogue, and medical knowledge integration. The model has been fine-tuned on diverse medical datasets to enhance its performance in the healthcare domain.

meet-libai
The 'meet-libai' project aims to promote and popularize the cultural heritage of the Chinese poet Li Bai by constructing a knowledge graph of Li Bai and training a professional AI intelligent body using large models. The project includes features such as data preprocessing, knowledge graph construction, question-answering system development, and visualization exploration of the graph structure. It also provides code implementations for large models and RAG retrieval enhancement.

ERNIE-SDK
ERNIE SDK repository contains two projects: ERNIE Bot Agent and ERNIE Bot. ERNIE Bot Agent is a large model intelligent agent development framework based on the Wenxin large model orchestration capability introduced by Baidu PaddlePaddle, combined with the rich preset platform functions of the PaddlePaddle Star River community. ERNIE Bot provides developers with convenient interfaces to easily call the Wenxin large model for text creation, general conversation, semantic vectors, and AI drawing basic functions.

ChatPilot
ChatPilot is a chat agent tool that enables AgentChat conversations, supports Google search, URL conversation (RAG), and code-interpreter functionality, replicates Kimi Chat features (drag-and-drop file upload; send a URL to have it parsed), and supports the OpenAI/Azure API. It is based on LangChain and implements ReAct and OpenAI Function Call for agent Q&A dialogue. The tool supports various automatic tools such as online search via the Google Search API, a URL parsing tool, a Python code interpreter, and enhanced RAG file Q&A with query-rewriting support. It also separates front-end and back-end services using Svelte and FastAPI, respectively. Additionally, it supports voice input/output, image generation, user management, permission control, and chat record import/export.

cool-admin-midway
Cool-admin (midway version) is a cool open-source backend permission management system that supports modular, plugin-based, rapid CRUD development. It facilitates the quick construction and iteration of backend management systems, deployable in various ways such as serverless, docker, and traditional servers. It features AI coding for generating APIs and frontend pages, flow orchestration for drag-and-drop functionality, modular and plugin-based design for clear and maintainable code. The tech stack includes Node.js, Midway.js, Koa.js, TypeScript for backend, and Vue.js, Element-Plus, JSX, Pinia, Vue Router for frontend. It offers friendly technology choices for both frontend and backend developers, with TypeScript syntax similar to Java and PHP for backend developers. The tool is suitable for those looking for a modern, efficient, and fast development experience.

gzm-design
Gzm Design is a free and open-source poster designer developed using the latest mainstream technologies such as Vue3, Vite4, TypeScript, etc. It provides features like PSD import, JSON import, multiple pages support, shortcut key support, template import, layer management, ruler tool, pen tool, element editing, preview, file download, canvas zooming and dragging, border stroke, filling, blending modes, text formatting, group handling, canvas size modification, rich text support, masking, shadow effects, undo/redo functionality, QR code tool, barcode tool, and ruler line npm package encapsulation.

openai-forward
OpenAI-Forward is an efficient forwarding service implemented for large language models. Its core features include user request rate control, token rate limiting, intelligent prediction caching, log management, and API key management, aiming to provide efficient and convenient model forwarding services. Whether proxying local language models or cloud-based language models like LocalAI or OpenAI, OpenAI-Forward makes it easy. Thanks to support from libraries like uvicorn, aiohttp, and asyncio, OpenAI-Forward achieves excellent asynchronous performance.
For similar tasks

vertex-ai-samples
The Google Cloud Vertex AI sample repository contains notebooks and community content that demonstrate how to develop and manage ML workflows using Google Cloud Vertex AI.

byteir
The ByteIR Project is a ByteDance model compilation solution. ByteIR includes compiler, runtime, and frontends, and provides an end-to-end model compilation solution. Although all ByteIR components (compiler/runtime/frontends) are together to provide an end-to-end solution, and all under the same umbrella of this repository, each component technically can perform independently. The name, ByteIR, comes from a legacy purpose internally. The ByteIR project is NOT an IR spec definition project. Instead, in most scenarios, ByteIR directly uses several upstream MLIR dialects and Google Mhlo. Most of ByteIR compiler passes are compatible with the selected upstream MLIR dialects and Google Mhlo.

effort
Effort is an example implementation of the bucketMul algorithm, which allows for real-time adjustment of the number of calculations performed during inference of an LLM model. At 50% effort, it performs as fast as regular matrix multiplications on Apple Silicon chips; at 25% effort, it is twice as fast while still retaining most of the quality. Additionally, users have the option to skip loading the least important weights.

ort
Ort is an unofficial ONNX Runtime 1.17 wrapper for Rust based on the now inactive onnxruntime-rs. ONNX Runtime accelerates ML inference on both CPU and GPU.

llm_model_hub
Model Hub V2 is a one-stop platform for model fine-tuning, deployment, and debugging without code, providing users with a visual interface to quickly validate the effects of fine-tuning various open-source models, facilitating rapid experimentation and decision-making, and lowering the threshold for users to fine-tune large models. For detailed instructions, please refer to the Feishu documentation.

ai-on-gke
This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with GKE platform orchestration capabilities. A robust AI/ML platform considers the following layers: infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale; flexible integration with distributed computing and data processing frameworks; and support for multiple teams on the same infrastructure to maximize utilization of resources.

ray
Ray is a unified framework for scaling AI and Python applications. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including Data, Train, Tune, RLlib, and Serve. Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations. With Ray, you can seamlessly scale the same code from a laptop to a cluster, making it easy to meet the compute-intensive demands of modern ML workloads.

labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.
For similar jobs

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models, from images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: it is self-contained, with no need for a DBMS or cloud service; it exposes an OpenAPI interface, easy to integrate with existing infrastructure (e.g. a Cloud IDE); and it supports consumer-grade GPUs.

spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.