
LLMOne
An enterprise-grade automated LLM deployment tool that makes AI servers truly "plug-and-play".
Stars: 82

LLMOne is an open-source, lightweight enterprise-level platform for deploying and serving large language models. It aims to address pain points in traditional large model private deployment such as long cycles, complex configurations, performance challenges, and high operational costs. LLMOne simplifies the deployment process with highly automated workflows and optimized runtime environments, ensuring enterprise-level performance and stability. It caters to developers, manufacturers, and users of large language models, providing features like rapid deployment, professional inference performance, broad compatibility with AI hardware, flexible model and application management, visual operational monitoring, and an open application ecosystem.
README:
LLMOne is an open-source, lightweight enterprise-grade platform for deploying and serving large language models. It addresses the pain points of traditional private LLM deployment: long delivery cycles, complex configuration, hard-to-guarantee performance, and high operations costs.
Whether you are:
- A developer or vendor of LLM applications (such as OpenWebUI, Dify, or RAGFlow): LLMOne helps you deploy your application quickly and in a standardized way onto customers' private hardware, and the upcoming "Deploy on LLMOne" feature will enable one-click integration and delivery.
- An LLM appliance vendor or system integrator: LLMOne provides full-stack automated deployment, from the operating system and drivers up to model services, sharply shortening delivery cycles, reducing operational burden, and raising the added value of your hardware so that it truly works "out of the box as a service".
- A technical enthusiast or enterprise user with high-performance equipment (such as NVIDIA DGX-class servers or workstations, or a Mac Studio or Mac Mini): LLMOne helps you stand up a high-performance, highly reliable LLM inference service on your own hardware, sparing you tedious environment configuration and tuning.
Through highly automated deployment workflows and an optimized runtime environment, LLMOne reduces a complex deployment to a few clicks while ensuring enterprise-grade performance and stability, so you can focus on LLM application innovation and value creation.
Figure 2: Guided system configuration simplifies low-level setup.
Figure 3: One-click LLM deployment accelerates going live.
Beyond convenient model deployment, LLMOne emphasizes open application-ecosystem integration: you can easily plug in an open-source chat interface such as OpenWebUI, or use tools like NexusGate to monitor and manage all applications and hardware resources on the platform.
Figure 4: One-click integration of open-source LLM projects (OpenWebUI shown as an example).
Figure 5: Unified monitoring and management of applications and resources.
Once deployment finishes, LLMOne produces a detailed deployment report and transparent performance-test results, giving you a full picture of system state and runtime behavior.
Figure 6: A detailed deployment report makes the process and results fully transparent.
Figure 7: Built-in performance tests give an intuitive view of model performance.
The entire deployment and application-configuration process is designed to be simple and efficient, so users can get up and running quickly.
- Rapid automated deployment: full-stack automation completes everything from the operating system to the model within hours, making the hardware truly plug-and-play.
- Professional-grade inference performance: integrates leading inference engines such as vLLM, deeply optimized for mainstream AI hardware, to guarantee enterprise-grade performance and reliability.
- Broad compatibility: supports mainstream open-source models and a wide range of AI hardware (including NVIDIA GPUs, Ascend NPUs, Apple Silicon, and other Chinese domestic chips), giving you flexible choices.
- Flexible model and application management: supports convenient model switching, updates, and multi-model coordination, plus modular deployment of application components (such as RAG), to quickly meet customized scenarios.
- Visual operations and monitoring: a visual interface covering deployment, monitoring, and log analysis simplifies day-to-day operations.
- Open application ecosystem ("Deploy on LLMOne"): enables application developers to integrate and deploy quickly (e.g. Open WebUI, Dify, RAGFlow, ChatBI, LLaMA Factory), helping hardware vendors raise product value and delivery efficiency.
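Because LLMOne serves models through engines such as vLLM, which expose an OpenAI-compatible HTTP API, a deployed model can usually be queried like any OpenAI-style endpoint. The sketch below is illustrative, not part of LLMOne itself; the host, port (vLLM's default is 8000), and model name are placeholder assumptions:

```python
import json
import urllib.request

def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# The endpoint and model name below are placeholders for a hypothetical deployment.
req = build_chat_request("http://192.168.1.100:8000", "Qwen2.5-7B-Instruct", "Hello!")
print(req.full_url)  # http://192.168.1.100:8000/v1/chat/completions

# Against a live deployment, send the request and read the reply:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

The same request works with any OpenAI-compatible client library by pointing its base URL at the deployed server.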
You can download the executable for your platform directly from the project's Release page.
Currently supported operating systems:

> [!TIP]
> We currently prioritize packaging and testing the deployment tool for the Windows (x86) and Apple Silicon (macOS) platforms. Support for other hardware is being actively planned; if you need support for another platform, please let us know in Issues.

To get started with LLMOne quickly, follow these steps:
1. Prepare the network environment:
   - Ensure connectivity: the device running the LLMOne client (typically your laptop or workstation) and the target hardware to be deployed (such as an LLM appliance or a dedicated server) must be connected to the same LAN subnet so the two can discover and communicate with each other.
2. Configure the target hardware's network:
   - Management network interface (BMC): most servers and appliances ship with a baseboard management controller such as iDRAC, iLO, iBMC, or openUBMC. Make sure the BMC interface is connected to the network and has a valid IP address, and have the management interface's IP address, username, and password ready. You can usually obtain these from your hardware vendor, or set and view them in the BMC configuration screen by attaching a monitor and keyboard when the machine first boots. LLMOne uses these credentials to perform low-level operations on the target hardware through the BMC, such as remote power control, mounting installation images, configuring the boot order, and installing the operating system.
   - Data network interface: besides the BMC management interface, the target hardware needs at least one data network interface connected to your business or lab network. This interface carries the inference service's own traffic, application access, and any communication between cluster nodes. Make sure it is connected and, if necessary, assigned an appropriate IP address.
   - Power: make sure the target hardware is connected to a stable power supply and is ready to boot.
3. Download and install the LLMOne client: on Windows the client is a portable app that needs no installation; just unzip it and run `LLMOne.exe`. On macOS it is distributed as a DMG installer; drag LLMOne.app into your Applications folder.
4. Obtain an LLMOne deployment resource package: we currently provide a sample resource package and a resource-package build tool. You can download the LLMOne sample resource package from the OpenAtom Open Source Foundation or via Quark Drive. The LLMOne resource-package build tool is still under development and will be available soon; you will be able to use it to create your own resource packages.
5. Launch the LLMOne client and follow the prompts in the user manual to complete configuration and installation.
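Steps 1 and 2 above can be sanity-checked from the client machine before starting a deployment. The sketch below is illustrative and not part of LLMOne: it uses Python's standard `ipaddress` module for the same-subnet check and assembles (without running) an `ipmitool` command that queries power state over the BMC's IPMI-over-LAN interface. All addresses and credentials shown are placeholder values.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """Step 1 pre-check: do the client machine and the target hardware
    sit in the same IPv4 subnet of the given prefix length?"""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix_len}").network
    return net_a == net_b

def bmc_power_status_cmd(bmc_ip: str, user: str, password: str) -> list[str]:
    """Step 2 pre-check: assemble an ipmitool invocation that asks the BMC
    for the chassis power state via IPMI over LAN."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_ip, "-U", user, "-P", password,
        "chassis", "power", "status",
    ]

# Placeholder addresses and credentials; substitute your own.
print(same_subnet("192.168.1.10", "192.168.1.50"))   # True: same /24
print(same_subnet("192.168.1.10", "192.168.2.50"))   # False: different /24
print(" ".join(bmc_power_status_cmd("192.168.1.200", "admin", "secret")))
# Run the assembled command with subprocess.run(...) on a machine with ipmitool installed.
```

`ipmitool` is a common generic CLI for IPMI-capable BMCs; vendor-specific tools (for example racadm for iDRAC) are alternatives with their own syntax.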
We are continuously adding new features and capabilities to LLMOne. Here is what we plan to work on next:
- [ ] Enhanced SSH mode: deploy to target hardware over SSH, simplifying network requirements, supporting complex environments such as connecting to servers through a bastion host, and deploying to machines that already have an operating system installed.
- [ ] Deep Apple Silicon support: continue optimizing support for Apple Silicon devices such as Mac Studio and Mac Mini to build a high-performance desktop-class LLM solution.
- [ ] NVIDIA DGX and server platform support: improve adaptation and performance optimization for NVIDIA DGX systems and other server hardware.
- [ ] "Deploy on LLMOne" application ecosystem: develop standardized application templates and interfaces (SDK/API) so that developers and vendors (such as OpenWebUI, Dify, and RAGFlow, as well as other LLM applications from solution providers) can quickly integrate their applications into LLMOne for one-click deployment.
- [ ] More mainstream model integrations: keep tracking and integrating more high-quality open-source LLMs.
- [ ] Enhanced model management: support model version control, multi-model serving optimization, and finer-grained resource scheduling.
- [ ] Data and vector-store integration: support convenient integration of open-source databases and vector databases such as openGauss and Milvus, improving local support for RAG and similar use cases.
All code in this repository is licensed under the Mulan Permissive Software License, Version 2, which is compatible with the Apache License 2.0.
We welcome contributions from developers of all skill levels! Whether you fix bugs, add features, or improve documentation, your contribution is valuable.
See CONTRIBUTING.md to get started.
Contributors
For more detailed information, please visit our official documentation.