
KubeDoor
基于AI推荐+专家经验的K8S负载感知调度与容量管控系统
Stars: 272

KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.
README:
国内用户如果访问异常,可以访问Gitee同步站:https://gitee.com/starsl/KubeDoor
🌼花折 - KubeDoor 是一个使用Python + Vue开发,基于K8S准入控制机制的微服务资源管控平台,以及支持多K8S集群统一远程存储、监控、告警、通知、展示的一站式K8S监控平台,并且专注微服务每日高峰时段的资源视角,实现了微服务的资源分析统计与强管控,确保微服务资源的资源申请率和真实使用率一致。
- 🌊支持多K8S集群统一远程存储、监控、告警、通知、展示的一站式K8S监控方案。
- 📀Helm一键部署完成监控、采集、展示、告警、通知(多K8S集群监控从未如此简单✨)。
- 🚀基于VictoriaMetrics全套方案实现多K8S统一监控,统一告警规则管理,实现免配置完整指标采集。
- 🎨WEBUI集成了K8S节点监控看板与K8S资源监控看板,均支持在单一看板中查看各个K8S集群的资源情况。
- 📐集成了大量K8S资源,JVM资源与K8S节点的告警规则,并支持统一维护管理,支持对接企微,钉钉,飞书告警通知及灵活的@机制。
- 🎭实时监控管理页面,对K8S资源,节点资源统一监控展示的Grafana看板。
- ⏱️支持即时、定时、周期性任务执行微服务的扩缩容和重启操作。
- 🦄K8S微服务统一告警分析与处理页面,告警按天智能聚合,相同告警按日累计计数,每日告警清晰明了。
- 🕹️支持对POD进行隔离,删除,Java dump,jstack,jfr,JVM数据采集分析等操作,并通知到群。
- ⚙️对微服务每日高峰期的P95资源展示,以及对Pod数、资源限制值的维护管理。
- 🎨基于日维度采集每日高峰时段P95的资源数据,可以很好的观察各微服务长期的资源变化情况,即使查看1年的数据也很流畅。
- 🏅高峰时段全局资源统计与各资源TOP10
- 🔎命名空间级别高峰时段P95资源使用量与资源消耗占整体资源的比例
- 🧿微服务级别高峰期整体资源与使用率分析
- 📈微服务与Pod级别的资源曲线图(需求值,限制值,使用值)
- ✨基于准入控制机制实现K8S微服务资源的真实使用率和资源申请需求值保持一致,具有非常重要的意义。
- 🌊K8S调度器通过真实的资源需求值就能够更精确地将Pod调度到合适的节点上,避免资源碎片,实现节点的资源均衡。
- ♻K8S自动扩缩容也依赖资源需求值来判断,真实的需求值可以更精准的触发扩缩容操作。
- 🛡K8S的保障服务质量(QoS机制)与需求值结合,真实需求值的Pod会被优先保留,保证关键服务的正常运行。
- ❤️Agent管理页面:更新,维护Agent状态,配置采集与管控。
- 🔒基于NGINX basic认证,支持LDAP,支持所有操作审计日志与通知。
- 📊所有看板基于Grafana创建,并整合到前端UI内,使得数据分析可以快速实现更优雅的展示。
### 【下载helm包】
wget https://StarsL.cn/kubedoor/kubedoor-1.0.0.tgz
tar -zxvf kubedoor-1.0.0.tgz
cd kubedoor
### 【master端安装】
# 编辑values-master.yaml文件,请仔细阅读注释,根据描述修改配置内容。
# try
helm install kubedoor . --namespace kubedoor --create-namespace --values values-master.yaml --dry-run --debug
# install
helm install kubedoor . --namespace kubedoor --create-namespace --values values-master.yaml
### 【agent端安装】
# 编辑values-agent.yaml文件,请仔细阅读注释,根据描述修改配置内容。
helm install kubedoor-agent . --namespace kubedoor --create-namespace --values values-agent.yaml --set tsdb.external_labels_value=kmw-prod-kunlun
-
使用K8S节点IP + kubedoor-web的NodePort访问,默认账号密码都是
kubedoor
-
点击
agent管理
,先开启自动采集,设置好高峰期时段,再执行采集,输入需要采集的历史数据时长,点击采集并更新
,即可采集历史数据并更新高峰时段数据到管控表。默认会从Prometheus采集10天数据(建议采集1个月),并将10天内最大资源消耗日的数据写入到管控表,如果耗时较长,请等待采集完成或缩短采集时长。重复执行
采集并更新
不会导致重复写入数据,请放心使用,每次采集后都会自动将10天内最大资源消耗日的数据写入到管控表。
- 📅KubeDoor 项目进度
- 🥈英文版发布
- 🏅微服务AI评分:根据资源使用情况,发现资源浪费的问题,结合AI缩容,降本增效,做AI综合评分,接入K8S异常AI分析能力。
- 🏅微服务AI缩容:基于微服务高峰期的资源信息,对接AI分析与专家经验,计算微服务Pod数是否合理,生成缩容指令与统计。
- 🏅根据K8S节点资源使用率做节点管控与调度分析
- ✅采集更多的微服务资源信息: QPS/JVM/GC
- ✅针对微服务Pod做精细化操作:隔离、删除、dump、jstack、jfr、jvm
- ✅K8S资源告警管理,按日智能聚合。
- ✅多K8S支持:在统一的WebUI对多K8S做管控和资源分析展示。
- ✅集成K8S实时监控能力,实现一键部署,整合K8S实时资源看板。
感谢如下优秀的项目,没有这些项目,不可能会有KubeDoor:
-
前后端技术栈
-
基础服务
-
特别鸣谢
- CassTime:KubeDoor的诞生离不开🦄开思的支持。
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for KubeDoor
Similar Open Source Tools

KubeDoor
KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

omnia
Omnia is a deployment tool designed to turn servers with RPM-based Linux images into functioning Slurm/Kubernetes clusters. It provides an Ansible playbook-based deployment for Slurm and Kubernetes on servers running an RPM-based Linux OS. The tool simplifies the process of setting up and managing clusters, making it easier for users to deploy and maintain their infrastructure.

godoos
GodoOS is an efficient intranet office operating system that includes various office tools such as word/excel/ppt/pdf/internal chat/whiteboard/mind map, with native file storage support. The platform interface mimics the Windows style, making it easy to operate while maintaining low resource consumption and high performance. It automatically connects to intranet users without registration, enabling instant communication and file sharing. The flexible and highly configurable app store allows for unlimited expansion.

douyin-chatgpt-bot
Douyin ChatGPT Bot is an AI-driven system for automatic replies on Douyin, including comment and private message replies. It offers features such as comment filtering, customizable robot responses, and automated account management. The system aims to enhance user engagement and brand image on the Douyin platform, providing a seamless experience for managing interactions with followers and potential customers.

uDesktopMascot
uDesktopMascot is an open-source project for a desktop mascot application with a theme of 'freedom of creation'. It allows users to load and display VRM or GLB/FBX model files on the desktop, customize GUI colors and background images, and access various features through a menu screen. The application supports Windows 10/11 and macOS platforms.

higress
Higress is an open-source cloud-native API gateway built on the core of Istio and Envoy, based on Alibaba's internal practice of Envoy Gateway. It is designed for AI-native API gateway, serving AI businesses such as Tongyi Qianwen APP, Bailian Big Model API, and Machine Learning PAI platform. Higress provides capabilities to interface with LLM model vendors, AI observability, multi-model load balancing/fallback, AI token flow control, and AI caching. It offers features for AI gateway, Kubernetes Ingress gateway, microservices gateway, and security protection gateway, with advantages in production-level scalability, stream processing, extensibility, and ease of use.

activepieces
Activepieces is an open source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in Typescript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community. Activepieces is an open ecosystem; all piece source code is available in the repository, and they are versioned and published directly to npmjs.com upon contributions. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide

KouriChat
KouriChat is a project that seamlessly integrates virtual and real interactions, providing eternal gentle bonds. It offers features like WeChat integration, immersive role-playing, intelligent conversation segmentation, emotion-based emojis, image generation, image recognition, voice messages, and more. The project is focused on technical research and learning exchanges, with a strong emphasis on ethical and legal guidelines. Users are required to take full responsibility for their actions, especially minors who should use the tool under supervision. The project architecture includes avatar configurations, data storage, handlers, AI service interfaces, a web UI, and utility libraries.

llm-action
This repository provides a comprehensive guide to large language models (LLMs), covering various aspects such as training, fine-tuning, compression, and applications. It includes detailed tutorials, code examples, and explanations of key concepts and techniques. The repository is maintained by Liguo Dong, an AI researcher and engineer with expertise in LLM research and development.

NGCBot
NGCBot is a WeChat bot based on the HOOK mechanism, supporting scheduled push of security news from FreeBuf, Xianzhi, Anquanke, and Qianxin Attack and Defense Community, KFC copywriting, filing query, phone number attribution query, WHOIS information query, constellation query, weather query, fishing calendar, Weibei threat intelligence query, beautiful videos, beautiful pictures, and help menu. It supports point functions, automatic pulling of people, ad detection, automatic mass sending, Ai replies, rich customization, and easy for beginners to use. The project is open-source and periodically maintained, with additional features such as Ai (Gpt, Xinghuo, Qianfan), keyword invitation to groups, automatic mass sending, and group welcome messages.

NeuroAI_Course
Neuromatch Academy NeuroAI Course Syllabus is a repository that contains the schedule and licensing information for the NeuroAI course. The course is designed to provide participants with a comprehensive understanding of artificial intelligence in neuroscience. It covers various topics related to AI applications in neuroscience, including machine learning, data analysis, and computational modeling. The content is primarily accessed from the ebook provided in the repository, and the course is scheduled for July 15-26, 2024. The repository is shared under a Creative Commons Attribution 4.0 International License and software elements are additionally licensed under the BSD (3-Clause) License. Contributors to the project are acknowledged and welcomed to contribute further.

Chenyme-AAVT
Chenyme-AAVT is a user-friendly tool that provides automatic video and audio recognition and translation. It leverages the capabilities of Whisper, a powerful speech recognition model, to accurately identify speech in videos and audios. The recognized speech is then translated using ChatGPT or KIMI, ensuring high-quality translations. With Chenyme-AAVT, you can quickly generate字幕 files and merge them with the original video, making video translation a breeze. The tool supports various languages, allowing you to translate videos and audios into your desired language. Additionally, Chenyme-AAVT offers features such as VAD (Voice Activity Detection) to enhance recognition accuracy, GPU acceleration for faster processing, and support for multiple字幕 formats. Whether you're a content creator, translator, or anyone looking to make video translation more efficient, Chenyme-AAVT is an invaluable tool.

SwanLab
SwanLab is an open-source, lightweight AI experiment tracking tool that provides a platform for tracking, comparing, and collaborating on experiments, aiming to accelerate the research and development efficiency of AI teams by 100 times. It offers a friendly API and a beautiful interface, combining hyperparameter tracking, metric recording, online collaboration, experiment link sharing, real-time message notifications, and more. With SwanLab, researchers can document their training experiences, seamlessly communicate and collaborate with collaborators, and machine learning engineers can develop models for production faster.

99AI
99AI is a commercializable AI web application based on NineAI 2.4.2 (no authorization, no backdoors, no piracy, integrated front-end and back-end integration packages, supports Docker rapid deployment). The uncompiled source code is temporarily closed. Compared with the stable version, the development version is faster.

ChatGPT-Next-Web-Pro
ChatGPT-Next-Web-Pro is a tool that provides an enhanced version of ChatGPT-Next-Web with additional features and functionalities. It offers complete ChatGPT-Next-Web functionality, file uploading and storage capabilities, drawing and video support, multi-modal support, reverse model support, knowledge base integration, translation, customizations, and more. The tool can be deployed with or without a backend, allowing users to interact with AI models, manage accounts, create models, manage API keys, handle orders, manage memberships, and more. It supports various cloud services like Aliyun OSS, Tencent COS, and Minio for file storage, and integrates with external APIs like Azure, Google Gemini Pro, and Luma. The tool also provides options for customizing website titles, subtitles, icons, and plugin buttons, and offers features like voice input, file uploading, real-time token count display, and more.

LxgwZhenKai
LxgwZhenKai is a Chinese font derived from LXGW WenKai, manually adjusted for boldness and supplemented with AI assistance for character additions. The font aims to provide a comfortable reading experience on screens while also serving as a bold version of LXGW WenKai for temporary use. It contains over 13,000 characters, including common simplified and traditional Chinese characters, and is licensed under SIL Open Font License 1.1. Users are allowed to freely use, distribute, modify, and create derivative fonts based on LxgwZhenKai.
For similar tasks

KubeDoor
KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

robusta
Robusta is a tool designed to enhance Prometheus notifications for Kubernetes environments. It offers features such as smart grouping to reduce notification spam, AI investigation for alert analysis, alert enrichment with additional data like pod logs, self-healing capabilities for defining auto-remediation rules, advanced routing options, problem detection without PromQL, change-tracking for Kubernetes resources, auto-resolve functionality, and integration with various external systems like Slack, Teams, and Jira. Users can utilize Robusta with or without Prometheus, and it can be installed alongside existing Prometheus setups or as part of an all-in-one Kubernetes observability stack.
For similar jobs

AirGo
AirGo is a front and rear end separation, multi user, multi protocol proxy service management system, simple and easy to use. It supports vless, vmess, shadowsocks, and hysteria2.

mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic

llm-code-interpreter
The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.

pezzo
Pezzo is a fully cloud-native and open-source LLMOps platform that allows users to observe and monitor AI operations, troubleshoot issues, save costs and latency, collaborate, manage prompts, and deliver AI changes instantly. It supports various clients for prompt management, observability, and caching. Users can run the full Pezzo stack locally using Docker Compose, with prerequisites including Node.js 18+, Docker, and a GraphQL Language Feature Support VSCode Extension. Contributions are welcome, and the source code is available under the Apache 2.0 License.

learn-generative-ai
Learn Cloud Applied Generative AI Engineering (GenEng) is a course focusing on the application of generative AI technologies in various industries. The course covers topics such as the economic impact of generative AI, the role of developers in adopting and integrating generative AI technologies, and the future trends in generative AI. Students will learn about tools like OpenAI API, LangChain, and Pinecone, and how to build and deploy Large Language Models (LLMs) for different applications. The course also explores the convergence of generative AI with Web 3.0 and its potential implications for decentralized intelligence.

gcloud-aio
This repository contains shared codebase for two projects: gcloud-aio and gcloud-rest. gcloud-aio is built for Python 3's asyncio, while gcloud-rest is a threadsafe requests-based implementation. It provides clients for Google Cloud services like Auth, BigQuery, Datastore, KMS, PubSub, Storage, and Task Queue. Users can install the library using pip and refer to the documentation for usage details. Developers can contribute to the project by following the contribution guide.

fluid
Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It implements dataset abstraction, scalable cache runtime, automated data operations, elasticity and scheduling, and is runtime platform agnostic. Key concepts include Dataset and Runtime. Prerequisites include Kubernetes version > 1.16, Golang 1.18+, and Helm 3. The tool offers features like accelerating remote file accessing, machine learning, accelerating PVC, preloading dataset, and on-the-fly dataset cache scaling. Contributions are welcomed, and the project is under the Apache 2.0 license with a vendor-neutral approach.

aiges
AIGES is a core component of the Athena Serving Framework, designed as a universal encapsulation tool for AI developers to deploy AI algorithm models and engines quickly. By integrating AIGES, you can deploy AI algorithm models and engines rapidly and host them on the Athena Serving Framework, utilizing supporting auxiliary systems for networking, distribution strategies, data processing, etc. The Athena Serving Framework aims to accelerate the cloud service of AI algorithm models and engines, providing multiple guarantees for cloud service stability through cloud-native architecture. You can efficiently and securely deploy, upgrade, scale, operate, and monitor models and engines without focusing on underlying infrastructure and service-related development, governance, and operations.