
AI-CloudOps
AI+CloudOps Intelligent Operations Platform
Stars: 129

AI+CloudOps is a cloud-native operations management platform for enterprises. It combines artificial intelligence with cloud-native practices to make operations work more efficient. Features include AIOps-based analysis of monitoring data and alerting, multi-dimensional permission management, a visual CMDB for resource management, a ticketing system, deep Prometheus integration for real-time monitoring, and unified Kubernetes cluster management.
README:
Repository | Description | Tech stack | Status |
---|---|---|---|
🔧 AI-CloudOps | Core backend service | Go + Gin + GORM | |
🎨 AI-CloudOps-web | Frontend user interface | Vue + TypeScript + Ant Design Vue | |
🧠 AI-CloudOps-aiops | AIOps module | Python + FastAPI + scikit-learn | |
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ 🎯 Frontend UI  │ ──▶ │ 🔧 Core backend │ ──▶ │ 🧠 AI analysis  │
│ Vue + TS        │     │ Go + Gin        │     │ Python + ML     │
└─────────────────┘     └─────────────────┘     └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 ▼
                    ┌─────────────────────────┐
                    │ ☁️ Cloud-native infra   │
                    │ K8s + Prometheus + 🗄️   │
                    └─────────────────────────┘
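Demo: http://68.64.177.180 Login credentials:
👤 Username:
- 🎨 Dashboard: view the overall system status
- ☸️ K8s management: try out Kubernetes cluster management
- 📊 Monitoring panel: see real-time monitoring in action
- 🤖 AI analysis: try intelligent fault analysis
- 📋 Ticketing system: walk through the full ticket workflow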
🔧 GoSimplicity: project founder & core contributor
⚡ Penge666: senior developer
🌟 shixiaocaia: senior contributor
💼 daihao4371: feature developer
🚴 骑自行车追狗: active contributor
# Check all dependencies
go version # >= 1.23
node --version # >= 21.0
pnpm --version # latest
docker --version # latest
python3 --version # >= 3.11
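The version requirements above can also be checked programmatically. The following is a minimal sketch, not part of the project: the helper names `parse_version` and `meets_minimum` are hypothetical, and the sample strings assume the usual output formats of `go version`, `node --version`, and `python3 --version`.

```python
import re

# Minimum versions taken from the checklist above; tools marked "latest" are not checked.
MINIMUMS = {"go": (1, 23), "node": (21, 0), "python3": (3, 11)}


def parse_version(text):
    """Extract the first dotted version number from a tool's version output."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(tool, version_output):
    """Return True if the reported version is at least the documented minimum."""
    required = MINIMUMS[tool]
    found = parse_version(version_output)
    # Compare only as many components as the requirement specifies.
    return found[:len(required)] >= required


print(meets_minimum("go", "go version go1.23.4 linux/amd64"))  # True
print(meets_minimum("python3", "Python 3.10.12"))              # False
```

In practice the version strings would come from running each command, for example with `subprocess.run`.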
📋 Batch clone script
#!/bin/bash
# Clone all projects in one go
echo "🚀 Cloning the AI-CloudOps project group..."
repositories=(
"https://github.com/GoSimplicity/AI-CloudOps.git"
"https://github.com/GoSimplicity/AI-CloudOps-web.git"
"https://github.com/GoSimplicity/AI-CloudOps-aiops.git"
)
for repo in "${repositories[@]}"; do
echo "📦 Cloning $repo"
git clone "$repo"
done
echo "✅ All projects cloned!"
# Enter the backend project directory
cd AI-CloudOps
# 🐳 Start the middleware with Docker Compose
docker-compose -f docker-compose-env.yaml up -d
# ⚙️ Configure environment variables
cp env.example .env
# 🔍 Check service status
docker-compose -f docker-compose-env.yaml ps
🔧 Services started:
# Enter the frontend project directory
cd ../AI-CloudOps-web
# 📦 Install dependencies
pnpm install
# 🚀 Start the development server
pnpm run dev
🌐 Access URL:
# Return to the backend project directory
cd ../AI-CloudOps
# 📥 Install Go dependencies
go mod tidy
# 🚀 Start the main backend service
go run main.go
🔗 Service URL:
# Enter the AIOps project directory
cd ../AI-CloudOps-aiops
# ⚙️ Configure environment variables
cp env.example .env
# 📦 Install Python dependencies
pip install -r requirements.txt
# 🤖 Train the machine learning model (first run only)
cd data/ && python machine-learning.py && cd ..
# 🚀 Start the AI service
python app/main.py
🤖 AI capabilities:
Manual build and deployment
Recommended option
Cloud-native, production-grade
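📦 Frontend build
# Enter the frontend project directory
cd AI-CloudOps-web
# 📦 Install dependencies
pnpm install
# 🔨 Build the production bundle
pnpm run build
# 📊 Inspect the build output
ls -la dist/
🌐 Example Nginx configuration: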
server {
    listen 80;
    server_name your-domain.com;

    location / {
        root /var/www/ai-cloudops/dist;
        try_files $uri $uri/ /index.html;
    }

    location /api/ {
        proxy_pass http://localhost:8000/api/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
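To make the routing in the Nginx example concrete, here is a simplified Python model of how a request path would be handled. It is an illustration only: the function `route` and the `static_files` set are hypothetical, and the directory check that `$uri/` performs in `try_files` is left out.

```python
def route(path, static_files):
    """Mimic the two location blocks: /api/ is proxied, everything else is static."""
    if path.startswith("/api/"):
        # proxy_pass maps the /api/ prefix onto /api/ again, so the path is unchanged.
        return ("proxy", "http://localhost:8000" + path)
    if path in static_files:
        # try_files found a matching file under the dist/ root.
        return ("static", path)
    # Fallback to the SPA entry point so client-side routes still load.
    return ("static", "/index.html")


# Hypothetical contents of the dist/ directory.
files = {"/index.html", "/assets/app.js"}
print(route("/api/users", files))      # ('proxy', 'http://localhost:8000/api/users')
print(route("/assets/app.js", files))  # ('static', '/assets/app.js')
print(route("/dashboard", files))      # ('static', '/index.html')
```

The fallback to `/index.html` is what lets a single-page application handle deep links such as `/dashboard` in the browser.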
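🔧 Backend build
# Return to the backend project directory
cd AI-CloudOps
# 🔨 Build the production binary
go build -o bin/ai-cloudops main.go
# ⚙️ Configure production environment variables
cp config/config.production.yaml config.yaml
# 🚀 Start the production service
./bin/ai-cloudops
🔧 Systemd service configuration: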
[Unit]
Description=AI-CloudOps Backend Service
After=network.target
[Service]
Type=simple
User=cloudops
WorkingDirectory=/opt/ai-cloudops
ExecStart=/opt/ai-cloudops/bin/ai-cloudops
Restart=always
[Install]
WantedBy=multi-user.target
# 📁 From the AI-CloudOps project root
cd AI-CloudOps
# 🚀 Start all services with one command
docker-compose up -d
# 🔍 Check service status
docker-compose ps
# 📊 Follow service logs
docker-compose logs -f
🔧 Advanced options:
# 🔄 Update a single service only
docker-compose up -d backend
# 📊 Monitor resource usage
docker-compose top
# 🧹 Clean up and restart
docker-compose down && docker-compose up -d
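🚀 Cloud-native production deployment
# 📁 Enter the Kubernetes deployment directory
cd deploy/kubernetes/
# ⚙️ Configure environment variables
cp config.example config
# 🚀 Deploy to Kubernetes
kubectl apply -f .
# 🔍 Check deployment status
kubectl get pods,svc,ingress -l app=ai-cloudops
📊 Deployment architecture: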
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ 📱 Frontend Pod │───▶│ 🔧 Backend Pod  │───▶│ 🧠 AI Pod       │
│ Vue App         │    │ Go Service      │    │ Python ML       │
│ (replicated)    │    │ (replicated)    │    │ (replicated)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                ▼
                 ┌─────────────────────────────┐
                 │ 🗄️ Data layer Pod           │
                 │ MySQL + Redis + monitoring  │
                 └─────────────────────────────┘
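🔧 AI-CloudOps backend architecture
AI-CloudOps/
├── 📁 cmd/                      # Command-line entry points
│   └── webhook/                 # Webhook service
├── 📁 config/                   # Configuration files
│   ├── config.development.yaml
│   ├── config.production.yaml
│   └── webhook.yaml
├── 📁 deploy/                   # Deployment configuration
│   ├── kubernetes/              # K8s deployment files
│   ├── mysql/                   # Database initialization
│   ├── nginx/                   # Reverse proxy configuration
│   └── prometheus/              # Monitoring configuration
├── 📁 internal/                 # Core business logic
│   ├── 🔐 middleware/           # Middleware (authentication, logging, etc.)
│   ├── 📊 model/                # Data models
│   ├── 🏭 k8s/                  # K8s management module
│   ├── 👥 user/                 # User management
│   ├── 📈 prometheus/           # Monitoring metrics
│   ├── 📋 workorder/            # Ticketing system
│   ├── 🌲 tree/                 # Service tree CMDB
│   └── 🔧 system/               # System management
├── 📁 pkg/                      # Shared utility packages
│   ├── di/                      # Dependency injection
│   ├── utils/                   # Utility functions
│   ├── ssh/                     # SSH connections
│   └── websocket/               # WebSocket support
├── 📁 docs/                     # API documentation
│   ├── swagger.json
│   └── swagger.yaml
├── main.go                      # Program entry point
├── Dockerfile                   # Container build
├── docker-compose.yaml          # Local development environment
└── go.mod                       # Go module definition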
🏛️ Architecture layers:
- 🌐 API Layer: RESTful API endpoints
- 🔄 Service Layer: business logic services
- 💾 Repository Layer: data access
- 🗄️ Infrastructure Layer: underlying infrastructure
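The layering listed above can be illustrated with a small, language-agnostic sketch. It is written in Python for brevity and does not reflect the actual Go code: every class and function name here is hypothetical, and an in-memory dict stands in for the real database.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


class InMemoryUserRepository:
    """Repository layer: hides how users are stored (a dict stands in for a database)."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def create(self, name):
        user = User(id=self._next_id, name=name)
        self._rows[user.id] = user
        self._next_id += 1
        return user


class UserService:
    """Service layer: business rules, with no knowledge of HTTP or storage details."""

    def __init__(self, repo):
        self._repo = repo

    def create_user(self, name):
        name = name.strip()
        if not name:
            raise ValueError("name must not be empty")
        return self._repo.create(name)


def handle_create_user(service, payload):
    """API layer: turns a request payload into a service call and a (status, body) pair."""
    try:
        user = service.create_user(payload.get("name", ""))
    except ValueError as exc:
        return 400, {"error": str(exc)}
    return 201, {"id": user.id, "name": user.name}


service = UserService(InMemoryUserRepository())
print(handle_create_user(service, {"name": " alice "}))  # (201, {'id': 1, 'name': 'alice'})
print(handle_create_user(service, {}))                   # (400, {'error': 'name must not be empty'})
```

Because each layer depends only on the one below it, the repository can be swapped (for example, for a test double) without touching the API handler.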
⚡ AI-CloudOps-web frontend architecture
AI-CloudOps-web/
├── 📁 apps/                         # Applications
│   └── web-antd/                    # Main app (Ant Design + Vue)
│       ├── src/
│       │   ├── 📁 api/              # API layer
│       │   ├── app.vue              # Root component
│       │   ├── bootstrap.ts         # Application bootstrap configuration
│       │   ├── 📁 composables/      # Vue composables
│       │   ├── 📁 layouts/          # Layout components
│       │   ├── 📁 locales/          # Internationalization files
│       │   ├── main.ts              # Application entry point
│       │   ├── preferences.ts       # User preferences
│       │   ├── 📁 router/           # Routing configuration
│       │   ├── 📁 store/            # State management
│       │   ├── 📁 types/            # TypeScript types
│       │   └── 📁 views/            # Page view components
│       ├── dist/                    # Build output
│       ├── public/                  # Static assets
│       ├── index.html               # Entry HTML
│       ├── package.json             # Project dependencies
│       ├── tailwind.config.mjs      # Tailwind configuration
│       ├── tsconfig.json            # TypeScript configuration
│       └── vite.config.mts          # Vite build configuration
├── 📁 packages/                     # Shared packages (monorepo)
├── 📁 internal/                     # Internal tooling packages
├── 📁 docs/                         # Documentation
├── 📁 scripts/                      # Build scripts
├── cspell.json                      # Spell-check configuration
├── eslint.config.mjs                # ESLint configuration
├── stylelint.config.mjs             # Style lint configuration
├── vitest.config.ts                 # Unit test configuration
├── vitest.workspace.ts              # Test workspace
├── turbo.json                       # Turborepo configuration
├── pnpm-workspace.yaml              # pnpm workspace
├── pnpm-lock.yaml                   # Lockfile
├── package.json                     # Root project dependencies
└── README.md                        # Project readme
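🎯 Technical features:
- ⚡ Vite: fast build tooling
- 🎨 Ant Design: enterprise-grade UI components
- 📱 Responsive: adapts to multiple device types
- 🔄 State management: Zustand/Redux
- 📊 Visualization: ECharts/D3.js
- 🧪 Testing: Vitest + Testing Library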
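🤖 AI-CloudOps-aiops intelligent module
AI-CloudOps-aiops/
├── 📁 app/                  # Main application code
│   ├── __init__.py          # Package initializer
│   ├── main.py              # FastAPI entry point
│   ├── 📁 api/              # API routing layer
│   ├── 📁 common/           # Common modules
│   ├── 📁 config/           # Application configuration
│   ├── 📁 core/             # Core business logic
│   ├── 📁 mcp/              # MCP protocol implementation
│   ├── 📁 models/           # Data model definitions
│   ├── 📁 services/         # Business service layer
│   └── 📁 utils/            # Utility functions
├── 📁 config/               # Global configuration files
├── 📁 data/                 # Data storage
├── 📁 deploy/               # Deployment configuration
├── 📁 docs/                 # Documentation
├── 📁 logs/                 # Log files
├── 📁 scripts/              # Scripts
├── 📁 tests/                # Test cases
├── 📁 tools/                # Development tools
├── Dockerfile               # Container build file
├── Dockerfile.mcp           # MCP service container
├── docker-compose.yml       # Local development environment
├── env.example              # Environment variable template
├── LICENSE                  # Open-source license
├── pyproject.toml           # Python project configuration
├── pytest.ini               # Test configuration
├── README.md                # Project readme
└── requirements.txt         # Python dependencies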
🧠 AI capabilities:
- 📊 Time series analysis: LSTM, ARIMA, Prophet
- 🔍 Anomaly detection: Isolation Forest, LOF
- 🎯 Classification: Random Forest, XGBoost
- 📈 Regression analysis: Linear, Polynomial, SVR
- 🕸️ Knowledge graph: Neo4j, NetworkX
- 📝 NLP: BERT, Word2Vec
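As a rough illustration of what anomaly detection on a monitoring metric looks like, the sketch below flags outliers with a modified z-score based on the median absolute deviation. This is a deliberately simple stand-in that needs only the standard library. It is not Isolation Forest or LOF, and it is not the module's actual implementation; the function name, the threshold, and the sample data are assumptions.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half the values are identical, so the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - center) / mad) > threshold
    ]


# Hypothetical CPU usage samples (percent) with one obvious spike.
cpu_usage = [41, 43, 40, 42, 44, 41, 97, 42, 43, 40]
print(find_anomalies(cpu_usage))  # [6]
```

Median-based scores are used here because a single large spike barely shifts the median, whereas it would inflate a mean and standard deviation.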
# 1. Fork the repository
gh repo fork GoSimplicity/AI-CloudOps
# 2. Clone it locally
git clone https://github.com/YOUR_USERNAME/AI-CloudOps.git
# 3. Create a feature branch
git checkout -b feature/amazing-feature
# 4. Commit your changes
git commit -m "✨ Add amazing feature"
# 5. Push the branch
git push origin feature/amazing-feature
# 6. Open a Pull Request
gh pr create --title "Add amazing feature" --body "Description of the feature"
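- 🐛 Bug fixes: find and fix problems in the code
- ✨ New features: add new functionality
- 📚 Documentation: improve project docs and guides
- 🎨 UI/UX: improve the user interface and experience
- ⚡ Performance: optimize system performance and efficiency
- 🧪 Testing: increase test coverage
- 🔧 Tooling: improve development tools and workflows
- 🌐 Internationalization: support multiple languages and locales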
💻 Code style
// ✅ Recommended code style
func (s *UserService) CreateUser(ctx context.Context, req *CreateUserRequest) (*User, error) {
	// Validate parameters
	if err := req.Validate(); err != nil {
		return nil, fmt.Errorf("invalid request: %w", err)
	}
	// Business logic
	user := &User{
		Name:  req.Name,
		Email: req.Email,
	}
	return s.repo.Create(ctx, user)
}
// ✅ Recommended code style
interface UserCreateRequest {
  name: string;
  email: string;
}
const createUser = async (request: UserCreateRequest): Promise<User> => {
  const response = await api.post('/users', request);
  return response.data;
};
📝 Commit message convention
Commit messages follow the Conventional Commits specification:
<type>(<scope>): <description>
[optional body]
[optional footer]
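Commit types:
- feat: new feature
- fix: bug fix
- docs: documentation update
- style: code formatting changes
- refactor: code refactoring
- perf: performance optimization
- test: test-related changes
- ci: CI/CD configuration
Examples: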
git commit -m "feat(k8s): add pod auto-scaling feature"
git commit -m "fix(auth): resolve login timeout issue"
git commit -m "docs(readme): update installation guide"
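A commit header in this format can be checked mechanically, for example in a pre-commit hook. The sketch below is one possible validator, not a tool shipped with the project. The name `parse_header` is hypothetical, the accepted types are limited to the ones listed in this README, and the `!` breaking-change marker from the Conventional Commits specification is not handled.

```python
import re

# Types listed in this README; the Conventional Commits spec itself allows others.
COMMIT_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf", "test", "ci")

HEADER_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMIT_TYPES) + r")"
    r"(?:\((?P<scope>[^()\s]+)\))?"  # optional scope, e.g. (k8s)
    r": (?P<description>\S.*)$"      # colon, one space, non-empty description
)


def parse_header(message):
    """Return (type, scope, description) for a valid header line, else None."""
    lines = message.splitlines()
    first_line = lines[0] if lines else ""
    match = HEADER_RE.match(first_line)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


print(parse_header("feat(k8s): add pod auto-scaling feature"))
# ('feat', 'k8s', 'add pod auto-scaling feature')
print(parse_header("update readme"))
# None
```

Only the first line is validated; the optional body and footer are ignored.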
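Thank you to every developer and user who has contributed to AI-CloudOps!
Your support and contributions are what keep this project improving.
Go ecosystem, Vue ecosystem, Kubernetes community, open-source community
Grafana, Prometheus, Istio, ArgoCD
GitHub Actions, Docker Hub, DigitalOcean, Cloudflare
Ant Design Vue, Heroicons, Unsplash, Figma
Together, let's build a smarter future for cloud-native operations!
🎉 Thanks for reading this far! If you find the project useful, please give us a Star ⭐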