omnia

An open-source toolkit for deploying and managing high performance clusters for HPC, AI, and data analytics workloads.

Stars: 238

Visit

Omnia is a deployment tool designed to turn servers with RPM-based Linux images into functioning Slurm/Kubernetes clusters. It provides an Ansible playbook-based deployment for Slurm and Kubernetes on servers running an RPM-based Linux OS. The tool simplifies the process of setting up and managing clusters, making it easier for users to deploy and maintain their infrastructure.

README:

Ansible playbook-based deployment of Slurm and Kubernetes on servers running an RPM-based Linux OS

Omnia (Latin: all or everything) is a deployment tool to turn servers with RPM-based Linux images into functioning Slurm/Kubernetes clusters.

Omnia Documentation

Omnia Documentation is hosted on Read The Docs.

Current Status:

Licensing

Omnia is made available under the Apache 2.0 license

Contributing To Omnia

We encourage everyone to help us improve Omnia by contributing to the project. Contributions can be as small as documentation updates or adding example use cases, to adding commenting and properly styling code segments all the way up to full feature contributions. We ask that contributors follow our established guidelines for contributing to the project.

Contributions to Omnia are made through Pull Requests (PRs) to "devel" branch. "devel" is the bleeding edge branch of Omnia packed with experimental and untested features".

Omnia Community Members:

Contributors

Our thanks go to everyone who makes Omnia possible (emoji key):

_{John Lockman} ⚠️ 💻 📝 🤔 🚧 🧑‍🏫 🎨 👀 📢 🐛	_{Lucas A. Wilson} 💻 🎨 🚧 🤔 📝 📖 🧑‍🏫 📆 👀 📢 🐛	_{Sujit Jadhav} 🤔 📖 💻 👀 🚧 📆 🧑‍🏫 📢 💬 ⚠️ 🐛	_{Deepika K} 💻 ⚠️ 🐛 🛡️ 📢 👀 🧑‍🏫	_{Abhishek SA} 💻 🐛 📖 ⚠️ 🚧 📢 🧑‍🏫 👀	_{Sakshi Arora} 💻 🐛 📢	_{Shubhangi Srivastava} 💻 🚧 🐛 📢
_{Cassey Goveas} 📖 🐛 🚧 📢	_{Khushboo Dholi} 💻	_{Prasoon Kumar Sinha} 🤔 📢 🧑‍🏫	_SajithDas 📆 📢	_i3igpete 💼 📢	_{renzo-granados} 🐛	_Aditya-DP 💻 🐛 🛡️
_{Katakam Rakesh Naga Sai} 💻 🐛	_{Venkateswara Vatam} 📆 📢	_{RvishankarOMnia} 🤔 📢 🧑‍🏫	_{Mike Renfro} 📖	_{Lee Reynolds} 💻 📖 ✅	_{blesson-james} 💻 ⚠️ 🐛	_abhishek-s-a 💻 📖 ⚠️
_{Subhankar-Adak} 💻	_priti-parate 💻 🐛 📢 🧑‍🏫 👀	_{Narthan S} 💻 🧑‍🏫 👀	_{preeti-thankachan} ⚠️ 🐛	_{Boris Glimcher} 💻 🚧📖	_{Moshi Binyamini} 💻🚧	_{Supriya Parthasarathy} 📆
_{Milisha Gupta} 💻	_{sakshi-singla-1735} 💻	_{Nethravathi M G} 💻 📆 📢	_{Abdul Rijwan} 🚇	_{Pranav Kumar} 💻	_{SanthoshT2001} 💻	_Kratika 💻
_{Aditi Sharma} 💻 📢	_{David Weinehall} 💻	_{Soumyadeep Basu} 📖	_VrindaMarwah 💻	_{Suman K} 💻	_{Nagachandan P} 💻	_{Prabhu Gurumurthy} ⚠️ 🐛
_Rohith-Ravut ⚠️ 💻	_jiad-vmware 🐛	_{Justin Lecher} 🤔	_Kavyabr23 💻 ⚠️	_{vedaprakashanp} ⚠️ 💻	_{Bhagyashree-shetty} ⚠️ 💻	_{Nihal Ranjan} ⚠️ 💻 📢 🐛
_jagadeeshnv 💻	_{sourabh-sahu1} 💻	_ghandoura 💻 ⚠️	_{Coleman-Trader} 💻	_{youngjae-hur7} 💻	_Grace-Chang2 💻	_{Cypher-Miller} 💻
_ptrinesh 💻	_{Ikko Ashimine} 💻	_{Lakshmi-Patneedi} 💻	_{Jie Li} 💻	_{Yong Chen} 🎨	_nvtngan 💻 🔌	_{tamilarasansubrama1} ⚠️ 💻
_shemasr 🐛 💻 ⚠️	_{Naresh Sharma} 🐛	_{Jon Hass} 📖 🎨	_{KalyanKonatham} 🐛	_{Rahul Akolkar} 🐛	_{srinandini-karumuri} 💻	_Rishabhm47 ⚠️ 💻
_vaishakh-pm ⚠️ 💻	_{shridhar-sharma} ⚠️ 💻 🐛	_Jaya.Dayyala ⚠️ 💻	_fasongan 💻	_rahuldell21 💻 ⚠️	_diptiman12 💻	_paul-tp 💻
_{VishnupriyaKrish} 💻 ⚠️	_{Ishita Datta} 📖	_Kevin-Kodama 💻	_{balajikumaran-c-s} ⚠️ 🐛 💻	_Amogha-Reddy ⚠️ 🐛 💻	_krsandeepit ⚠️ 🐛	_Yash-shetty1 ⚠️ 🐛
_Sankeerna-S 💻	_{Ajay Kadoula} 💻	_teiland7 💻 📝	_{Lavanya Adhikari} 💻	_{Franklin-Johnson} 💻 📝	_{avinashvishwanath} 📖	_{ShubhamKumar1996} 💻
_bssitton-BU 🐛	_{John Hearns} 🐛	_{William Dizon} ✅	_{kris buggenhout} 🐛	_araji 💻

For Tasks:

Click tags to check more tools for each tasks

deploy clusters manage infrastructure automate deployments configure servers maintain clusters

For Jobs:

system administrator devops engineer cloud engineer site reliability engineer cluster administrator

Alternative AI tools for omnia

Similar Open Source Tools

omnia

github

: 238

activepieces

Activepieces is an open source replacement for Zapier, designed to be extensible through a type-safe pieces framework written in Typescript. It features a user-friendly Workflow Builder with support for Branches, Loops, and Drag and Drop. Activepieces integrates with Google Sheets, OpenAI, Discord, and RSS, along with 80+ other integrations. The list of supported integrations continues to grow rapidly, thanks to valuable contributions from the community. Activepieces is an open ecosystem; all piece source code is available in the repository, and they are versioned and published directly to npmjs.com upon contributions. If you cannot find a specific piece on the pieces roadmap, please submit a request by visiting the following link: Request Piece Alternatively, if you are a developer, you can quickly build your own piece using our TypeScript framework. For guidance, please refer to the following guide: Contributor's Guide

github

: 20.8k

eventcatalog

EventCatalog is an architecture catalog for distributed systems that allows users to document events, services, domains, and flows with AI-powered discovery. It provides features such as AI-native discovery, visual documentation, multi-platform support, enterprise readiness, and customization. The tool is organized as a Turborepo monorepo with different modules for the main catalog application, Node.js SDK, and CLI scaffolding tool. EventCatalog is purpose-built for distributed systems and event-driven architectures, offering advantages over generic documentation tools, vendor-specific tools, and service catalogs.

github

: 2.5k

NeuroAI_Course

Neuromatch Academy NeuroAI Course Syllabus is a repository that contains the schedule and licensing information for the NeuroAI course. The course is designed to provide participants with a comprehensive understanding of artificial intelligence in neuroscience. It covers various topics related to AI applications in neuroscience, including machine learning, data analysis, and computational modeling. The content is primarily accessed from the ebook provided in the repository, and the course is scheduled for July 15-26, 2024. The repository is shared under a Creative Commons Attribution 4.0 International License and software elements are additionally licensed under the BSD (3-Clause) License. Contributors to the project are acknowledged and welcomed to contribute further.

github

: 60

easyaiot

EasyAIoT is an AI cloud platform designed to support camera integration, annotation, training, inference, data collection, analysis, alerts, recording, storage, and deployment. It aims to provide a zero-threshold AI experience for everyone, with a focus on cameras below a hundred levels. The platform consists of five core projects: WEB module for frontend management, DEVICE module for device management, VIDEO module for video processing, AI module for AI analysis, and TASK module for high-performance task execution. EasyAIoT combines Java, Python, and C++ to create a versatile and user-friendly AIoT platform.

github

: 51

GoMaxAI-ChatGPT-Midjourney-Pro

GoMaxAI Pro is an AI-powered application for personal, team, and enterprise private operations. It supports various models like ChatGPT, Claude, Gemini, Kimi, Wenxin Yiyuan, Xunfei Xinghuo, Tsinghua Zhipu, Suno-v3.5, and Luma-video. The Pro version offers a new UI interface, member points system, management backend, homepage features, support for various content formats, AI video capabilities, SAAS multi-opening function, bug fixes, and more. It is built using web frontend with Vue3, mobile frontend with Uniapp, management frontend with Vue3, backend with Nodejs, and uses MySQL5.7(+) + Redis for data support. It can be deployed on Linux, Windows, or MacOS, with data storage options including local storage, Aliyun OSS, Tencent Cloud COS, and Chevereto image bed.

github

: 233

KubeDoor

KubeDoor is a microservice resource management platform developed using Python and Vue, based on K8S admission control mechanism. It supports unified remote storage, monitoring, alerting, notification, and display for multiple K8S clusters. The platform focuses on resource analysis and control during daily peak hours of microservices, ensuring consistency between resource request rate and actual usage rate.

github

: 272

Awesome-Chinese-LLM

Analyze the following text from a github repository (name and readme text at end) . Then, generate a JSON object with the following keys and provide the corresponding information for each key, ,'for_jobs' (List 5 jobs suitable for this tool,in lowercase letters), 'ai_keywords' (keywords of the tool,in lowercase letters), 'for_tasks' (list of 5 specific tasks user can use this tool to do,in less than 3 words,Verb + noun form,in daily spoken language,in lowercase letters).Answer in english languagesname:Awesome-Chinese-LLM readme:# Awesome Chinese LLM ![](https://awesome.re/badge.svg) ![Awesome-Chinese-LLM](src/icon.png) An Awesome Collection for LLM in Chinese 收集和梳理中文LLM相关 ![GitHub stars](https://img.shields.io/github/stars/HqWu-HITCS/Awesome-Chinese-LLM.svg?style=popout-square) ![GitHub issues](https://img.shields.io/github/issues/HqWu-HITCS/Awesome-Chinese- LLM.svg?style=popout-square) ![GitHub forks](https://img.shields.io/github/forks/HqWu-HITCS/Awesome-Chinese- LLM.svg?style=popout-square) 自ChatGPT为代表的大语言模型（Large Language Model, LLM）出现以后，由于其惊人的类通用人工智能（AGI）的能力，掀起了新一轮自然语言处理领域的研究和应用的浪潮。尤其是以ChatGLM、LLaMA等平民玩家都能跑起来的较小规模的LLM开源之后，业界涌现了非常多基于LLM的二次微调或应用的案例。本项目旨在收集和梳理中文LLM相关的开源模型、应用、数据集及教程等资料，目前收录的资源已达100+个！如果本项目能给您带来一点点帮助，麻烦点个⭐️吧～同时也欢迎大家贡献本项目未收录的开源模型、应用、数据集等。提供新的仓库信息请发起PR，并按照本项目的格式提供仓库链接、star数，简介等相关信息，感谢~

github

: 15.0k

SwanLab

SwanLab is an open-source, lightweight AI experiment tracking tool that provides a platform for tracking, comparing, and collaborating on experiments, aiming to accelerate the research and development efficiency of AI teams by 100 times. It offers a friendly API and a beautiful interface, combining hyperparameter tracking, metric recording, online collaboration, experiment link sharing, real-time message notifications, and more. With SwanLab, researchers can document their training experiences, seamlessly communicate and collaborate with collaborators, and machine learning engineers can develop models for production faster.

github

: 3.6k

MaiBot

MaiBot is an interactive intelligent agent based on a large language model. It aims to be an 'entity' active in QQ group chats, focusing on human-like interactions. It features personification in language style, behavior planning, expression learning, plugin system for unlimited extensions, and emotion expression. The project's design philosophy emphasizes creating a 'life form' in group chats that feels real rather than perfect, with the goal of providing companionship through an AI that makes mistakes and has its own perceptions and thoughts. The code is open-source, but the runtime data of MaiBot is intended to remain closed to maintain its autonomy and conversational nature.

github

: 4.2k

BiBi-Keyboard

BiBi-Keyboard is an AI-based intelligent voice input method that aims to make voice input more natural and efficient. It provides features such as voice recognition with simple and intuitive operations, multiple ASR engine support, AI text post-processing, floating ball input for cross-input method usage, AI editing panel with rich editing tools, Material3 design for modern interface style, and support for multiple languages. Users can adjust keyboard height, test input directly in the settings page, view recognition word count statistics, receive vibration feedback, and check for updates automatically. The tool requires Android 10.0 or higher, microphone permission for voice recognition, optional overlay permission for the floating ball feature, and optional accessibility permission for automatic text insertion.

github

: 502

LLMOne

LLMOne is an open-source, lightweight enterprise-level platform for deploying and serving large language models. It aims to address pain points in traditional large model private deployment such as long cycles, complex configurations, performance challenges, and high operational costs. LLMOne simplifies the deployment process with highly automated workflows and optimized runtime environments, ensuring enterprise-level performance and stability. It caters to developers, manufacturers, and users of large language models, providing features like rapid deployment, professional inference performance, broad compatibility with AI hardware, flexible model and application management, visual operational monitoring, and an open application ecosystem.

github

: 82

Saber-Translator

Saber-Translator is your exclusive AI comic translation tool, designed to effortlessly eliminate language barriers and enjoy the original comic fun. It offers features like translating comic images/PDFs, intelligent bubble detection and text recognition, powerful AI translation engine with multiple service providers, highly customizable translation effects, real-time preview and convenient operations, efficient image management and download, model recording and recommendation, and support for language learning with dual prompt word outputs.

github

: 2.7k

matrixone

MatrixOne is the industry's first database to bring Git-style version control to data, combined with MySQL compatibility, AI-native capabilities, and cloud-native architecture. It is a HTAP (Hybrid Transactional/Analytical Processing) database with a hyper-converged HSTAP engine that seamlessly handles transactional, analytical, full-text search, and vector search workloads in a single unified system—no data movement, no ETL, no compromises. Manage your database like code with features like instant snapshots, time travel, branch & merge, instant rollback, and complete audit trail. Built for the AI era, MatrixOne is MySQL-compatible, AI-native, and cloud-native, offering storage-compute separation, elastic scaling, and Kubernetes-native deployment. It serves as one database for everything, replacing multiple databases and ETL jobs with native OLTP, OLAP, full-text search, and vector search capabilities.

github

: 1.9k

Operit

Operit AI is a fully functional AI assistant application for mobile devices, running independently on Android devices with powerful tool invocation capabilities. It offers over 40 built-in tools for file system operations, HTTP requests, system operations, UI automation, and media processing. The app combines these tools with rich plugins to enable a wide range of tasks, from simple to complex, providing a comprehensive experience of a smartphone AI assistant.

github

: 3.3k

md_design

Nakidka is a tool for 1C:Enterprise 8 that allows for quick creation of forms based on text descriptions. It uses a simple and understandable syntax similar to Markdown, and also supports visual design of interface elements.

github

: 136

For similar tasks

sfdx-hardis

sfdx-hardis is a toolbox for Salesforce DX, developed by Cloudity, that simplifies tasks which would otherwise take minutes or hours to complete manually. It enables users to define complete CI/CD pipelines for Salesforce projects, backup metadata, and monitor any Salesforce org. The tool offers a wide range of commands that can be accessed via the command line interface or through a Visual Studio Code extension. Additionally, sfdx-hardis provides Docker images for easy integration into CI workflows. The tool is designed to be natively compliant with various platforms and tools, making it a versatile solution for Salesforce developers.

github

: 281

omnia

github

: 238

ChatOpsLLM

ChatOpsLLM is a project designed to empower chatbots with effortless DevOps capabilities. It provides an intuitive interface and streamlined workflows for managing and scaling language models. The project incorporates robust MLOps practices, including CI/CD pipelines with Jenkins and Ansible, monitoring with Prometheus and Grafana, and centralized logging with the ELK stack. Developers can find detailed documentation and instructions on the project's website.

github

: 87

voicechat2

Voicechat2 is a fast, fully local AI voice chat tool that uses WebSockets for communication. It includes a WebSocket server for remote access, default web UI with VAD and Opus support, and modular/swappable SRT, LLM, TTS servers. Users can customize components like SRT, LLM, and TTS servers, and run different models for voice-to-voice communication. The tool aims to reduce latency in voice communication and provides flexibility in server configurations.

github

: 500

open-edison

OpenEdison is a secure MCP control panel that connects AI to data/software with additional security controls to reduce data exfiltration risks. It helps address the lethal trifecta problem by providing visibility, monitoring potential threats, and alerting on data interactions. The tool offers features like data leak monitoring, controlled execution, easy configuration, visibility into agent interactions, a simple API, and Docker support. It integrates with LangGraph, LangChain, and plain Python agents for observability and policy enforcement. OpenEdison helps gain observability, control, and policy enforcement for AI interactions with systems of records, existing company software, and data to reduce risks of AI-caused data leakage.

github

: 187

ai-factory

AI Factory is a CLI tool and skill system that streamlines AI-powered development by handling context setup, skill installation, and workflow configuration. It supports multiple AI coding agents, offers spec-driven development, and integrates with popular tech stacks like Next.js, Laravel, Django, and Express. The tool ensures zero configuration, best practices adherence, community skills utilization, and multi-agent support. Users can create plans, tasks, and commits for structured feature development, bug fixes, and self-improvement. Security is a priority with mandatory two-level scans for external skills. The tool's learning loop generates patches from bug fixes to enhance future implementations.

github

: 145

fastapi-admin

智元 Fast API is a one-stop API management system that unifies various LLM APIs in terms of format, standards, and management to achieve the ultimate in functionality, performance, and user experience. It includes features such as model management with intelligent and regex matching, backup model functionality, key management, proxy management, company management, user management, and chat management for both admin and user ends. The project supports cluster deployment, multi-site deployment, and cross-region deployment. It also provides a public API site for registration with a contact to the author for a 10 million quota. The tool offers a comprehensive dashboard, model management, application management, key management, and chat management functionalities for users.

github

: 114

awsome-distributed-training

This repository contains reference architectures and test cases for distributed model training with Amazon SageMaker Hyperpod, AWS ParallelCluster, AWS Batch, and Amazon EKS. The test cases cover different types and sizes of models as well as different frameworks and parallel optimizations (Pytorch DDP/FSDP, MegatronLM, NemoMegatron...).

github

: 230

For similar jobs

AirGo

AirGo is a front and rear end separation, multi user, multi protocol proxy service management system, simple and easy to use. It supports vless, vmess, shadowsocks, and hysteria2.

github

: 378

mosec

Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic

github

: 834

llm-code-interpreter

The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.

github

: 465

pezzo

Pezzo is a fully cloud-native and open-source LLMOps platform that allows users to observe and monitor AI operations, troubleshoot issues, save costs and latency, collaborate, manage prompts, and deliver AI changes instantly. It supports various clients for prompt management, observability, and caching. Users can run the full Pezzo stack locally using Docker Compose, with prerequisites including Node.js 18+, Docker, and a GraphQL Language Feature Support VSCode Extension. Contributions are welcome, and the source code is available under the Apache 2.0 License.

github

: 2.3k

learn-generative-ai

Learn Cloud Applied Generative AI Engineering (GenEng) is a course focusing on the application of generative AI technologies in various industries. The course covers topics such as the economic impact of generative AI, the role of developers in adopting and integrating generative AI technologies, and the future trends in generative AI. Students will learn about tools like OpenAI API, LangChain, and Pinecone, and how to build and deploy Large Language Models (LLMs) for different applications. The course also explores the convergence of generative AI with Web 3.0 and its potential implications for decentralized intelligence.

github

: 592

gcloud-aio

This repository contains shared codebase for two projects: gcloud-aio and gcloud-rest. gcloud-aio is built for Python 3's asyncio, while gcloud-rest is a threadsafe requests-based implementation. It provides clients for Google Cloud services like Auth, BigQuery, Datastore, KMS, PubSub, Storage, and Task Queue. Users can install the library using pip and refer to the documentation for usage details. Developers can contribute to the project by following the contribution guide.

github

: 324

fluid

Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It implements dataset abstraction, scalable cache runtime, automated data operations, elasticity and scheduling, and is runtime platform agnostic. Key concepts include Dataset and Runtime. Prerequisites include Kubernetes version > 1.16, Golang 1.18+, and Helm 3. The tool offers features like accelerating remote file accessing, machine learning, accelerating PVC, preloading dataset, and on-the-fly dataset cache scaling. Contributions are welcomed, and the project is under the Apache 2.0 license with a vendor-neutral approach.

github

: 1.8k

aiges

AIGES is a core component of the Athena Serving Framework, designed as a universal encapsulation tool for AI developers to deploy AI algorithm models and engines quickly. By integrating AIGES, you can deploy AI algorithm models and engines rapidly and host them on the Athena Serving Framework, utilizing supporting auxiliary systems for networking, distribution strategies, data processing, etc. The Athena Serving Framework aims to accelerate the cloud service of AI algorithm models and engines, providing multiple guarantees for cloud service stability through cloud-native architecture. You can efficiently and securely deploy, upgrade, scale, operate, and monitor models and engines without focusing on underlying infrastructure and service-related development, governance, and operations.

github

: 275