
netdata
The fastest path to AI-powered full stack observability, even for lean teams.
Stars: 75834

Netdata is an open-source, real-time infrastructure monitoring platform that provides instant insights, zero configuration deployment, ML-powered anomaly detection, efficient monitoring with minimal resource usage, and secure & distributed data storage. It offers real-time, per-second updates and clear insights at a glance. Netdata's origin story involves addressing the limitations of existing monitoring tools and led to a fundamental shift in infrastructure monitoring. It is recognized as the most energy-efficient tool for monitoring Docker-based systems according to a study by the University of Amsterdam.
README:
Visit our Home Page
MENU: WHO WE ARE | KEY FEATURES | GETTING STARTED | HOW IT WORKS | FAQ | DOCS | COMMUNITY | CONTRIBUTE | LICENSE
[!WARNING] People get addicted to Netdata. Once you use it on your systems, there's no going back.
Netdata is an open-source, real-time infrastructure monitoring platform. Monitor, detect, and act across your entire infrastructure.
Core Advantages:
- Instant Insights – With Netdata you can access per-second metrics and visualizations.
- Zero Configuration – You can deploy immediately without complex setup.
- ML-Powered – You can detect anomalies, predict issues, and automate analysis.
- Efficient – You can monitor with minimal resource usage and maximum scalability.
- Secure & Distributed – You can keep your data local with no central collection needed.
With Netdata, you get real-time, per-second updates. Clear insights at a glance, no complexity.
All heroes have a great origin story. Here's ours.
In 2013, at the company where Costa Tsaousis was COO, a significant percentage of their cloud-based transactions failed silently, severely impacting business performance.
Costa and his team tried every troubleshooting tool available at the time. None could identify the root cause. As Costa later wrote:
“I couldn’t believe that monitoring systems provide so few metrics and with such low resolution, scale so badly, and cost so much to run.”
Frustrated, he decided to build his own monitoring tool, starting from scratch.
That decision led to countless late nights and weekends. It also sparked a fundamental shift in how infrastructure monitoring and troubleshooting are approached, both in method and in cost.
According to the University of Amsterdam study, Netdata is the most energy-efficient tool for monitoring Docker-based systems. The study also shows Netdata excels in CPU usage, RAM usage, and execution time compared to other monitoring solutions.
Feature | Description | What Makes It Unique |
---|---|---|
Real-Time | Per-second data collection and processing | Works in a beat – click and see results instantly |
Zero-Configuration | Automatic detection and discovery | Auto-discovers everything on the nodes where it runs |
ML-Powered | Unsupervised anomaly detection | Trains multiple ML models per metric at the edge |
Long-Term Retention | High-performance storage | ~0.5 bytes per sample with tiered storage for archiving |
Advanced Visualization | Rich, interactive dashboards | Slice and dice data without query language |
Extreme Scalability | Native horizontal scaling | Parent-Child centralization with multi-million samples/s |
Complete Visibility | From infrastructure to applications | Simplifies operations and eliminates silos |
Edge-Based | Processing at your premises | Distributes code instead of centralizing data |
[!NOTE]
Want to put Netdata to the test against Prometheus? Explore the full comparison.
This three-part architecture enables you to scale from single nodes to complex multi-cloud environments:
Component | Description | License |
---|---|---|
Netdata Agent | • Core monitoring engine • Handles collection, storage, ML, alerts, exports • Runs on servers, cloud, K8s, IoT • Zero production impact | GPL v3+ |
Netdata Cloud | • Enterprise features • User management, RBAC, horizontal scaling • Centralized alerts • Free community tier • No metric storage centralization | |
Netdata UI | • Dashboards and visualizations • Free to use • Included in standard packages • Latest version via CDN | NCUL1 |
With Netdata you can monitor all these components across platforms:
Component | Linux | FreeBSD | macOS | Windows |
---|---|---|---|---|
System Resources (CPU, Memory and system shared resources) | Full | Yes | Yes | Yes |
Storage (Disks, Mount points, Filesystems, RAID arrays) | Full | Yes | Yes | Yes |
Network (Network Interfaces, Protocols, Firewall, etc) | Full | Yes | Yes | Yes |
Hardware & Sensors (Fans, Temperatures, Controllers, GPUs, etc) | Full | Some | Some | Some |
O/S Services (Resources, Performance and Status) | Yes (systemd) | - | - | - |
Processes (Resources, Performance, OOM, and more) | Yes | Yes | Yes | Yes |
System and Application Logs | Yes (systemd-journal) | - | - | Yes (Windows Event Log, ETW) |
Network Connections (Live TCP and UDP sockets per PID) | Yes | - | - | - |
Containers (Docker/containerd, LXC/LXD, Kubernetes, etc) | Yes | - | - | - |
VMs, from the host (KVM, qemu, libvirt, Proxmox, etc) | Yes (cgroups) | - | - | Yes (Hyper-V) |
Synthetic Checks (Test APIs, TCP ports, Ping, Certificates, etc) | Yes | Yes | Yes | Yes |
Packaged Applications (nginx, apache, postgres, redis, mongodb, and hundreds more) | Yes | Yes | Yes | Yes |
Cloud Provider Infrastructure (AWS, GCP, Azure, and more) | Yes | Yes | Yes | Yes |
Custom Applications (OpenMetrics, StatsD and soon OpenTelemetry) | Yes | Yes | Yes | Yes |
On Linux, you can continuously monitor all kernel features and hardware sensors for errors, including Intel/AMD/Nvidia GPUs, PCI AER, RAM EDAC, IPMI, S.M.A.R.T, Intel RAPL, NVMe, fans, power supplies, and voltage readings.
You can install Netdata on all major operating systems. To begin, choose your platform and follow the installation guide:
[!NOTE] You can access the Netdata UI at http://localhost:19999 (or http://NODE:19999 if remote).
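If you prefer the command line, a quick route on most Linux systems is Netdata's one-line kickstart installer. This is a minimal sketch: the script URL and behavior are assumptions based on the current installer, so confirm them in the installation guide for your platform.

```bash
# Download and run the kickstart installer (sketch; verify the URL in the install docs)
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh

# Then open the dashboard
xdg-open http://localhost:19999   # or browse to http://NODE:19999 from another machine
```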
Netdata auto-discovers most metrics, but you can manually configure some collectors:
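For example, collector settings live in the Netdata config directory and are edited with the bundled edit-config helper. The sketch below assumes a default /etc/netdata install and uses go.d/nginx.conf purely as an illustration; substitute the collector you actually need.

```bash
cd /etc/netdata                      # Netdata's config directory on most installs
sudo ./edit-config go.d/nginx.conf   # open (or create) a collector's config in your $EDITOR
sudo systemctl restart netdata       # apply the change
```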
You can use hundreds of built-in alerts and integrate with: email, Slack, Telegram, PagerDuty, Discord, Microsoft Teams, and more.
[!NOTE]
Email alerts work by default if there's a configured MTA.
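Custom alerts are plain-text health configurations edited the same way as collectors. The sketch below is illustrative only: the file name, chart, thresholds, and field values are assumptions, so copy an existing definition from health.d as your real starting point.

```bash
cd /etc/netdata
sudo ./edit-config health.d/ram-usage.conf   # hypothetical file name
# A definition looks roughly like this (all values are placeholders, not stock defaults):
#    alarm: ram_usage_high
#       on: system.ram
#   lookup: average -1m percentage of used
#    every: 1m
#     warn: $this > 80
#     crit: $this > 90
#       to: sysadmin
sudo systemctl restart netdata
```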
You can centralize dashboards, alerts, and storage with Netdata Parents:
[!NOTE]
You can use Netdata Parents for central dashboards, longer retention, and alert configuration.
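A minimal Parent-Child pairing is configured in stream.conf on both sides. This is a sketch: PARENT_IP and the API key are placeholders you replace with your own values.

```bash
# On each child node: stream all metrics to the Parent
sudo tee -a /etc/netdata/stream.conf >/dev/null <<'EOF'
[stream]
    enabled = yes
    destination = PARENT_IP:19999
    api key = 11111111-2222-3333-4444-555555555555
EOF

# On the Parent: accept children presenting the same API key
sudo tee -a /etc/netdata/stream.conf >/dev/null <<'EOF'
[11111111-2222-3333-4444-555555555555]
    enabled = yes
EOF

sudo systemctl restart netdata   # restart on both sides
```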
Sign in to Netdata Cloud and connect your nodes for:
- Access from anywhere
- Horizontal scalability and multi-node dashboards
- UI configuration for alerts and data collection
- Role-based access control
- Free tier available
[!NOTE]
Netdata Cloud is optional. Your data stays in your infrastructure.
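Connecting (claiming) a node is typically done at install time by passing your Space's claim token to the installer. A sketch, assuming the kickstart script mentioned above; the token and room values are placeholders you copy from your Netdata Cloud space.

```bash
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --claim-token YOUR_CLAIM_TOKEN --claim-rooms YOUR_ROOM_ID
```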
See Netdata in action
FRANKFURT | NEW YORK | ATLANTA | SAN FRANCISCO | TORONTO | SINGAPORE | BANGALORE
These demo clusters run with default configuration and show real monitoring data.
Choose the instance closest to you for the best performance.
With Netdata you can run a modular pipeline for metrics collection, processing, and visualization.
flowchart TB
A[Netdata Agent]:::mainNode
A1(Collect):::green --> A
A2(Store):::green --> A
A3(Learn):::green --> A
A4(Detect):::green --> A
A5(Check):::green --> A
A6(Stream):::green --> A
A7(Archive):::green --> A
A8(Query):::green --> A
A9(Score):::green --> A
classDef green fill:#bbf3bb,stroke:#333,stroke-width:1px,color:#000
classDef mainNode fill:#f0f0f0,stroke:#333,stroke-width:1px,color:#333
With each Agent you can:
- Collect – Gather metrics from systems, containers, apps, logs, APIs, and synthetic checks.
- Store – Save metrics to a high-efficiency, tiered time-series database.
- Learn – Train ML models per metric using recent behavior.
- Detect – Identify anomalies using trained ML models.
- Check – Evaluate metrics against pre-set or custom alert rules.
- Stream – Send metrics to Netdata Parents in real time.
- Archive – Export metrics to Prometheus, InfluxDB, OpenTSDB, Graphite, and others.
- Query – Access metrics via an API for dashboards or third-party tools (see the example after this list).
- Score – Use a scoring engine to find patterns and correlations across metrics.
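For instance, the Agent's REST API can be queried directly; a minimal sketch using the /api/v1/data endpoint (system.cpu is a chart present on every Linux Agent):

```bash
# Last 60 seconds of CPU utilization from the local Agent, as JSON
curl -s "http://localhost:19999/api/v1/data?chart=system.cpu&after=-60&format=json"
```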
[!NOTE]
Learn more: Netdata's architecture
With the Netdata Agent, you can use these core capabilities out-of-the-box:
Capability | Description |
---|---|
Comprehensive Collection | • 800+ integrations • Systems, containers, VMs, hardware sensors • OpenMetrics, StatsD, and logs • OpenTelemetry support coming soon |
Performance & Precision | • Per-second collection • Real-time visualization with 1-second latency • High-resolution metrics |
Edge-Based ML | • ML models trained at the edge • Automatic anomaly detection per metric • Pattern recognition based on historical behavior |
Advanced Log Management | • Direct systemd-journald and Windows Event Log integration • Process logs at the edge • Rich log visualization |
Observability Pipeline | • Parent-Child relationships • Flexible centralization • Multi-level replication and retention |
Automated Visualization | • NIDL data model • Auto-generated dashboards • No query language needed |
Smart Alerting | • Pre-configured alerts • Multiple notification methods • Proactive detection |
Low Maintenance | • Auto-detection • Zero-touch ML • Easy scalability • CI/CD friendly |
Open & Extensible | • Modular architecture • Easy to customize • Integrates with existing tools |
Netdata actively supports and is a member of the Cloud Native Computing Foundation (CNCF).
It is one of the most starred projects in the CNCF landscape.
Is Netdata secure?
Yes. Netdata follows OpenSSF best practices, has a security-first design, and is regularly audited by the community.
Does Netdata use a lot of resources?
No. Even with ML and per-second metrics, Netdata uses minimal resources.
- ~5% CPU and 150MiB RAM by default on production systems
- <1% CPU and ~100MiB RAM when ML and alerts are disabled and using ephemeral storage
- Parents scale to millions of metrics per second with appropriate hardware
You can use the Netdata Monitoring section in the dashboard to inspect its resource usage.
How much data retention is possible?
As much as your disk allows.
With Netdata you can use tiered retention:
- Tier 0: per-second resolution
- Tier 1: per-minute resolution
- Tier 2: per-hour resolution
These are queried automatically based on the zoom level.
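Retention is tuned in netdata.conf under the [db] section. Treat the option names below as illustrative only: they vary between Netdata versions, so check the comments in your own netdata.conf.

```bash
sudo /etc/netdata/edit-config netdata.conf
# Illustrative [db] settings (option names and defaults differ across versions):
#   [db]
#       mode = dbengine
#       storage tiers = 3
#       # plus per-tier retention limits (by size and/or time) for tiers 0, 1 and 2
```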
Can Netdata scale to many servers?
Yes. With Netdata you can:
- Scale horizontally with many Agents
- Scale vertically with powerful Parents
- Scale infinitely via Netdata Cloud
You can use Netdata Cloud to merge many independent infrastructures into one logical view.
Is disk I/O a concern?
No. Netdata minimizes disk usage:
- Metrics are flushed to disk every 17 minutes, spread out evenly
- Uses direct I/O and compression (ZSTD)
- Can run entirely in RAM or stream to a Parent
You can use `alloc` or `ram` mode for no disk writes.
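For fully diskless nodes, for example thin children that stream everything to a Parent, a sketch of the relevant setting (assuming the [db] section of netdata.conf):

```bash
sudo /etc/netdata/edit-config netdata.conf
#   [db]
#       mode = ram   # or "alloc": metrics stay in memory, nothing is written to disk
```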
How is Netdata different from Prometheus + Grafana?
With Netdata you get a complete monitoring solution—not just tools.
- No manual setup or dashboards needed
- Built-in ML, alerts, dashboards, and correlations
- More efficient and easier to deploy
How is Netdata different from commercial SaaS tools?
With Netdata you can store all metrics on your infrastructure—no sampling, no aggregation, no loss.
- High-resolution metrics by default
- ML per metric, not shared models
- Unlimited scalability without skyrocketing cost
Can Netdata run alongside Nagios, Zabbix, etc.?
Yes. You can use Netdata together with traditional tools.
With Netdata you get:
- Real-time, high-resolution monitoring
- Zero configuration and auto-generated dashboards
- Anomaly detection and advanced visualization
What if I feel overwhelmed?
You can start small:
- Use the dashboard's table of contents and search
- Explore anomaly scoring ("AR" toggle)
- Create custom dashboards in Netdata Cloud
Do I have to use Netdata Cloud?
No. Netdata Cloud is optional.
Netdata works without it, but with Cloud you can:
- Access remotely with SSO
- Save dashboard customizations
- Configure alerts centrally
- Collaborate with role-based access
What telemetry does Netdata collect?
Anonymous telemetry helps improve the product. You can disable it:
- Add `--disable-telemetry` to the installer, or
- Create `/etc/netdata/.opt-out-from-anonymous-statistics` and restart Netdata
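Either path works; for example (the installer here is the kickstart script mentioned in Getting Started):

```bash
# At install time:
sh /tmp/netdata-kickstart.sh --disable-telemetry

# Or on an existing installation:
sudo touch /etc/netdata/.opt-out-from-anonymous-statistics
sudo systemctl restart netdata
```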
Telemetry helps us understand usage, not track users. No private data is collected.
Who uses Netdata?
You'll join users including:
- Major companies (Amazon, ABN AMRO Bank, Facebook, Google, IBM, Intel, Netflix, Samsung)
- Universities (NYU, Columbia, Seoul National, UCL)
- Government organizations worldwide
- Infrastructure-intensive organizations
- Technology operators
- Startups and freelancers
- SysAdmins and DevOps professionals
Visit Netdata Learn for full documentation and guides.
[!NOTE]
Includes deployment, configuration, alerting, exporting, troubleshooting, and more.
Join the Netdata community:
[!NOTE]
Code of Conduct
Follow us on: Twitter | Reddit | YouTube | LinkedIn
We welcome your contributions.
Ways you help us stay sharp:
- Share best practices and monitoring insights
- Report issues or missing features
- Improve documentation
- Develop new integrations or collectors
- Help users in forums and chats
[!NOTE]
Contribution guide
The Netdata ecosystem includes:
- Netdata Agent – Open-source core (GPLv3+). Includes data collection, storage, ML, alerting, and APIs; it also redistributes several other open-source tools and libraries.
- Netdata UI – Closed-source but free to use with Netdata Agent and Cloud. Delivered via CDN. It integrates third-party open-source components.
- Netdata Cloud – Closed-source, with free and paid tiers. Adds remote access, SSO, scalability.
Alternative AI tools for netdata
Similar Open Source Tools


agentneo
AgentNeo is a Python package that provides functionalities for project, trace, dataset, experiment management. It allows users to authenticate, create projects, trace agents and LangGraph graphs, manage datasets, and run experiments with metrics. The tool aims to streamline AI project management and analysis by offering a comprehensive set of features.

llm4s
LLM4S provides a simple, robust, and scalable framework for building Large Language Models (LLM) applications in Scala. It aims to leverage Scala's type safety, functional programming, JVM ecosystem, concurrency, and performance advantages to create reliable and maintainable AI-powered applications. The framework supports multi-provider integration, execution environments, error handling, Model Context Protocol (MCP) support, agent frameworks, multimodal generation, and Retrieval-Augmented Generation (RAG) workflows. It also offers observability features like detailed trace logging, monitoring, and analytics for debugging and performance insights.

parlant
Parlant is a structured approach to building and guiding customer-facing AI agents. It allows developers to create and manage robust AI agents, providing specific feedback on agent behavior and helping understand user intentions better. With features like guidelines, glossary, coherence checks, dynamic context, and guided tool use, Parlant offers control over agent responses and behavior. Developer-friendly aspects include instant changes, Git integration, clean architecture, and type safety. It enables confident deployment with scalability, effective debugging, and validation before deployment. Parlant works with major LLM providers and offers client SDKs for Python and TypeScript. The tool facilitates natural customer interactions through asynchronous communication and provides a chat UI for testing new behaviors before deployment.

holmesgpt
HolmesGPT is an open-source DevOps assistant powered by OpenAI or any tool-calling LLM of your choice. It helps in troubleshooting Kubernetes, incident response, ticket management, automated investigation, and runbook automation in plain English. The tool connects to existing observability data, is compliance-friendly, provides transparent results, supports extensible data sources, runbook automation, and integrates with existing workflows. Users can install HolmesGPT using Brew, prebuilt Docker container, Python Poetry, or Docker. The tool requires an API key for functioning and supports OpenAI, Azure AI, and self-hosted LLMs.

cia
CIA is a powerful open-source tool designed for data analysis and visualization. It provides a user-friendly interface for processing large datasets and generating insightful reports. With CIA, users can easily explore data, perform statistical analysis, and create interactive visualizations to communicate findings effectively. Whether you are a data scientist, analyst, or researcher, CIA offers a comprehensive set of features to streamline your data analysis workflow and uncover valuable insights.

Starmoon
Starmoon is an affordable, compact AI-enabled device that can understand and respond to your emotions with empathy. It offers supportive conversations and personalized learning assistance. The device is cost-effective, voice-enabled, open-source, compact, and aims to reduce screen time. Users can assemble the device themselves using off-the-shelf components and deploy it locally for data privacy. Starmoon integrates various APIs for AI language models, speech-to-text, text-to-speech, and emotion intelligence. The hardware setup involves components like ESP32S3, microphone, amplifier, speaker, LED light, and button, along with software setup instructions for developers. The project also includes a web app, backend API, and background task dashboard for monitoring and management.

Automodel
Automodel is a Python library for automating the process of building and evaluating machine learning models. It provides a set of tools and utilities to streamline the model development workflow, from data preprocessing to model selection and evaluation. With Automodel, users can easily experiment with different algorithms, hyperparameters, and feature engineering techniques to find the best model for their dataset. The library is designed to be user-friendly and customizable, allowing users to define their own pipelines and workflows. Automodel is suitable for data scientists, machine learning engineers, and anyone looking to quickly build and test machine learning models without the need for manual intervention.

code-a2z
Code A2Z - Project Blog is a collaborative platform for developers and writers to create, manage, and share content. It offers structured environment, role-based access, SEO optimization, and community discussions to enhance collaboration and global visibility. Users can contribute projects, update them, and improve the platform. Key features include Markdown support, submodule integration, customizable templates, project contribution workflow, global visibility, community discussions, full ownership, SEO optimization, and role-based dashboard.

MaixPy
MaixPy is a Python SDK that enables users to easily create AI vision projects on edge devices. It provides a user-friendly API for accessing NPU, making it suitable for AI Algorithm Engineers, STEM teachers, Makers, Engineers, Students, Enterprises, and Contestants. The tool supports Python programming, MaixVision Workstation, AI vision, video streaming, voice recognition, and peripheral usage. It also offers an online AI training platform called MaixHub. MaixPy is designed for new hardware platforms like MaixCAM, offering improved performance and features compared to older versions. The ecosystem includes hardware, software, tools, documentation, and a cloud platform.

sdnext
SD.Next is an Image Diffusion implementation with advanced features. It offers multiple UI options, diffusion models, and built-in controls for text, image, batch, and video processing. The tool is multiplatform, supporting Windows, Linux, MacOS, nVidia, AMD, IntelArc/IPEX, DirectML, OpenVINO, ONNX+Olive, and ZLUDA. It provides optimized processing with the latest torch developments, including model compile, quantize, and compress functionalities. SD.Next also features Interrogate/Captioning with various models, queue management, automatic updates, and mobile compatibility.

LynxHub
LynxHub is a platform that allows users to seamlessly install, configure, launch, and manage all their AI interfaces from a single, intuitive dashboard. It offers features like AI interface management, arguments manager, custom run commands, pre-launch actions, extension management, in-app tools like terminal and web browser, AI information dashboard, Discord integration, and additional features like theme options and favorite interface pinning. The platform supports modular design for custom AI modules and upcoming extensions system for complete customization. LynxHub aims to streamline AI workflow and enhance user experience with a user-friendly interface and comprehensive functionalities.

RisuAI
RisuAI, or Risu for short, is a cross-platform AI chatting software/web application with powerful features such as multiple API support, assets in the chat, regex functions, and much more.

AgentNeo
AgentNeo is an advanced, open-source Agentic AI Application Observability, Monitoring, and Evaluation Framework designed to provide deep insights into AI agents, Large Language Model (LLM) calls, and tool interactions. It offers robust logging, visualization, and evaluation capabilities to help debug and optimize AI applications with ease. With features like tracing LLM calls, monitoring agents and tools, tracking interactions, detailed metrics collection, flexible data storage, simple instrumentation, interactive dashboard, project management, execution graph visualization, and evaluation tools, AgentNeo empowers users to build efficient, cost-effective, and high-quality AI-driven solutions.

Streamline-Analyst
Streamline Analyst is a cutting-edge, open-source application powered by Large Language Models (LLMs) designed to revolutionize data analysis. This Data Analysis Agent effortlessly automates tasks such as data cleaning, preprocessing, and complex operations like identifying target objects, partitioning test sets, and selecting the best-fit models based on your data. With Streamline Analyst, results visualization and evaluation become seamless. It aims to expedite the data analysis process, making it accessible to all, regardless of their expertise in data analysis. The tool is built to empower users to process data and achieve high-quality visualizations with unparalleled efficiency, and to execute high-performance modeling with the best strategies. Future enhancements include Natural Language Processing (NLP), neural networks, and object detection utilizing YOLO, broadening its capabilities to meet diverse data analysis needs.

spaCy
spaCy is an industrial-strength Natural Language Processing (NLP) library in Python and Cython. It incorporates the latest research and is designed for real-world applications. The library offers pretrained pipelines supporting 70+ languages, with advanced neural network models for tasks such as tagging, parsing, named entity recognition, and text classification. It also facilitates multi-task learning with pretrained transformers like BERT, along with a production-ready training system and streamlined model packaging, deployment, and workflow management. spaCy is commercial open-source software released under the MIT license.
For similar tasks


AirdropsBot2024
AirdropsBot2024 is an efficient and secure solution for automated trading and sniping of coins on the Solana blockchain. It supports multiple chain networks such as Solana, BTC, and Ethereum. The bot utilizes premium APIs and Chromedriver to automate trading operations through web interfaces of popular exchanges. It offers high-speed data analysis, in-depth market analysis, support for major exchanges, complete security and control, data visualization, advanced notification options, flexibility and adaptability in trading strategies, and profile management for saving and loading different trading strategies.

qdrant
Qdrant is a vector similarity search engine and vector database. It is written in Rust, which makes it fast and reliable even under high load. Qdrant can be used for a variety of applications, including: * Semantic search * Image search * Product recommendations * Chatbots * Anomaly detection Qdrant offers a variety of features, including: * Payload storage and filtering * Hybrid search with sparse vectors * Vector quantization and on-disk storage * Distributed deployment * Highlighted features such as query planning, payload indexes, SIMD hardware acceleration, async I/O, and write-ahead logging Qdrant is available as a fully managed cloud service or as an open-source software that can be deployed on-premises.

SynapseML
SynapseML (previously known as MMLSpark) is an open-source library that simplifies the creation of massively scalable machine learning (ML) pipelines. It provides simple, composable, and distributed APIs for various machine learning tasks such as text analytics, vision, anomaly detection, and more. Built on Apache Spark, SynapseML allows seamless integration of models into existing workflows. It supports training and evaluation on single-node, multi-node, and resizable clusters, enabling scalability without resource wastage. Compatible with Python, R, Scala, Java, and .NET, SynapseML abstracts over different data sources for easy experimentation. Requires Scala 2.12, Spark 3.4+, and Python 3.8+.

mlx-vlm
MLX-VLM is a package designed for running Vision LLMs on Mac systems using MLX. It provides a convenient way to install and utilize the package for processing large language models related to vision tasks. The tool simplifies the process of running LLMs on Mac computers, offering a seamless experience for users interested in leveraging MLX for vision-related projects.

Java-AI-Book-Code
The Java-AI-Book-Code repository contains code examples for the 2020 edition of 'Practical Artificial Intelligence With Java'. It is a comprehensive update of the previous 2013 edition, featuring new content on deep learning, knowledge graphs, anomaly detection, linked data, genetic algorithms, search algorithms, and more. The repository serves as a valuable resource for Java developers interested in AI applications and provides practical implementations of various AI techniques and algorithms.
For similar jobs

flux-aio
Flux All-In-One is a lightweight distribution optimized for running the GitOps Toolkit controllers as a single deployable unit on Kubernetes clusters. It is designed for bare clusters, edge clusters, clusters with restricted communication, clusters with egress via proxies, and serverless clusters. The distribution follows semver versioning and provides documentation for specifications, installation, upgrade, OCI sync configuration, Git sync configuration, and multi-tenancy configuration. Users can deploy Flux using Timoni CLI and a Timoni Bundle file, fine-tune installation options, sync from public Git repositories, bootstrap repositories, and uninstall Flux without affecting reconciled workloads.

paddler
Paddler is an open-source load balancer and reverse proxy designed specifically for optimizing servers running llama.cpp. It overcomes typical load balancing challenges by maintaining a stateful load balancer that is aware of each server's available slots, ensuring efficient request distribution. Paddler also supports dynamic addition or removal of servers, enabling integration with autoscaling tools.

DaoCloud-docs
DaoCloud Enterprise 5.0 Documentation provides detailed information on using DaoCloud, a Certified Kubernetes Service Provider. The documentation covers current and legacy versions, workflow control using GitOps, and instructions for opening a PR and previewing changes locally. It also includes naming conventions, writing tips, references, and acknowledgments to contributors. Users can find guidelines on writing, contributing, and translating pages, along with using tools like MkDocs, Docker, and Poetry for managing the documentation.

ztncui-aio
This repository contains a Docker image with ZeroTier One and ztncui to set up a standalone ZeroTier network controller with a web user interface. It provides features like Golang auto-mkworld for generating a planet file, supports local persistent storage configuration, and includes a public file server. Users can build the Docker image, set up the container with specific environment variables, and manage the ZeroTier network controller through the web interface.

devops-gpt
DevOpsGPT is a revolutionary tool designed to streamline your workflow and empower you to build systems and automate tasks with ease. Tired of spending hours on repetitive DevOps tasks? DevOpsGPT is here to help! Whether you're setting up infrastructure, speeding up deployments, or tackling any other DevOps challenge, our app can make your life easier and more productive. With DevOpsGPT, you can expect faster task completion, simplified workflows, and increased efficiency. Ready to experience the DevOpsGPT difference? Visit our website, sign in or create an account, start exploring the features, and share your feedback to help us improve. DevOpsGPT will become an essential tool in your DevOps toolkit.

ChatOpsLLM
ChatOpsLLM is a project designed to empower chatbots with effortless DevOps capabilities. It provides an intuitive interface and streamlined workflows for managing and scaling language models. The project incorporates robust MLOps practices, including CI/CD pipelines with Jenkins and Ansible, monitoring with Prometheus and Grafana, and centralized logging with the ELK stack. Developers can find detailed documentation and instructions on the project's website.

aiops-modules
AIOps Modules is a collection of reusable Infrastructure as Code (IAC) modules that work with SeedFarmer CLI. The modules are decoupled and can be aggregated using GitOps principles to achieve desired use cases, removing heavy lifting for end users. They must be generic for reuse in Machine Learning and Foundation Model Operations domain, adhering to SeedFarmer Guide structure. The repository includes deployment steps, project manifests, and various modules for SageMaker, Mlflow, FMOps/LLMOps, MWAA, Step Functions, EKS, and example use cases. It also supports Industry Data Framework (IDF) and Autonomous Driving Data Framework (ADDF) Modules.

3FS
The Fire-Flyer File System (3FS) is a high-performance distributed file system designed for AI training and inference workloads. It leverages modern SSDs and RDMA networks to provide a shared storage layer that simplifies development of distributed applications. Key features include performance, disaggregated architecture, strong consistency, file interfaces, data preparation, dataloaders, checkpointing, and KVCache for inference. The system is well-documented with design notes, setup guide, USRBIO API reference, and P specifications. Performance metrics include peak throughput, GraySort benchmark results, and KVCache optimization. The source code is available on GitHub for cloning and installation of dependencies. Users can build 3FS and run test clusters following the provided instructions. Issues can be reported on the GitHub repository.