Best AI tools for SRE Engineers

5 - AI Tool Sites

BigPanda

BigPanda is an AI-powered ITOps platform that helps businesses automatically identify actionable alerts, proactively prevent incidents, and ensure service availability. It uses advanced AI/ML algorithms to analyze large volumes of data from various sources, including monitoring tools, event logs, and ticketing systems. BigPanda's platform provides a unified view of IT operations, enabling teams to quickly identify and resolve issues before they impact business-critical services.

site: 72.3k

Parity

Parity is the world's first AI SRE tool designed to assist on-call engineers working with Kubernetes. It acts as the first line of defense by conducting investigations, determining root causes, and suggesting remediation before the engineer even opens their laptop. With features like Root Cause Analysis in Seconds, Intelligent Runbook Execution, and the ability to chat directly with the cluster, Parity streamlines incident response and enhances operational efficiency.

site: 0

Developer Roadmaps

Developer Roadmaps (roadmap.sh) is a community-driven platform offering official roadmaps, guides, projects, best practices, questions, and videos to assist developers in skill development and career growth. It provides role-based and skill-based roadmaps covering various technologies and domains. The platform is actively maintained and continuously updated to enhance the learning experience for developers worldwide.

site: 3.4m

Wild Moose

Wild Moose is an AI-powered SRE copilot designed to help companies handle incidents efficiently. It automatically gathers and analyzes logs, metrics, and code to pinpoint root causes, and its system model learns from each incident so the analysis improves over time. The tool converts tribal knowledge into custom playbooks and integrates with various observability tools and deployment platforms. Wild Moose reduces cognitive load on teams, automates routine tasks, and provides actionable insights in real time, enabling teams to act fast during outages.

site: 0

Keep

Keep is an open-source AIOps platform designed for large enterprises, offering a comprehensive solution for managing alerts and events at scale. It provides features such as enrichment, workflows, a single pane of glass view, and over 90 integrations. Keep leverages AI technology to enhance IT operations by providing alert correlation based on past incidents and a continuous feedback loop. The platform integrates with various monitoring systems, incident response tools, ticketing systems, and more, offering advanced querying and data analysis capabilities. Keep is suitable for SREs, operators, engineers, startups, and global enterprises looking to efficiently manage alerts in complex environments.

site: 31.9k

18 - Open Source Tools

knowledge

This repository serves as the owner's personal knowledge base. It covers a wide range of topics including cloud-native operations, the Kubernetes ecosystem, networking, cloud services, telemetry, CI/CD, electronic engineering, hardware projects, operating systems, homelab setups, high-performance computing applications, OpenWrt router usage, programming languages, music theory, blockchain, distributed systems principles, and various other knowledge domains. The content is periodically refined and published on the owner's blog.

github: 479

deepflow

DeepFlow is an open-source project that provides deep observability for complex cloud-native and AI applications. It offers Zero Code data collection with eBPF for metrics, distributed tracing, request logs, and function profiling. DeepFlow is integrated with SmartEncoding to achieve Full Stack correlation and efficient access to all observability data. With DeepFlow, cloud-native and AI applications automatically gain deep observability, removing the burden of developers continually instrumenting code and providing monitoring and diagnostic capabilities covering everything from code to infrastructure for DevOps/SRE teams.

github: 3.2k

k8sgpt

K8sGPT is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.

github: 6.4k

telemetry-airflow

This repository codifies the Airflow cluster that is deployed at workflow.telemetry.mozilla.org (behind SSO) and commonly referred to as "WTMO" or simply "Airflow". Some links relevant to users and developers of WTMO:

* The `dags` directory in this repository contains some custom DAG definitions
* Many of the DAGs registered with WTMO don't live in this repository, but are instead generated from ETL task definitions in bigquery-etl
* The Data SRE team maintains a WTMO Developer Guide (behind SSO)

github: 185

AiTreasureBox

AiTreasureBox is a versatile AI tool that provides a collection of pre-trained models and algorithms for various machine learning tasks. It simplifies the process of implementing AI solutions by offering ready-to-use components that can be easily integrated into projects. With AiTreasureBox, users can quickly prototype and deploy AI applications without the need for extensive knowledge in machine learning or deep learning. The tool covers a wide range of tasks such as image classification, text generation, sentiment analysis, object detection, and more. It is designed to be user-friendly and accessible to both beginners and experienced developers, making AI development more efficient and accessible to a wider audience.

github: 368

milvus

Milvus is an open-source vector database built to power embedding similarity search and AI applications. Milvus makes unstructured data search more accessible, and provides a consistent user experience regardless of the deployment environment. Milvus 2.0 is a cloud-native vector database with storage and computation separated by design. All components in this refactored version of Milvus are stateless to enhance elasticity and flexibility. For more architecture details, see Milvus Architecture Overview. Milvus was released under the open-source Apache License 2.0 in October 2019. It is currently a graduated project under the LF AI & Data Foundation.

github: 33.7k

dev-conf-replay

This repository contains information about various IT seminars and developer conferences in South Korea, allowing users to watch replays of past events. It covers a wide range of topics such as AI, big data, cloud, infrastructure, devops, blockchain, mobility, games, security, mobile development, frontend, programming languages, open source, education, and community events. Users can explore upcoming and past events, view related YouTube channels, and access additional resources like free programming ebooks and data structures and algorithms tutorials.

github: 1.1k

AITreasureBox

AITreasureBox is a comprehensive collection of AI tools and resources designed to simplify and accelerate the development of AI projects. It provides a wide range of pre-trained models, datasets, and utilities that can be easily integrated into various AI applications. With AITreasureBox, developers can quickly prototype, test, and deploy AI solutions without having to build everything from scratch. Whether you are working on computer vision, natural language processing, or reinforcement learning projects, AITreasureBox has something to offer for everyone. The repository is regularly updated with new tools and resources to keep up with the latest advancements in the field of artificial intelligence.

github: 559

ai-assisted-devops

github: 204

awesome-mlops

github: 11.7k

awesome-ChatGPT-repositories

github: 2.4k

agent

Xata Agent is an open source tool designed to monitor PostgreSQL databases, identify issues, and provide recommendations for improvements. It acts as an AI expert, offering proactive suggestions for configuration tuning, troubleshooting performance issues, and common database problems. The tool is extensible, supports monitoring from cloud services like RDS & Aurora, and uses preset SQL commands to ensure database safety. Xata Agent can run troubleshooting statements, notify users of issues via Slack, and supports multiple AI models for enhanced functionality. It is actively used by the Xata team to manage Postgres databases efficiently.

github: 668

kitops

KitOps is a packaging and versioning system for AI/ML projects that uses open standards so it works with the AI/ML, development, and DevOps tools you are already using. KitOps simplifies the handoffs between data scientists, application developers, and SREs working with LLMs and other AI/ML models. KitOps' ModelKits are a standards-based package for models, their dependencies, configurations, and codebases. ModelKits are portable, reproducible, and work with the tools you already use.

github: 736

towhee

Towhee is a cutting-edge framework designed to streamline the processing of unstructured data through the use of Large Language Model (LLM) based pipeline orchestration. It can extract insights from diverse data types like text, images, audio, and video files using generative AI and deep learning models. Towhee offers rich operators, prebuilt ETL pipelines, and a high-performance backend for efficient data processing. With a Pythonic API, users can build custom data processing pipelines easily. Towhee is suitable for tasks like sentence embedding, image embedding, video deduplication, question answering with documents, and cross-modal retrieval based on CLIP.

github: 3.2k

create-million-parameter-llm-from-scratch

The 'create-million-parameter-llm-from-scratch' repository provides a detailed guide on creating a Large Language Model (LLM) with 2.3 million parameters from scratch. The guide replicates the LLaMA approach, incorporating concepts like RMSNorm for pre-normalization, the SwiGLU activation function, and Rotary Embeddings. The model is trained on a basic dataset to demonstrate the ease of creating a million-parameter LLM without the need for a high-end GPU.

github: 65

oreilly-retrieval-augmented-gen-ai

This repository focuses on Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs). It provides code and resources to augment LLMs with real-time data for dynamic, context-aware applications. The content covers topics such as semantic search, fine-tuning embeddings, building RAG chatbots, evaluating LLMs, and using knowledge graphs in RAG. Prerequisites include Python skills, knowledge of machine learning and LLMs, and introductory experience with NLP and AI models.

github: 61

kollektiv

Kollektiv is a Retrieval-Augmented Generation (RAG) system designed to enable users to chat with their favorite documentation easily. It aims to provide LLMs with access to the most up-to-date knowledge, reducing inaccuracies and improving productivity. The system utilizes intelligent web crawling, advanced document processing, vector search, multi-query expansion, smart re-ranking, AI-powered responses, and dynamic system prompts. The technical stack includes Python/FastAPI for backend, Supabase, ChromaDB, and Redis for storage, OpenAI and Anthropic Claude 3.5 Sonnet for AI/ML, and Chainlit for UI. Kollektiv is licensed under a modified version of the Apache License 2.0, allowing free use for non-commercial purposes.

github: 74

HydraDragonAntivirus

Hydra Dragon Antivirus is a comprehensive tool that combines dynamic and static analysis using Sandboxie for Windows with ClamAV, YARA-X, machine learning AI, behavior analysis, NLP-based detection, website signatures, Ghidra, and Snort. The tool provides a Machine Learning Malware and Benign Database for training, along with a guide for compiling from source. It offers features like Ghidra source code analysis, Java Development Kit setup, and detailed logs for malware detections. Users can join the Discord community server for support and follow specific guidelines for preparing the analysis environment. The tool emphasizes security measures such as cleaning up directories, avoiding sharing IP addresses, and ensuring ClamAV database installation. It also includes tips for effective analysis and troubleshooting common issues.

github: 58

3 - OpenAI GPTs

SREPro

Your SRE, DevOps and Observability buddy

gpt: 400+

SLC Advisor

Critically analyze SLIs/SLOs with a deeper, provocative approach

gpt: 20+

DevOps Guru

Advanced DevOps Guru with Linux distro and cloud-native tech expertise.

gpt: 500+