k8sgpt

Giving Kubernetes Superpowers to everyone

Stars: 6435

K8sGPT is a tool for scanning your Kubernetes clusters, diagnosing, and triaging issues in simple English. It has SRE experience codified into its analyzers and helps to pull out the most relevant information to enrich it with AI.

README:

Out-of-the-box integration with OpenAI, Azure, Cohere, Amazon Bedrock, Google Gemini, and local models.

CLI Installation

Linux/Mac via brew

brew install k8sgpt

or

brew tap k8sgpt-ai/k8sgpt
brew install k8sgpt

RPM-based installation (RedHat/CentOS/Fedora)

32 bit:

sudo rpm -ivh https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_386.rpm

64 bit:

sudo rpm -ivh https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_amd64.rpm

DEB-based installation (Ubuntu/Debian)

32 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_386.deb
sudo dpkg -i k8sgpt_386.deb

64 bit:

curl -LO https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_amd64.deb
sudo dpkg -i k8sgpt_amd64.deb

APK-based installation (Alpine)

32 bit:

wget https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_386.apk
apk add --allow-untrusted k8sgpt_386.apk

64 bit:

wget https://github.com/k8sgpt-ai/k8sgpt/releases/download/v0.4.3/k8sgpt_amd64.apk
apk add --allow-untrusted k8sgpt_amd64.apk

Failing Installation on WSL or Linux (missing gcc)

When installing k8sgpt with Homebrew on WSL or Linux, you may encounter the following error:

==> Installing k8sgpt from k8sgpt-ai/k8sgpt
Error: The following formula cannot be installed from a bottle and must be
built from the source.
  k8sgpt
Install Clang or run brew install gcc.

If you install gcc as suggested, the problem will persist. Instead, you need to install the build-essential package:

sudo apt-get update
sudo apt-get install build-essential

Windows

- Download the latest Windows binaries of k8sgpt from the Release tab based on your system architecture.
- Extract the downloaded package to your desired location.
- Add the binary location to the system PATH environment variable.

Operator Installation

To install within a Kubernetes cluster, please use our k8sgpt-operator and follow its installation instructions.

This mode of operation is ideal for continuous monitoring of your cluster and can integrate with your existing monitoring stack, such as Prometheus and Alertmanager.

Quick Start

- Currently, the default AI provider is OpenAI, so you will need to generate an API key from OpenAI.
  - You can do this by running k8sgpt generate, which opens a browser link to generate it.
- Run k8sgpt auth add to set the key in k8sgpt.
  - You can provide the password directly using the --password flag.
- Run k8sgpt filters to manage the active filters used by the analyzer. By default, all filters are executed during analysis.
- Run k8sgpt analyze to run a scan.
- Use k8sgpt analyze --explain to get a more detailed explanation of the issues.
- You can also run k8sgpt analyze --with-doc (with or without the explain flag) to get the official documentation from Kubernetes.

Analyzers

K8sGPT uses analyzers to triage and diagnose issues in your cluster. It ships with a set of built-in analyzers, and you can also write your own.

Built-in analyzers

Enabled by default

[x] podAnalyzer
[x] pvcAnalyzer
[x] rsAnalyzer
[x] serviceAnalyzer
[x] eventAnalyzer
[x] ingressAnalyzer
[x] statefulSetAnalyzer
[x] deploymentAnalyzer
[x] cronJobAnalyzer
[x] nodeAnalyzer
[x] mutatingWebhookAnalyzer
[x] validatingWebhookAnalyzer

Optional

[x] hpaAnalyzer
[x] pdbAnalyzer
[x] networkPolicyAnalyzer
[x] gatewayClass
[x] gateway
[x] httproute
[x] logAnalyzer

Examples

Run a scan with the default analyzers

k8sgpt generate
k8sgpt auth add
k8sgpt analyze --explain
k8sgpt analyze --explain --with-doc

Filter on resource

k8sgpt analyze --explain --filter=Service

Filter by namespace

k8sgpt analyze --explain --filter=Pod --namespace=default

Output to JSON

k8sgpt analyze --explain --filter=Service --output=json

Anonymize during explain

k8sgpt analyze --explain --filter=Service --output=json --anonymize
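
When k8sgpt is driven from a script, the flags shown in the examples above can be assembled programmatically. The Python sketch below is one possible way to do that. The helper names build_analyze_args and run_analyze and their parameters are hypothetical, and only flags that appear in this README are used. The sketch makes no assumption about the structure of the JSON output, so it returns the raw text.

```python
import subprocess


def build_analyze_args(explain=False, resource_filter=None, namespace=None,
                       output=None, anonymize=False, with_doc=False):
    """Build the argument list for `k8sgpt analyze` (hypothetical helper)."""
    args = ["k8sgpt", "analyze"]
    if explain:
        args.append("--explain")
    if with_doc:
        args.append("--with-doc")
    if resource_filter:
        args.append(f"--filter={resource_filter}")
    if namespace:
        args.append(f"--namespace={namespace}")
    if output:
        args.append(f"--output={output}")
    if anonymize:
        args.append("--anonymize")
    return args


def run_analyze(**kwargs):
    """Run the scan and return its raw stdout (requires k8sgpt on PATH)."""
    result = subprocess.run(build_analyze_args(**kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Rebuilds the command from the "Anonymize during explain" example.
print(" ".join(build_analyze_args(explain=True, resource_filter="Service",
                                  output="json", anonymize=True)))
```

Keeping the argument construction separate from the subprocess call makes the command easy to log or inspect before anything is sent to the cluster or the AI backend.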

Using filters

List filters

k8sgpt filters list

Add default filters

k8sgpt filters add [filter(s)]

Examples:

- Simple filter: k8sgpt filters add Service
- Multiple filters: k8sgpt filters add Ingress,Pod

Remove default filters

k8sgpt filters remove [filter(s)]

Examples:

- Simple filter: k8sgpt filters remove Service
- Multiple filters: k8sgpt filters remove Ingress,Pod

Additional commands

List configured backends

k8sgpt auth list

Update configured backends

k8sgpt auth update $MY_BACKEND1,$MY_BACKEND2..

Remove configured backends

k8sgpt auth remove -b $MY_BACKEND1,$MY_BACKEND2..

List integrations

k8sgpt integrations list

Activate integrations

k8sgpt integrations activate [integration(s)]

Use integration

k8sgpt analyze --filter=[integration(s)]

Deactivate integrations

k8sgpt integrations deactivate [integration(s)]

Serve mode

k8sgpt serve

Analysis with serve mode

grpcurl -plaintext -d '{"namespace": "k8sgpt", "explain" : "true"}' localhost:8080 schema.v1.ServerAnalyzerService/Analyze
{
  "status": "OK"
}

Analysis with custom headers

k8sgpt analyze --explain --custom-headers CustomHeaderKey:CustomHeaderValue

Print analysis stats

k8sgpt analyze -s

The stats mode displays the time taken by each analyzer, which helps when debugging or understanding a slow analysis:
- Analyzer Ingress took 47.125583ms
- Analyzer PersistentVolumeClaim took 53.009167ms
- Analyzer CronJob took 57.517792ms
- Analyzer Deployment took 156.6205ms
- Analyzer Node took 160.109833ms
- Analyzer ReplicaSet took 245.938333ms
- Analyzer StatefulSet took 448.0455ms
- Analyzer Pod took 5.662594708s
- Analyzer Service took 38.583359166s
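
To find the slowest analyzers quickly, the stats lines can be parsed and ranked. The Python sketch below assumes the line format is exactly as in the sample above ("Analyzer <name> took <number><unit>") and handles only single-unit durations such as ms and s. The function name parse_stats is hypothetical.

```python
import re

# Sample lines copied from the stats output shown above.
SAMPLE = """\
- Analyzer Ingress took 47.125583ms
- Analyzer PersistentVolumeClaim took 53.009167ms
- Analyzer CronJob took 57.517792ms
- Analyzer Deployment took 156.6205ms
- Analyzer Node took 160.109833ms
- Analyzer ReplicaSet took 245.938333ms
- Analyzer StatefulSet took 448.0455ms
- Analyzer Pod took 5.662594708s
- Analyzer Service took 38.583359166s
"""

# Multipliers that convert each supported unit to seconds.
_UNITS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}
_LINE = re.compile(r"Analyzer (\S+) took ([\d.]+)(ns|µs|us|ms|s)\s*$")


def parse_stats(text):
    """Return (analyzer, seconds) pairs sorted from slowest to fastest."""
    results = []
    for line in text.splitlines():
        match = _LINE.search(line)
        if match:
            name, value, unit = match.groups()
            results.append((name, float(value) * _UNITS[unit]))
    return sorted(results, key=lambda item: item[1], reverse=True)


for name, seconds in parse_stats(SAMPLE)[:3]:
    print(f"{name}: {seconds:.3f}s")
```

In the sample above, the Service and Pod analyzers account for almost all of the run time, which is the kind of imbalance this ranking makes visible.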

Diagnostic information

To collect diagnostic information use the following command to create a dump_<timestamp>_json in your local directory.

k8sgpt dump

LLM AI Backends

K8sGPT uses the chosen LLM or generative AI provider when you ask it to explain the analysis results with the --explain flag, e.g. k8sgpt analyze --explain. You can use the --backend flag to specify a configured provider (openai by default).

You can list available providers using k8sgpt auth list:

Default:
> openai
Active:
Unused:
> openai
> localai
> ollama
> azureopenai
> cohere
> amazonbedrock
> amazonsagemaker
> google
> huggingface
> noopai
> googlevertexai
> watsonxai
> customrest
> ibmwatsonxai

For details on how to configure and use each provider, see the official documentation.

To set a new default provider

k8sgpt auth default -p azureopenai
Default provider set to azureopenai

Key Features

Anonymization

With the --anonymize option, the data is anonymized before being sent to the AI backend. During the analysis execution, k8sgpt retrieves sensitive data (Kubernetes object names, labels, etc.). This data is masked when sent to the AI backend and replaced by a key that can be used to de-anonymize the data when the solution is returned to the user.

Error reported during analysis:

Error: HorizontalPodAutoscaler uses StatefulSet/fake-deployment as ScaleTargetRef which does not exist.

Payload sent to the AI backend:

Error: HorizontalPodAutoscaler uses StatefulSet/tGLcCRcHa1Ce5Rs as ScaleTargetRef which does not exist.

Payload returned by the AI:

The Kubernetes system is trying to scale a StatefulSet named tGLcCRcHa1Ce5Rs using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.

Payload returned to the user:

The Kubernetes system is trying to scale a StatefulSet named fake-deployment using the HorizontalPodAutoscaler, but it cannot find the StatefulSet. The solution is to verify that the StatefulSet name is spelled correctly and exists in the same namespace as the HorizontalPodAutoscaler.
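
The round trip above can be illustrated with a small Python sketch. This is not the k8sgpt implementation (which is written in Go). The class name Anonymizer, the use of random alphanumeric keys of the same length as the original value, and the simple string replacement are all assumptions made for illustration.

```python
import secrets
import string

_ALPHABET = string.ascii_letters + string.digits


class Anonymizer:
    """Toy mask/unmask round trip; not the actual k8sgpt implementation."""

    def __init__(self):
        # Maps each generated key back to the value it replaced.
        self._key_to_original = {}

    def _new_key(self, length):
        # Draw random keys until one is not already in use.
        while True:
            key = "".join(secrets.choice(_ALPHABET) for _ in range(length))
            if key not in self._key_to_original:
                return key

    def mask(self, text, sensitive_values):
        """Replace each sensitive value with a fresh random key."""
        for value in sensitive_values:
            key = self._new_key(len(value))
            self._key_to_original[key] = value
            text = text.replace(value, key)
        return text

    def unmask(self, text):
        """Restore the original values in text returned by the backend."""
        for key, value in self._key_to_original.items():
            text = text.replace(key, value)
        return text


anonymizer = Anonymizer()
error = ("HorizontalPodAutoscaler uses StatefulSet/fake-deployment "
         "as ScaleTargetRef which does not exist.")
payload = anonymizer.mask(error, ["fake-deployment"])

# Simulated AI reply that refers to the masked name only.
masked_name = payload.split("/")[1].split()[0]
reply = "Verify that the StatefulSet " + masked_name + " exists."
print(anonymizer.unmask(reply))
```

The mapping lives only in the local process, so the backend never sees the original name, and the reply can still be translated back for the user.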

Further Details

Note: Anonymization does not currently apply to events.

In a few analyzers, such as Pod, we feed event messages to the AI backend. Because the content of these messages is not known beforehand, we are not masking them for the time being.

Analyzers in which data is masked:
- StatefulSet
- Service
- PodDisruptionBudget
- Node
- NetworkPolicy
- Ingress
- HPA
- Deployment
- CronJob

Analyzers in which data is not masked:
- ReplicaSet
- PersistentVolumeClaim
- Pod
- Log
- *Events

*Note:

k8sgpt does not mask the analyzers above because they do not send any identifying information, with the exception of the Events analyzer.
Masking for the Events analyzer is planned for the near future and is tracked in a project issue. Further research is needed to understand the patterns and to mask the sensitive parts of an event, such as the pod name and namespace.

Fields that are not masked:
- Describe
- ObjectStatus
- Replicas
- ContainerStatus
- *Event Message
- ReplicaStatus
- Count (Pod)

*Note:

The payload of an event message might well contain something like "super-secret-project-pod-X crashed", which we do not currently redact (this is planned for the near future and tracked in an issue).

Proceed with care

The K8sGPT team recommends using an entirely different backend (a local model) in critical production environments. By using a local model, you can rest assured that everything stays within your DMZ and nothing is leaked.
If you are uncertain whether sending data to a public LLM (OpenAI, Azure AI) poses a risk to business-critical operations, avoid using a public LLM, based on your own assessment of the risks and the jurisdiction involved.

Configuration management

k8sgpt stores config data in the $XDG_CONFIG_HOME/k8sgpt/k8sgpt.yaml file. The data is stored in plain text, including your OpenAI key.

Config file locations:

OS        Path
macOS     ~/Library/Application Support/k8sgpt/k8sgpt.yaml
Linux     ~/.config/k8sgpt/k8sgpt.yaml
Windows   %LOCALAPPDATA%/k8sgpt/k8sgpt.yaml
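
A script that needs to locate the config file can mirror the table above. The Python sketch below is an illustration only. The function name k8sgpt_config_path is hypothetical, and the fallbacks used when XDG_CONFIG_HOME or LOCALAPPDATA is unset are assumptions. Whether XDG_CONFIG_HOME also overrides the macOS and Windows locations is not stated here, so the sketch applies it on Linux only.

```python
from pathlib import Path


def k8sgpt_config_path(system, home, env):
    """Return the expected k8sgpt.yaml location for a platform.

    system: value of platform.system() ("Darwin", "Linux" or "Windows")
    home:   the user's home directory as a Path
    env:    mapping of environment variables, e.g. os.environ
    """
    if system == "Darwin":
        base = home / "Library" / "Application Support"
    elif system == "Windows":
        # Assumption: fall back to the usual AppData/Local folder if unset.
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    else:
        # Assumption: honour XDG_CONFIG_HOME on Linux, else use ~/.config.
        base = Path(env.get("XDG_CONFIG_HOME") or home / ".config")
    return base / "k8sgpt" / "k8sgpt.yaml"


print(k8sgpt_config_path("Linux", Path("/home/alice"), {}))
```

In real use, the arguments would be platform.system(), Path.home() and os.environ. Because the file holds API keys in plain text, a script that reads it should avoid printing or logging its contents.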

Remote caching

There may be scenarios where caching remotely is preferred. In these scenarios, K8sGPT supports AWS S3, Azure Blob Storage, and Google Cloud Storage integration.

Note: You can configure and use only one remote cache at a time.

Adding a remote cache

AWS S3
- As a prerequisite, AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are required as environment variables.
- Configuration: k8sgpt cache add s3 --region <aws region> --bucket <name>
- MinIO configuration with an HTTP endpoint: k8sgpt cache add s3 --bucket <name> --endpoint <http://localhost:9000>
- MinIO configuration with an HTTPS endpoint, skipping TLS verification: k8sgpt cache add s3 --bucket <name> --endpoint <https://localhost:9000> --insecure
  - K8sGPT will create the bucket if it does not exist.

Azure Storage
- We support a number of techniques to authenticate against Azure.
- Configuration: k8sgpt cache add azure --storageacc <storage account name> --container <container name>
  - K8sGPT assumes that the storage account already exists and will create the container if it does not exist.
  - It is the user's responsibility to grant their identity the permissions needed to upload blob files and create containers in the storage account (e.g. Storage Blob Data Contributor).

Google Cloud Storage
- As a prerequisite, GOOGLE_APPLICATION_CREDENTIALS is required as an environment variable.
- Configuration: k8sgpt cache add gcs --region <gcp region> --bucket <name> --projectid <project id>
  - K8sGPT will create the bucket if it does not exist.

Listing cache items

k8sgpt cache list

Purging an object from the cache

Note: purging an object using this command will delete upstream files, so it requires appropriate permissions.

k8sgpt cache purge $OBJECT_NAME

Removing the remote cache

Note: this will not delete the upstream S3 bucket or Azure storage container.

k8sgpt cache remove

Custom Analyzers

There may be scenarios where you wish to write your own analyzer in a language of your choice. K8sGPT now supports the ability to do so by abiding by the schema and serving the analyzer for consumption. To do so, define the analyzer within the K8sGPT configuration and it will add it into the scanning process. In addition to this you will need to enable the following flag on analysis:

k8sgpt analyze --custom-analysis

An example local host analyzer written in Rust is available. When it is run on localhost:8080, the K8sGPT config can pick it up with the following additions:

custom_analyzers:
  - name: host-analyzer
    connection:
      url: localhost
      port: 8080

This makes it possible to pass host OS information (from this analyzer example) through to K8sGPT, which uses it as context alongside the normal analysis.

See the docs on how to write a custom analyzer.

Listing custom analyzers configured

k8sgpt custom-analyzer list

Adding custom analyzer without install

k8sgpt custom-analyzer add --name my-custom-analyzer --port 8085

Removing custom analyzer

k8sgpt custom-analyzer remove --names "my-custom-analyzer,my-custom-analyzer-2"

Documentation

See our official documentation.

Contributing

Please read our contributing guide.

Community

Find us on Slack

License

For Tasks:

- scan kubernetes clusters
- diagnose kubernetes issues
- triage kubernetes issues

For Jobs:

- kubernetes administrator
- site reliability engineer
- devops engineer
- cloud architect
- software engineer

Alternative AI tools for k8sgpt

Similar Open Source Tools

k8sgpt

GitHub stars: 6.4k

ChatDBG

ChatDBG is an AI-based debugging assistant for C/C++/Python/Rust code that integrates large language models into a standard debugger (`pdb`, `lldb`, `gdb`, and `windbg`) to help debug your code. With ChatDBG, you can engage in a dialog with your debugger, asking open-ended questions about your program, like `why is x null?`. ChatDBG will _take the wheel_ and steer the debugger to answer your queries. ChatDBG can provide error diagnoses and suggest fixes. As far as we are aware, ChatDBG is the _first_ debugger to automatically perform root cause analysis and to provide suggested fixes.

GitHub stars: 825

ChatSim

ChatSim is a tool designed for editable scene simulation for autonomous driving via LLM-Agent collaboration. It provides functionalities for setting up the environment, installing necessary dependencies like McNeRF and Inpainting tools, and preparing data for simulation. Users can train models, simulate scenes, and track trajectories for smoother and more realistic results. The tool integrates with Blender software and offers options for training McNeRF models and McLight's skydome estimation network. It also includes a trajectory tracking module for improved trajectory tracking. ChatSim aims to facilitate the simulation of autonomous driving scenarios with collaborative LLM-Agents.

GitHub stars: 284

gcop

GCOP (Git Copilot) is an AI-powered Git assistant that automates commit message generation, enhances Git workflow, and offers 20+ smart commands. It provides intelligent commit crafting, customizable commit templates, smart learning capabilities, and a seamless developer experience. Users can generate AI commit messages, add all changes with AI-generated messages, undo commits while keeping changes staged, and push changes to the current branch. GCOP offers configuration options for AI models and provides detailed documentation, contribution guidelines, and a changelog. The tool is designed to make version control easier and more efficient for developers.

GitHub stars: 164

py-gpt

GitHub stars: 785

nano-graphrag

nano-GraphRAG is a simple, easy-to-hack implementation of GraphRAG that provides a smaller, faster, and cleaner version of the official implementation. It is about 800 lines of code, small yet scalable, asynchronous, and fully typed. The tool supports incremental insert, async methods, and various parameters for customization. Users can replace storage components and LLM functions as needed. It also allows for embedding function replacement and comes with pre-defined prompts for entity extraction and community reports. However, some features like covariates and global search implementation differ from the original GraphRAG. Future versions aim to address issues related to data source ID, community description truncation, and add new components.

GitHub stars: 2.6k

olah

Olah is a self-hosted lightweight Huggingface mirror service that implements mirroring feature for Huggingface resources at file block level, enhancing download speeds and saving bandwidth. It offers cache control policies and allows administrators to configure accessible repositories. Users can install Olah with pip or from source, set up the mirror site, and download models and datasets using huggingface-cli. Olah provides additional configurations through a configuration file for basic setup and accessibility restrictions. Future work includes implementing an administrator and user system, OOS backend support, and mirror update schedule task. Olah is released under the MIT License.

GitHub stars: 132

stark

STaRK is a large-scale semi-structure retrieval benchmark on Textual and Relational Knowledge Bases. It provides natural-sounding and practical queries crafted to incorporate rich relational information and complex textual properties, closely mirroring real-life scenarios. The benchmark aims to assess how effectively large language models can handle the interplay between textual and relational requirements in queries, using three diverse knowledge bases constructed from public sources.

GitHub stars: 317

mods

AI for the command line, built for pipelines. LLM-based AI is really good at interpreting the output of commands and returning the results in CLI-friendly text formats like Markdown. Mods is a simple tool that makes it super easy to use AI on the command line and in your pipelines. Mods works with OpenAI, Groq, Azure OpenAI, and LocalAI. To get started, install Mods and check out some of the examples below. Since Mods has built-in Markdown formatting, you may also want to grab Glow to give the output some _pizzazz_.

GitHub stars: 3.4k

aio-theme

GitHub stars: 71

hash

HASH is a self-building, open-source database which grows, structures and checks itself. With it, we're creating a platform for decision-making, which helps you integrate, understand and use data in a variety of different ways.

GitHub stars: 1.2k

python-tgpt

Python-tgpt is a Python package that enables seamless interaction with over 45 free LLM providers without requiring an API key. It also provides image generation capabilities. The name _python-tgpt_ draws inspiration from its parent project tgpt, which operates on Golang. Through this Python adaptation, users can effortlessly engage with a number of free LLMs available, fostering a smoother AI interaction experience.

GitHub stars: 95

agenticSeek

AgenticSeek is a voice-enabled AI assistant powered by DeepSeek R1 agents, offering a fully local alternative to cloud-based AI services. It allows users to interact with their filesystem, code in multiple languages, and perform various tasks autonomously. The tool is equipped with memory to remember user preferences and past conversations, and it can divide tasks among multiple agents for efficient execution. AgenticSeek prioritizes privacy by running entirely on the user's hardware without sending data to the cloud.

GitHub stars: 743

tenere

Tenere is a TUI for large language models (LLMs) written in Rust. It provides syntax highlighting, chat history, saving chats to files, Vim keybindings, copying text from/to clipboard, and supports multiple backends. Users can configure Tenere using a TOML configuration file, set key bindings, and use different LLMs such as ChatGPT, llama.cpp, and ollama. Tenere offers default key bindings for global and prompt modes, with features like starting a new chat, saving chats, scrolling, showing chat history, and quitting the app. Users can interact with the prompt in different modes like Normal, Visual, and Insert, with various key bindings for navigation, editing, and text manipulation.

GitHub stars: 419

please-cli

Please CLI is an AI helper script designed to create CLI commands by leveraging the GPT model. Users can input a command description, and the script will generate a Linux command based on that input. The tool offers various functionalities such as invoking commands, copying commands to the clipboard, asking questions about commands, and more. It supports parameters for explanation, using different AI models, displaying additional output, storing API keys, querying ChatGPT with specific models, showing the current version, and providing help messages. Users can install Please CLI via Homebrew, apt, Nix, dpkg, AUR, or manually from source. The tool requires an OpenAI API key for operation and offers configuration options for setting API keys and OpenAI settings. Please CLI is licensed under the Apache License 2.0 by TNG Technology Consulting GmbH.

GitHub stars: 73

bilingual_book_maker

The bilingual_book_maker is an AI translation tool that uses ChatGPT to assist users in creating multi-language versions of epub/txt/srt files and books. It supports various models like gpt-4, gpt-3.5-turbo, claude-2, palm, llama-2, azure-openai, command-nightly, and gemini. Users need ChatGPT or OpenAI token, epub/txt books, internet access, and Python 3.8+. The tool provides options to specify OpenAI API key, model selection, target language, proxy server, context addition, translation style, and more. It generates bilingual books in epub format after translation. Users can test translations, set batch size, tweak prompts, and use different models like DeepL, Google Gemini, Tencent TranSmart, and more. The tool also supports retranslation, translating specific tags, and e-reader type specification. Docker usage is available for easy setup.

GitHub stars: 7.8k

For similar tasks

k8sgpt

GitHub stars: 6.4k

For similar jobs

minio

MinIO is a High Performance Object Storage released under GNU Affero General Public License v3.0. It is API compatible with Amazon S3 cloud storage service. Use MinIO to build high performance infrastructure for machine learning, analytics and application data workloads.

GitHub stars: 46.0k

ai-on-gke

This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers:
- Infrastructure orchestration that supports GPUs and TPUs for training and serving workloads at scale
- Flexible integration with distributed computing and data processing frameworks
- Support for multiple teams on the same infrastructure to maximize utilization of resources

GitHub stars: 280

kong

Kong, or Kong API Gateway, is a cloud-native, platform-agnostic, scalable API Gateway distinguished for its high performance and extensibility via plugins. It also provides advanced AI capabilities with multi-LLM support. By providing functionality for proxying, routing, load balancing, health checking, authentication (and more), Kong serves as the central layer for orchestrating microservices or conventional API traffic with ease. Kong runs natively on Kubernetes thanks to its official Kubernetes Ingress Controller.

GitHub stars: 40.4k

AI-in-a-Box

AI-in-a-Box is a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction, while maintaining the highest standards of quality and efficiency. It provides essential guidance on the responsible use of AI and LLM technologies, specific security guidance for Generative AI (GenAI) applications, and best practices for scaling OpenAI applications within Azure. The available accelerators include: Azure ML Operationalization in-a-box, Edge AI in-a-box, Doc Intelligence in-a-box, Image and Video Analysis in-a-box, Cognitive Services Landing Zone in-a-box, Semantic Kernel Bot in-a-box, NLP to SQL in-a-box, Assistants API in-a-box, and Assistants API Bot in-a-box.

GitHub stars: 527

awsome-distributed-training

This repository contains reference architectures and test cases for distributed model training with Amazon SageMaker Hyperpod, AWS ParallelCluster, AWS Batch, and Amazon EKS. The test cases cover different types and sizes of models as well as different frameworks and parallel optimizations (Pytorch DDP/FSDP, MegatronLM, NemoMegatron...).

GitHub stars: 230

generative-ai-cdk-constructs

The AWS Generative AI Constructs Library is an open-source extension of the AWS Cloud Development Kit (AWS CDK) that provides multi-service, well-architected patterns for quickly defining solutions in code to create predictable and repeatable infrastructure, called constructs. The goal of AWS Generative AI CDK Constructs is to help developers build generative AI solutions using pattern-based definitions for their architecture. The patterns defined in AWS Generative AI CDK Constructs are high level, multi-service abstractions of AWS CDK constructs that have default configurations based on well-architected best practices. The library is organized into logical modules using object-oriented techniques to create each architectural pattern model.

GitHub stars: 444

model_server

OpenVINO™ Model Server (OVMS) is a high-performance system for serving models. Implemented in C++ for scalability and optimized for deployment on Intel architectures, the model server uses the same architecture and API as TensorFlow Serving and KServe while applying OpenVINO for inference execution. Inference service is provided via gRPC or REST API, making deploying new algorithms and AI experiments easy.

GitHub stars: 718

dify-helm

Deploy langgenius/dify, an LLM based chat bot app on kubernetes with helm chart.

GitHub stars: 340
