
Awesome-LLMOps
๐ An awesome & curated list of best LLMOps tools.
Stars: 53

Awesome-LLMOps is a curated list of the best LLMOps tools, providing a comprehensive collection of frameworks and tools for building, deploying, and managing large language models (LLMs) and AI agents. The repository includes a wide range of tools for tasks such as building multimodal AI agents, fine-tuning models, orchestrating applications, evaluating models, and serving models for inference. It covers various aspects of the machine learning operations (MLOps) lifecycle, from training to deployment and observability. The tools listed in this repository cater to the needs of developers, data scientists, and machine learning engineers working with large language models and AI applications.
README:
๐ An awesome & curated list of best LLMOps tools.
- Agent
- Alignment
- Application Orchestration Framework
- Chat Framework
- Database
- Evaluation
- FineTune
- Gateway
- Inference
- MLOps
- Observation
- Output
- Training
-
Agno: Build Multimodal AI Agents with memory, knowledge and tools. Simple, fast and model-agnostic.
-
AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
-
LangGraph: Build resilient language agents as graphs.
-
MetaGPT: ๐ The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming.
-
OpenAI Agents SDK: A lightweight, powerful framework for multi-agent workflows.
-
OpenManus: No fortress, purely open ground. OpenManus is Coming.
-
PydanticAI: Agent Framework / shim to use Pydantic with LLMs.
-
Swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
-
Browser Use: Make websites accessible for AI agents.
-
Mem0: The Memory layer for AI Agents.
-
OpenAI CUA: Computer Using Agent Sample App.
-
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT).
-
Self-RLHF: Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback.
-
Dify: Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
-
Flowise: Drag & drop UI to build your customized LLM flow.
-
Haystack: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
Inference: Turn any computer or edge device into a command center for your computer vision projects.
-
LangChain: ๐ฆ๐ Build context-aware reasoning applications.
-
LlamaIndex: LlamaIndex is the leading framework for building LLM-powered agents over your data.
-
FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
-
Gradio: Build and share delightful machine learning apps, all in Python. ๐ Star to support our work!
-
Jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer.
-
Lobe Chat: ๐คฏ Lobe Chat - an open-source, modern-design AI chat framework. Supports Multi AI Providers( OpenAI / Claude 3 / Gemini / Ollama / DeepSeek / Qwen), Knowledge Base (file upload / knowledge management / RAG ), Multi-Modals (Plugins/Artifacts) and Thinking. One-click FREE deployment of your private ChatGPT/ Claude / DeepSeek application.
-
NextChat: โจ Light and Fast AI Assistant. Support: Web | iOS | MacOS | Android | Linux | Windows.
-
Open WebUI: User-friendly AI Interface (Supports Ollama, OpenAI API, ...).
-
PrivateGPT: Interact with your documents using the power of GPT, 100% privately, no data leaks.
-
chroma: the AI-native open-source embedding database.
-
deeplake: Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow.
-
Faiss: A library for efficient similarity search and clustering of dense vectors.
-
milvus: Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search.
-
weaviate: Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native databaseโ.
-
AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
-
lm-evaluation-harness: A framework for few-shot evaluation of language models.
-
LongBench: LongBench v2 and LongBench (ACL 2024).
-
Axolotl: Go ahead and axolotl questions.
-
LLaMa-Factory: Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024).
-
LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
-
maestro: streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.
-
Swift: Use PEFT or Full-parameter to finetune 450+ LLMs (Qwen2.5, InternLM3, GLM4, Llama3.3, Mistral, Yi1.5, Baichuan2, DeepSeek-R1, ...) and 150+ MLLMs (Qwen2.5-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).
-
torchtune: PyTorch native post-training library.
-
unsloth: Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! ๐ฆฅ
-
AI Gateway: A blazing fast AI Gateway with integrated guardrails. Route to 200+ LLMs, 50+ AI Guardrails with 1 fast & friendly API.
-
LiteLLM: Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq].
-
RouteLLM: A framework for serving and evaluating LLM routers - save LLM costs without compromising quality.
-
Envoy AI Gateway: Envoy AI Gateway is an open source project for using Envoy Gateway to handle request traffic from application clients to Generative AI services.
-
Higress: ๐ค AI Gateway | AI Native API Gateway.
-
kgateway: The Cloud-Native API Gateway and AI Gateway.
-
Kong: ๐ฆ The Cloud-Native API Gateway and AI Gateway.
-
Cortex.cpp: Local AI API Platform.
-
DeepSpeed-MII: MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
-
ipex-llm: Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
-
LMDeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
-
LoRAX: Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs.
-
llama.cpp: LLM inference in C/C++.
-
Llumnix: Efficient and easy multi-instance LLM serving.
-
MInference: [NeurIPS'24 Spotlight, ICLR'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
-
MLC LLM: Universal LLM Deployment Engine with ML Compilation.
-
MLServer: An inference server for your machine learning models, including support for multiple frameworks, multi-model serving and more.
-
Ollama: Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, and other large language models.
-
OpenVINO: OpenVINOโข is an open source toolkit for optimizing and deploying AI inference.
-
Ratchet: A cross-platform browser ML framework.
-
SGLang: SGLang is a fast serving framework for large language models and vision language models.
-
transformers.js: State-of-the-art Machine Learning for the web. Run ๐ค Transformers directly in your browser, with no need for a server!
-
Triton Inference Server: The Triton Inference Server provides an optimized cloud and edge inferencing solution.
-
Text Generation Inference: Large Language Model Text Generation Inference.
-
vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs.
-
web-llm: High-performance In-browser LLM Inference Engine.
-
zml: Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild.
-
AIBrix: Cost-efficient and pluggable Infrastructure components for GenAI inference.
-
Kserve: Standardized Serverless ML Inference Platform on Kubernetes.
-
KubeAI: AI Inference Operator for Kubernetes. The easiest way to serve ML models in production. Supports VLMs, LLMs, embeddings, and speech-to-text.
-
llmaz: โธ๏ธ Easy, advanced inference platform for large language models on Kubernetes. ๐ Star to support our work!
-
LMCache: 10x Faster Long-Context LLM By Smart KV Cache Optimizations.
-
Mooncake: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
-
OpenLLM: Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.
-
BentoML: The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
-
Flyte: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
-
Kubeflow: Machine Learning Toolkit for Kubernetes.
-
Metaflow: Build, Deploy and Manage AI/ML Systems.
-
MLflow: Open source platform for the machine learning lifecycle.
-
Polyaxon: MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle.
-
Ray: Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
-
Seldon-Core: An MLOps framework to package, deploy, monitor and manage thousands of production machine learning models.
-
ZenML: ZenML ๐: The bridge between ML and Ops. https://zenml.io.
-
OpenLLMetry: Open-source observability for your LLM application, based on OpenTelemetry.
-
Helicone: ๐ง Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 ๐
-
phoenix: AI Observability & Evaluation.
-
wandb: The AI developer platform. Use Weights & Biases to train and fine-tune models, and manage models from experimentation to production.
-
Instructor: structured outputs for llms.
-
Outlines: Structured Text Generation.
-
ColossalAI: Making large AI models cheaper, faster and more accessible.
-
Ludwig: Low-code framework for building custom LLMs, neural networks, and other AI models.
-
MaxText: A simple, performant and scalable Jax LLM!
-
MLX: MLX: An array framework for Apple silicon.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for Awesome-LLMOps
Similar Open Source Tools

Awesome-LLMOps
Awesome-LLMOps is a curated list of the best LLMOps tools, providing a comprehensive collection of frameworks and tools for building, deploying, and managing large language models (LLMs) and AI agents. The repository includes a wide range of tools for tasks such as building multimodal AI agents, fine-tuning models, orchestrating applications, evaluating models, and serving models for inference. It covers various aspects of the machine learning operations (MLOps) lifecycle, from training to deployment and observability. The tools listed in this repository cater to the needs of developers, data scientists, and machine learning engineers working with large language models and AI applications.

Awesome-Segment-Anything
The Segment Anything Model (SAM) is a powerful tool that allows users to segment any object in an image with just a few clicks. This makes it a great tool for a variety of tasks, such as object detection, tracking, and editing. SAM is also very easy to use, making it a great option for both beginners and experienced users.

awesome-langchain-zh
The awesome-langchain-zh repository is a collection of resources related to LangChain, a framework for building AI applications using large language models (LLMs). The repository includes sections on the LangChain framework itself, other language ports of LangChain, tools for low-code development, services, agents, templates, platforms, open-source projects related to knowledge management and chatbots, as well as learning resources such as notebooks, videos, and articles. It also covers other LLM frameworks and provides additional resources for exploring and working with LLMs. The repository serves as a comprehensive guide for developers and AI enthusiasts interested in leveraging LangChain and LLMs for various applications.

koko-aio-slang
Koko-aio shader is an all-in-one CRT shader tool that can be configured with various parameters to run on different GPUs. It aims to provide visual parameters to make monitors look similar to CRT displays without simulating their internal behavior. The tool includes features such as color corrections, B/W display colorization, antialiasing, noise effects, deconvergence, blurring/sharpening, interlacing, phosphor glow, and more. It also supports ambient lighting, vignette, integer scaling, and various image effects. Koko-aio is designed to enhance the visual experience of low-res content on high-resolution displays.

L3AGI
L3AGI is an open-source tool that enables AI Assistants to collaborate together as effectively as human teams. It provides a robust set of functionalities that empower users to design, supervise, and execute both autonomous AI Assistants and Teams of Assistants. Key features include the ability to create and manage Teams of AI Assistants, design and oversee standalone AI Assistants, equip AI Assistants with the ability to retain and recall information, connect AI Assistants to an array of data sources for efficient information retrieval and processing, and employ curated sets of tools for specific tasks. L3AGI also offers a user-friendly interface, APIs for integration with other systems, and a vibrant community for support and collaboration.

X-AnyLabeling
X-AnyLabeling is a robust annotation tool that seamlessly incorporates an AI inference engine alongside an array of sophisticated features. Tailored for practical applications, it is committed to delivering comprehensive, industrial-grade solutions for image data engineers. This tool excels in swiftly and automatically executing annotations across diverse and intricate tasks.

neural-compressor
Intelยฎ Neural Compressor is an open-source Python library that supports popular model compression techniques such as quantization, pruning (sparsity), distillation, and neural architecture search on mainstream frameworks such as TensorFlow, PyTorch, ONNX Runtime, and MXNet. It provides key features, typical examples, and open collaborations, including support for a wide range of Intel hardware, validation of popular LLMs, and collaboration with cloud marketplaces, software platforms, and open AI ecosystems.

david-ai
David UI is a free and open-source collection of customizable, production-ready UI components built with Tailwind CSS. It is designed to be developer-friendly and performance-focused, streamlining the creation of modern, visually appealing interfaces to help deliver high-quality user experiences faster.

mnn-llm
MNN-LLM is a high-performance inference engine for large language models (LLMs) on mobile and embedded devices. It provides optimized implementations of popular LLM models, such as ChatGPT, BLOOM, and GPT-3, enabling developers to easily integrate these models into their applications. MNN-LLM is designed to be efficient and lightweight, making it suitable for resource-constrained devices. It supports various deployment options, including mobile apps, web applications, and embedded systems. With MNN-LLM, developers can leverage the power of LLMs to enhance their applications with natural language processing capabilities, such as text generation, question answering, and dialogue generation.

Comfyui-Aix-NodeMap
Comfyui-Aix-NodeMap is a project by the Aix team to organize and annotate the latest nodes in Comfyui. It aims to address the challenge of finding nodes effectively as their number increases. The project is continuously updated every 7 days, with the opportunity for users to provide feedback on any omissions or errors. The team respects developers' opinions and strives to make corrections promptly. The project is part of Aix's vision to make humanity more efficient through open-source contributions, including daily updates on workflow, AI information, and node introductions.

eulers-shield
Euler's Shield is a decentralized, AI-powered financial system designed to stabilize the value of Pi Coin at $314.159. It combines blockchain, machine learning, and cybersecurity to ensure the security, scalability, and decentralization of the Pi Coin ecosystem.

awesome-langchain
LangChain is an amazing framework to get LLM projects done in a matter of no time, and the ecosystem is growing fast. Here is an attempt to keep track of the initiatives around LangChain. Subscribe to the newsletter to stay informed about the Awesome LangChain. We send a couple of emails per month about the articles, videos, projects, and tools that grabbed our attention Contributions welcome. Add links through pull requests or create an issue to start a discussion. Please read the contribution guidelines before contributing.

pi-nexus-autonomous-banking-network
A decentralized, AI-driven system accelerating the Open Mainet Pi Network, connecting global banks for secure, efficient, and autonomous transactions. The Pi-Nexus Autonomous Banking Network is built using Raspberry Pi devices and allows for the creation of a decentralized, autonomous banking system.

sfdx-hardis
sfdx-hardis is a toolbox for Salesforce DX, developed by Cloudity, that simplifies tasks which would otherwise take minutes or hours to complete manually. It enables users to define complete CI/CD pipelines for Salesforce projects, backup metadata, and monitor any Salesforce org. The tool offers a wide range of commands that can be accessed via the command line interface or through a Visual Studio Code extension. Additionally, sfdx-hardis provides Docker images for easy integration into CI workflows. The tool is designed to be natively compliant with various platforms and tools, making it a versatile solution for Salesforce developers.

SuperAGI
SuperAGI is an open-source framework designed to build, manage, and run autonomous AI agents. It enables developers to create production-ready and scalable agents, extend agent capabilities with toolkits, and interact with agents through a graphical user interface. The framework allows users to connect to multiple Vector DBs, optimize token usage, store agent memory, utilize custom fine-tuned models, and automate tasks with predefined steps. SuperAGI also provides a marketplace for toolkits that enable agents to interact with external systems and third-party plugins.
For similar tasks

Awesome-LLMOps
Awesome-LLMOps is a curated list of the best LLMOps tools, providing a comprehensive collection of frameworks and tools for building, deploying, and managing large language models (LLMs) and AI agents. The repository includes a wide range of tools for tasks such as building multimodal AI agents, fine-tuning models, orchestrating applications, evaluating models, and serving models for inference. It covers various aspects of the machine learning operations (MLOps) lifecycle, from training to deployment and observability. The tools listed in this repository cater to the needs of developers, data scientists, and machine learning engineers working with large language models and AI applications.

labelbox-python
Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.

promptfoo
Promptfoo is a tool for testing and evaluating LLM output quality. With promptfoo, you can build reliable prompts, models, and RAGs with benchmarks specific to your use-case, speed up evaluations with caching, concurrency, and live reloading, score outputs automatically by defining metrics, use as a CLI, library, or in CI/CD, and use OpenAI, Anthropic, Azure, Google, HuggingFace, open-source models like Llama, or integrate custom API providers for any LLM API.

vespa
Vespa is a platform that performs operations such as selecting a subset of data in a large corpus, evaluating machine-learned models over the selected data, organizing and aggregating it, and returning it, typically in less than 100 milliseconds, all while the data corpus is continuously changing. It has been in development for many years and is used on a number of large internet services and apps which serve hundreds of thousands of queries from Vespa per second.

python-aiplatform
The Vertex AI SDK for Python is a library that provides a convenient way to use the Vertex AI API. It offers a high-level interface for creating and managing Vertex AI resources, such as datasets, models, and endpoints. The SDK also provides support for training and deploying custom models, as well as using AutoML models. With the Vertex AI SDK for Python, you can quickly and easily build and deploy machine learning models on Vertex AI.

ScandEval
ScandEval is a framework for evaluating pretrained language models on mono- or multilingual language tasks. It provides a unified interface for benchmarking models on a variety of tasks, including sentiment analysis, question answering, and machine translation. ScandEval is designed to be easy to use and extensible, making it a valuable tool for researchers and practitioners alike.

opencompass
OpenCompass is a one-stop platform for large model evaluation, aiming to provide a fair, open, and reproducible benchmark for large model evaluation. Its main features include: * Comprehensive support for models and datasets: Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 70+ datasets with about 400,000 questions, comprehensively evaluating the capabilities of the models in five dimensions. * Efficient distributed evaluation: One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours. * Diversified evaluation paradigms: Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue-type prompt templates, to easily stimulate the maximum performance of various models. * Modular design with high extensibility: Want to add new models or datasets, customize an advanced task division strategy, or even support a new cluster management system? Everything about OpenCompass can be easily expanded! * Experiment management and reporting mechanism: Use config files to fully record each experiment, and support real-time reporting of results.

flower
Flower is a framework for building federated learning systems. It is designed to be customizable, extensible, framework-agnostic, and understandable. Flower can be used with any machine learning framework, for example, PyTorch, TensorFlow, Hugging Face Transformers, PyTorch Lightning, scikit-learn, JAX, TFLite, MONAI, fastai, MLX, XGBoost, Pandas for federated analytics, or even raw NumPy for users who enjoy computing gradients by hand.
For similar jobs

runbooks
Runbooks is a repository that is no longer active. The project has been deprecated in favor of KubeAI, a platform designed to simplify the operationalization of AI on Kubernetes. For more information, please refer to the new repository at https://github.com/substratusai/kubeai.

aiops-modules
AIOps Modules is a collection of reusable Infrastructure as Code (IAC) modules that work with SeedFarmer CLI. The modules are decoupled and can be aggregated using GitOps principles to achieve desired use cases, removing heavy lifting for end users. They must be generic for reuse in Machine Learning and Foundation Model Operations domain, adhering to SeedFarmer Guide structure. The repository includes deployment steps, project manifests, and various modules for SageMaker, Mlflow, FMOps/LLMOps, MWAA, Step Functions, EKS, and example use cases. It also supports Industry Data Framework (IDF) and Autonomous Driving Data Framework (ADDF) Modules.

Awesome-LLMOps
Awesome-LLMOps is a curated list of the best LLMOps tools, providing a comprehensive collection of frameworks and tools for building, deploying, and managing large language models (LLMs) and AI agents. The repository includes a wide range of tools for tasks such as building multimodal AI agents, fine-tuning models, orchestrating applications, evaluating models, and serving models for inference. It covers various aspects of the machine learning operations (MLOps) lifecycle, from training to deployment and observability. The tools listed in this repository cater to the needs of developers, data scientists, and machine learning engineers working with large language models and AI applications.

weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.