
opik
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Stars: 13779

Comet Opik is a repository containing two main services, a frontend and a backend, along with a Python SDK for easy installation. Users can run the full application locally with minikube after meeting the installation prerequisites. The repository includes directories for applications such as the Opik backend, with detailed instructions in the README files. The installation can be managed with simple k8s commands, and the running application and its API documentation are reachable via local URLs. The repository aims to facilitate local development and testing of Opik using Kubernetes.
README:
Opik helps you build, evaluate, and optimize LLM systems that run better, faster, and cheaper. From RAG chatbots to code assistants to complex agentic pipelines, Opik provides comprehensive tracing, evaluations, dashboards, and powerful features like Opik Agent Optimizer and Opik Guardrails to improve and secure your LLM powered applications in production.
Website • Slack Community • Twitter • Changelog • Documentation
🧑‍⚖️ LLM as a Judge • 🔍 Evaluating your Application • ⭐ Star Us • 🤝 Contributing
Opik (built by Comet) is an open-source platform designed to streamline the entire lifecycle of LLM applications. It empowers developers to evaluate, test, monitor, and optimize their models and agentic systems. Key offerings include:
- Comprehensive Observability: Deep tracing of LLM calls, conversation logging, and agent activity.
- Advanced Evaluation: Robust prompt evaluation, LLM-as-a-judge, and experiment management.
- Production-Ready: Scalable monitoring dashboards and online evaluation rules for production.
- Opik Agent Optimizer: Dedicated SDK and set of optimizers to enhance prompts and agents.
- Opik Guardrails: Features to help you implement safe and responsible AI practices.
Key capabilities include:
- **Development & Tracing:**
  - Track all LLM calls and traces with detailed context during development and in production (Quickstart).
  - Extensive 3rd-party integrations for easy observability: seamlessly integrate with a growing list of frameworks, supporting many of the largest and most popular ones natively, including recent additions like Google ADK, Autogen, and Flowise AI (Integrations).
  - Annotate traces and spans with feedback scores via the Python SDK or the UI.
  - Experiment with prompts and models in the Prompt Playground.
- **Evaluation & Testing:**
  - Automate your LLM application evaluation with Datasets and Experiments.
  - Leverage powerful LLM-as-a-judge metrics for complex tasks like hallucination detection, moderation, and RAG assessment (Answer Relevance, Context Precision).
  - Integrate evaluations into your CI/CD pipeline with our PyTest integration (see the sketch after this list).
- **Production Monitoring & Optimization:**
  - Log high volumes of production traces: Opik is designed for scale (40M+ traces/day).
  - Monitor feedback scores, trace counts, and token usage over time in the Opik Dashboard.
  - Use Online Evaluation Rules with LLM-as-a-judge metrics to identify production issues.
  - Leverage Opik Agent Optimizer and Opik Guardrails to continuously improve and secure your LLM applications in production.
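To make the PyTest integration concrete, here is a minimal sketch, assuming the `llm_unit` decorator described in the PyTest integration docs; the application logic and test values are placeholders:

```python
import opik
from opik import llm_unit  # assumption: exported as documented in the PyTest integration guide

@opik.track
def llm_application(user_question: str) -> str:
    # Placeholder for your LLM application logic
    return "Paris"

@llm_unit()  # logs this test run to Opik as an evaluation
def test_capital_of_france():
    response = llm_application("What is the capital of France?")
    assert "Paris" in response
```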
[!TIP] If you are looking for features that Opik doesn't have today, please raise a new Feature request 🚀
Get your Opik server running in minutes. Choose the option that best suits your needs:
Access Opik instantly without any setup. Ideal for quick starts and hassle-free maintenance.
👉 Create your free Comet account
Deploy Opik in your own environment. Choose between Docker for local setups or Kubernetes for scalability.
This is the simplest way to get a local Opik instance running. Note the new `./opik.sh` installation script:
On Linux or macOS:

```bash
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository
cd opik

# Start the Opik platform
./opik.sh
```
On Windows:

```powershell
# Clone the Opik repository
git clone https://github.com/comet-ml/opik.git

# Navigate to the repository
cd opik

# Start the Opik platform
powershell -ExecutionPolicy ByPass -c ".\opik.ps1"
```
Use the `--help` or `--info` options to troubleshoot issues. Dockerfiles now ensure containers run as non-root users for enhanced security. Once everything is up and running, visit localhost:5173 in your browser! For detailed instructions, see the Local Deployment Guide.
For production or larger-scale self-hosted deployments, Opik can be installed on a Kubernetes cluster using our Helm chart. Click the badge for the full Kubernetes Installation Guide using Helm.
[!IMPORTANT] Version 1.7.0 Changes: Please check the changelog for important updates and breaking changes.
Opik provides a suite of client libraries and a REST API to interact with the Opik server. This includes SDKs for Python, TypeScript, and Ruby (via OpenTelemetry), allowing for seamless integration into your workflows. For detailed API and SDK references, see the Opik Client Reference Documentation.
To get started with the Python SDK:
Install the package:
```bash
# install using pip
pip install opik

# or install with uv
uv pip install opik
```
Configure the Python SDK by running the `opik configure` command, which will prompt you for your Opik server address (for self-hosted instances) or your API key and workspace (for Comet.com):

```bash
opik configure
```
[!TIP] You can also call `opik.configure(use_local=True)` from your Python code to configure the SDK to run on a local self-hosted installation, or provide API key and workspace details directly for Comet.com. Refer to the Python SDK documentation for more configuration options.
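For reference, a hedged sketch of both configuration paths (the credentials are placeholders; see the Python SDK documentation for the full set of options):

```python
import opik

# Option 1: point the SDK at a local self-hosted installation
opik.configure(use_local=True)

# Option 2: point the SDK at Comet.com (placeholder values)
opik.configure(api_key="YOUR_API_KEY", workspace="YOUR_WORKSPACE")
```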
You are now ready to start logging traces using the Python SDK.
The easiest way to log traces is to use one of our direct integrations. Opik supports a wide array of frameworks, including recent additions like Google ADK, Autogen, AG2, and Flowise AI:
Integration | Description | Documentation | Try in Colab |
---|---|---|---|
ADK | Log traces for Google Agent Development Kit (ADK) | Documentation | |
AG2 | Log traces for AG2 LLM calls | Documentation | (Coming Soon) |
AIsuite | Log traces for aisuite LLM calls | Documentation | |
Agno | Log traces for Agno agent orchestration framework calls | Documentation | (Coming Soon) |
Anthropic | Log traces for Anthropic LLM calls | Documentation | |
Autogen | Log traces for Autogen agentic workflows | Documentation | (Coming Soon) |
Bedrock | Log traces for Amazon Bedrock LLM calls | Documentation | |
BeeAI | Log traces for BeeAI agent framework calls | Documentation | (Coming Soon) |
BytePlus | Log traces for BytePlus LLM calls | Documentation | (Coming Soon) |
Cohere | Log traces for Cohere LLM calls | Documentation | (Coming Soon) |
CrewAI | Log traces for CrewAI calls | Documentation | |
DeepSeek | Log traces for DeepSeek LLM calls | Documentation | (Coming Soon) |
Dify | Log traces for Dify agent runs | Documentation | (Coming Soon) |
DSPY | Log traces for DSPy runs | Documentation | |
Fireworks AI | Log traces for Fireworks AI LLM calls | Documentation | (Coming Soon) |
Flowise AI | Log traces for Flowise AI visual LLM builder | Documentation | (Native UI integration, see documentation) |
Gemini | Log traces for Google Gemini LLM calls | Documentation | |
Groq | Log traces for Groq LLM calls | Documentation | |
Guardrails | Log traces for Guardrails AI validations | Documentation | |
Haystack | Log traces for Haystack calls | Documentation | |
Instructor | Log traces for LLM calls made with Instructor | Documentation | |
LangChain | Log traces for LangChain LLM calls | Documentation | |
LangChainJS | Log traces for LangChainJS LLM calls | Documentation | (Coming Soon) |
LangGraph | Log traces for LangGraph executions | Documentation | |
LiteLLM | Log traces for LiteLLM model calls | Documentation | |
LiveKit Agents | Log traces for LiveKit Agents AI agent framework calls | Documentation | (Coming Soon) |
LlamaIndex | Log traces for LlamaIndex LLM calls | Documentation | |
Mastra | Log traces for Mastra AI workflow framework calls | Documentation | (Coming Soon) |
Mistral AI | Log traces for Mistral AI LLM calls | Documentation | (Coming Soon) |
Novita AI | Log traces for Novita AI LLM calls | Documentation | (Coming Soon) |
Ollama | Log traces for Ollama LLM calls | Documentation | |
OpenAI | Log traces for OpenAI LLM calls | Documentation | |
OpenAI Agents | Log traces for OpenAI Agents SDK calls | Documentation | |
OpenRouter | Log traces for OpenRouter LLM calls | Documentation | (Coming Soon) |
OpenTelemetry | Log traces for OpenTelemetry supported calls | Documentation | (Coming Soon) |
Predibase | Log traces for Predibase LLM calls | Documentation | |
Pydantic AI | Log traces for PydanticAI agent calls | Documentation | |
Ragas | Log traces for Ragas evaluations | Documentation | |
Semantic Kernel | Log traces for Microsoft Semantic Kernel calls | Documentation | (Coming Soon) |
Smolagents | Log traces for Smolagents agents | Documentation | |
Spring AI | Log traces for Spring AI framework calls | Documentation | (Coming Soon) |
Strands Agents | Log traces for Strands agents calls | Documentation | (Coming Soon) |
Together AI | Log traces for Together AI LLM calls | Documentation | (Coming Soon) |
Vercel AI SDK | Log traces for Vercel AI SDK calls | Documentation | (Coming Soon) |
VoltAgent | Log traces for VoltAgent agent framework calls | Documentation | (Coming Soon) |
WatsonX | Log traces for IBM watsonx LLM calls | Documentation | |
xAI Grok | Log traces for xAI Grok LLM calls | Documentation | (Coming Soon) |
[!TIP] If the framework you are using is not listed above, feel free to open an issue or submit a PR with the integration.
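For example, the OpenAI integration wraps the client so that calls are logged automatically. A minimal sketch, assuming `opik.integrations.openai.track_openai` and an `OPENAI_API_KEY` in your environment; the model name is a placeholder:

```python
from openai import OpenAI
from opik.integrations.openai import track_openai

# Wrap the client; every completion call is then logged as a trace
openai_client = track_openai(OpenAI())

response = openai_client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "What is Opik?"}],
)
print(response.choices[0].message.content)
```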
If you are not using any of the frameworks above, you can also use the `track` function decorator to log traces:
```python
import opik

opik.configure(use_local=True)  # Run locally

@opik.track
def my_llm_function(user_question: str) -> str:
    # Your LLM code here
    return "Hello"
```
[!TIP] The track decorator can be used in conjunction with any of our integrations and can also be used to track nested function calls.
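For instance, nested calls are captured as child spans of the parent trace. A short sketch; the function bodies are placeholders:

```python
import opik

@opik.track
def retrieve_context(question: str) -> list:
    # Placeholder retrieval step; logged as a child span
    return ["France is a country in Europe."]

@opik.track
def answer_question(question: str) -> str:
    context = retrieve_context(question)  # nested call -> nested span
    # Placeholder generation step
    return f"Answer grounded in {len(context)} document(s)"
```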
The Opik Python SDK includes a number of LLM-as-a-judge metrics to help you evaluate your LLM application. Learn more in the metrics documentation.
To use them, simply import the relevant metric and use its `score` function:
```python
from opik.evaluation.metrics import Hallucination

metric = Hallucination()
score = metric.score(
    input="What is the capital of France?",
    output="Paris",
    context=["France is a country in Europe."]
)
print(score)
```
Opik also includes a number of pre-built heuristic metrics as well as the ability to create your own. Learn more about it in the metrics documentation.
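As a sketch of a custom metric, assuming the `base_metric.BaseMetric` and `score_result.ScoreResult` classes described in the metrics documentation (the exact-match logic here is purely illustrative):

```python
from opik.evaluation.metrics import base_metric, score_result

class ExactMatch(base_metric.BaseMetric):
    """Toy heuristic metric: 1.0 if the output equals the expected answer."""

    def __init__(self, name: str = "exact_match"):
        self.name = name

    def score(self, output: str, expected_output: str, **ignored_kwargs):
        matched = output.strip() == expected_output.strip()
        return score_result.ScoreResult(
            name=self.name,
            value=1.0 if matched else 0.0,
            reason="Exact match" if matched else "Outputs differ",
        )
```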
Opik allows you to evaluate your LLM application during development through Datasets and Experiments. The Opik Dashboard offers enhanced charts for experiments and better handling of large traces. You can also run evaluations as part of your CI/CD pipeline using our PyTest integration.
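A minimal evaluation sketch, assuming the `opik.evaluation.evaluate` entry point and the dataset client shown in the SDK docs; the dataset name and task logic are placeholders:

```python
import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination

client = opik.Opik()
dataset = client.get_or_create_dataset(name="demo-questions")  # placeholder name
dataset.insert([
    {"input": "What is the capital of France?"},
])

def evaluation_task(item: dict) -> dict:
    # Call your LLM application here; returned keys must match the metric's score() arguments
    return {
        "input": item["input"],
        "output": "Paris",
        "context": ["France is a country in Europe."],
    }

evaluate(
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination()],
)
```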
If you find Opik useful, please consider giving us a star! Your support helps us grow our community and continue improving the product.
There are many ways to contribute to Opik:
- Submit bug reports and feature requests
- Review the documentation and submit Pull Requests to improve it
- Speak or write about Opik and let us know
- Upvote popular feature requests to show your support
To learn more about how to contribute to Opik, please see our contributing guidelines.
Alternative AI tools for opik
Similar Open Source Tools

auto-dev
AutoDev is an AI-powered coding wizard that supports multiple languages, including Java, Kotlin, JavaScript/TypeScript, Rust, Python, Golang, C/C++/OC, and more. It offers a range of features, including auto development mode, copilot mode, chat with AI, customization options, SDLC support, custom AI agent integration, and language features such as language support, extensions, and a DevIns language for AI agent development. AutoDev is designed to assist developers with tasks such as auto code generation, bug detection, code explanation, exception tracing, commit message generation, code review content generation, smart refactoring, Dockerfile generation, CI/CD config file generation, and custom shell/command generation. It also provides a built-in LLM fine-tune model and supports UnitEval for LLM result evaluation and UnitGen for code-LLM fine-tune data generation.

llm4ad
LLM4AD is an open-source Python-based platform leveraging Large Language Models (LLMs) for Automatic Algorithm Design (AD). It provides unified interfaces for methods, tasks, and LLMs, along with features like evaluation acceleration, secure evaluation, logs, GUI support, and more. The platform was originally developed for optimization tasks but is versatile enough to be used in other areas such as machine learning, science discovery, game theory, and engineering design. It offers various search methods and algorithm design tasks across different domains. LLM4AD supports remote LLM API, local HuggingFace LLM deployment, and custom LLM interfaces. The project is licensed under the MIT License and welcomes contributions, collaborations, and issue reports.

langchain_dart
LangChain.dart is a Dart port of the popular LangChain Python framework created by Harrison Chase. LangChain provides a set of ready-to-use components for working with language models and a standard interface for chaining them together to formulate more advanced use cases (e.g. chatbots, Q&A with RAG, agents, summarization, extraction, etc.). The components can be grouped into a few core modules: * **Model I/O:** LangChain offers a unified API for interacting with various LLM providers (e.g. OpenAI, Google, Mistral, Ollama, etc.), allowing developers to switch between them with ease. Additionally, it provides tools for managing model inputs (prompt templates and example selectors) and parsing the resulting model outputs (output parsers). * **Retrieval:** assists in loading user data (via document loaders), transforming it (with text splitters), extracting its meaning (using embedding models), storing (in vector stores) and retrieving it (through retrievers) so that it can be used to ground the model's responses (i.e. Retrieval-Augmented Generation or RAG). * **Agents:** "bots" that leverage LLMs to make informed decisions about which available tools (such as web search, calculators, database lookup, etc.) to use to accomplish the designated task. The different components can be composed together using the LangChain Expression Language (LCEL).

pr-agent
PR-Agent is a tool that helps to efficiently review and handle pull requests by providing AI feedbacks and suggestions. It supports various commands such as generating PR descriptions, providing code suggestions, answering questions about the PR, and updating the CHANGELOG.md file. PR-Agent can be used via CLI, GitHub Action, GitHub App, Docker, and supports multiple git providers and models. It emphasizes real-life practical usage, with each tool having a single GPT-4 call for quick and affordable responses. The PR Compression strategy enables effective handling of both short and long PRs, while the JSON prompting strategy allows for modular and customizable tools. PR-Agent Pro, the hosted version by CodiumAI, provides additional benefits such as full management, improved privacy, priority support, and extra features.

openkore
OpenKore is a custom client and intelligent automated assistant for Ragnarok Online. It is a free, open source, and cross-platform program (Linux, Windows, and MacOS are supported). To run OpenKore, you need to download and extract it or clone the repository using Git. Configure OpenKore according to the documentation and run openkore.pl to start. The tool provides a FAQ section for troubleshooting, guidelines for reporting issues, and information about botting status on official servers. OpenKore is developed by a global team, and contributions are welcome through pull requests. Various community resources are available for support and communication. Users are advised to comply with the GNU General Public License when using and distributing the software.

LitServe
LitServe is a high-throughput serving engine designed for deploying AI models at scale. It generates an API endpoint for models, handles batching, streaming, and autoscaling across CPU/GPUs. LitServe is built for enterprise scale with a focus on minimal, hackable code-base without bloat. It supports various model types like LLMs, vision, time-series, and works with frameworks like PyTorch, JAX, Tensorflow, and more. The tool allows users to focus on model performance rather than serving boilerplate, providing full control and flexibility.

EasyEdit
EasyEdit is a Python package for editing Large Language Models (LLMs) like `GPT-J`, `Llama`, `GPT-NEO`, `GPT2`, and `T5` (supporting models from **1B** to **65B** parameters). Its objective is to alter the behavior of LLMs efficiently within a specific domain without negatively impacting performance on other inputs. It is designed to be easy to use and easy to extend.

pr-agent
PR-Agent is a tool designed to assist in efficiently reviewing and handling pull requests by providing AI feedback and suggestions. It offers various tools such as Review, Describe, Improve, Ask, Update CHANGELOG, and more, with the ability to run them via different interfaces like CLI, PR Comments, or automatically triggering them when a new PR is opened. The tool supports multiple git platforms and models, emphasizing real-life practical usage and modular, customizable tools.

agentql
AgentQL is a suite of tools for extracting data and automating workflows on live web sites featuring an AI-powered query language, Python and JavaScript SDKs, a browser-based debugger, and a REST API endpoint. It uses natural language queries to pinpoint data and elements on any web page, including authenticated and dynamically generated content. Users can define structured data output and apply transforms within queries. AgentQL's natural language selectors find elements intuitively based on the content of the web page and work across similar web sites, self-healing as UI changes over time.

IDvs.MoRec
This repository contains the source code for the SIGIR 2023 paper 'Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited'. It provides resources for evaluating foundation, transferable, multi-modal, and LLM recommendation models, along with datasets, pre-trained models, and training strategies for IDRec and MoRec using in-batch debiased cross-entropy loss. The repository also offers large-scale datasets, code for SASRec with in-batch debias cross-entropy loss, and information on joining the lab for research opportunities.

chat-your-doc
Chat Your Doc is an experimental project exploring various applications based on LLM technology. It goes beyond being just a chatbot project, focusing on researching LLM applications using tools like LangChain and LlamaIndex. The project delves into UX, computer vision, and offers a range of examples in the 'Lab Apps' section. It includes links to different apps, descriptions, launch commands, and demos, aiming to showcase the versatility and potential of LLM applications.

DeepRetrieval
DeepRetrieval is a tool designed to enhance search engines and retrievers using Large Language Models (LLMs) and Reinforcement Learning (RL). It allows LLMs to learn how to search effectively by integrating with search engine APIs and customizing reward functions. The tool provides functionalities for data preparation, training, evaluation, and monitoring search performance. DeepRetrieval aims to improve information retrieval tasks by leveraging advanced AI techniques.

unoplat-code-confluence
Unoplat-CodeConfluence is a universal code context engine that aims to extract, understand, and provide precise code context across repositories tied through domains. It combines deterministic code grammar with state-of-the-art LLM pipelines to achieve human-like understanding of codebases in minutes. The tool offers smart summarization, graph-based embedding, enhanced onboarding, graph-based intelligence, deep dependency insights, and seamless integration with existing development tools and workflows. It provides a precise context API for knowledge engine and AI coding assistants, enabling reliable code understanding through bottom-up code summarization, graph-based querying, and deep package and dependency analysis.

AI-For-Beginners
AI-For-Beginners is a comprehensive 12-week, 24-lesson curriculum designed by experts at Microsoft to introduce beginners to the world of Artificial Intelligence (AI). The curriculum covers various topics such as Symbolic AI, Neural Networks, Computer Vision, Natural Language Processing, Genetic Algorithms, and Multi-Agent Systems. It includes hands-on lessons, quizzes, and labs using popular frameworks like TensorFlow and PyTorch. The focus is on providing a foundational understanding of AI concepts and principles, making it an ideal starting point for individuals interested in AI.

dl_model_infer
This project is a C++ AI inference library that supports running TensorRT models. It provides accelerated deployment examples of popular deep learning CV models and supports dynamic-batch image processing, inference, decoding, and NMS. The project has been updated with various models and provides tutorials for model export. It also includes a producer-consumer inference model for specific tasks. The project directory includes implementations for model inference applications, backend inference classes, post-processing, pre-processing, and target detection and tracking. Speed tests have been conducted on various models, and ONNX downloads are available for different models.
For similar jobs

AirGo
AirGo is a front and rear end separation, multi user, multi protocol proxy service management system, simple and easy to use. It supports vless, vmess, shadowsocks, and hysteria2.

mosec
Mosec is a high-performance and flexible model serving framework for building ML model-enabled backend and microservices. It bridges the gap between any machine learning models you just trained and the efficient online service API. * **Highly performant** : web layer and task coordination built with Rust 🦀, which offers blazing speed in addition to efficient CPU utilization powered by async I/O * **Ease of use** : user interface purely in Python 🐍, by which users can serve their models in an ML framework-agnostic manner using the same code as they do for offline testing * **Dynamic batching** : aggregate requests from different users for batched inference and distribute results back * **Pipelined stages** : spawn multiple processes for pipelined stages to handle CPU/GPU/IO mixed workloads * **Cloud friendly** : designed to run in the cloud, with the model warmup, graceful shutdown, and Prometheus monitoring metrics, easily managed by Kubernetes or any container orchestration systems * **Do one thing well** : focus on the online serving part, users can pay attention to the model optimization and business logic

llm-code-interpreter
The 'llm-code-interpreter' repository is a deprecated plugin that provides a code interpreter on steroids for ChatGPT by E2B. It gives ChatGPT access to a sandboxed cloud environment with capabilities like running any code, accessing Linux OS, installing programs, using filesystem, running processes, and accessing the internet. The plugin exposes commands to run shell commands, read files, and write files, enabling various possibilities such as running different languages, installing programs, starting servers, deploying websites, and more. It is powered by the E2B API and is designed for agents to freely experiment within a sandboxed environment.

pezzo
Pezzo is a fully cloud-native and open-source LLMOps platform that allows users to observe and monitor AI operations, troubleshoot issues, save costs and latency, collaborate, manage prompts, and deliver AI changes instantly. It supports various clients for prompt management, observability, and caching. Users can run the full Pezzo stack locally using Docker Compose, with prerequisites including Node.js 18+, Docker, and a GraphQL Language Feature Support VSCode Extension. Contributions are welcome, and the source code is available under the Apache 2.0 License.

learn-generative-ai
Learn Cloud Applied Generative AI Engineering (GenEng) is a course focusing on the application of generative AI technologies in various industries. The course covers topics such as the economic impact of generative AI, the role of developers in adopting and integrating generative AI technologies, and the future trends in generative AI. Students will learn about tools like OpenAI API, LangChain, and Pinecone, and how to build and deploy Large Language Models (LLMs) for different applications. The course also explores the convergence of generative AI with Web 3.0 and its potential implications for decentralized intelligence.

gcloud-aio
This repository contains shared codebase for two projects: gcloud-aio and gcloud-rest. gcloud-aio is built for Python 3's asyncio, while gcloud-rest is a threadsafe requests-based implementation. It provides clients for Google Cloud services like Auth, BigQuery, Datastore, KMS, PubSub, Storage, and Task Queue. Users can install the library using pip and refer to the documentation for usage details. Developers can contribute to the project by following the contribution guide.

fluid
Fluid is an open source Kubernetes-native Distributed Dataset Orchestrator and Accelerator for data-intensive applications, such as big data and AI applications. It implements dataset abstraction, scalable cache runtime, automated data operations, elasticity and scheduling, and is runtime platform agnostic. Key concepts include Dataset and Runtime. Prerequisites include Kubernetes version > 1.16, Golang 1.18+, and Helm 3. The tool offers features like accelerating remote file accessing, machine learning, accelerating PVC, preloading dataset, and on-the-fly dataset cache scaling. Contributions are welcomed, and the project is under the Apache 2.0 license with a vendor-neutral approach.

aiges
AIGES is a core component of the Athena Serving Framework, designed as a universal encapsulation tool for AI developers to deploy AI algorithm models and engines quickly. By integrating AIGES, you can deploy AI algorithm models and engines rapidly and host them on the Athena Serving Framework, utilizing supporting auxiliary systems for networking, distribution strategies, data processing, etc. The Athena Serving Framework aims to accelerate the cloud service of AI algorithm models and engines, providing multiple guarantees for cloud service stability through cloud-native architecture. You can efficiently and securely deploy, upgrade, scale, operate, and monitor models and engines without focusing on underlying infrastructure and service-related development, governance, and operations.