deepchecks

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

Stars: 3593

Deepchecks is a holistic open-source solution for AI & ML validation needs, enabling thorough testing of data and models from research to production. It includes components for testing, CI & testing management, and monitoring. Users can install and use Deepchecks for testing and monitoring their AI models, with customizable checks and suites for tabular, NLP, and computer vision data. The tool provides visual reports, pythonic/json output for processing, and a dynamic UI for collaboration and monitoring. Deepchecks is open source, with premium features available under a commercial license for monitoring components.

README:

Deepchecks - Continuous Validation for AI & ML: Testing, CI & Monitoring

Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

👋 Join Slack | 📖 Documentation | 🌐 Blog | 🐦 Twitter

🧩 Components

Deepchecks includes:

Deepchecks Testing (Quickstart, docs):
- Running built-in & your own custom Checks and Suites for Tabular, NLP & CV validation (open source).
CI & Testing Management (Quickstart, docs):
- Collaborating over test results and iterating efficiently until the model is production-ready and can be deployed (open source & managed offering).
Deepchecks Monitoring (Quickstart, docs):
- Tracking and validating your deployed models' behavior in production (open source & managed offering).

This is our main repo, as all components use the deepchecks checks at their core. See the Getting Started section for more information about installation and quickstarts for each of the components. If you want to see the deepchecks monitoring code, check out the deepchecks/monitoring repo.

⏩ Getting Started

💻 Installation

Deepchecks Testing (and CI) Installation

pip install deepchecks -U --user

For installing the nlp / vision submodules or with conda:

For NLP: Replace deepchecks with "deepchecks[nlp]", and optionally also install "deepchecks[nlp-properties]".
For Computer Vision: Replace deepchecks with "deepchecks[vision]".
For installing with conda, similarly use: conda install -c conda-forge deepchecks.

Check out the full installation instructions for deepchecks testing here.

Deepchecks Monitoring Installation

To use deepchecks for production monitoring, you can either use our SaaS service, or deploy a local instance in one line on Linux/MacOS (Windows is WIP!) with Docker. Create a new directory for the installation files, open a terminal within that directory and run the following:

pip install deepchecks-installer
deepchecks-installer install-monitoring

This will automatically download the necessary dependencies, run the installation process and then start the application locally.

The installation will take a few minutes. You can then open the deployment URL (default is http://localhost) and start the system onboarding. Check out the full monitoring open source installation & quickstart.

Note that the open source product is built such that each deployment supports monitoring of a single model.

🏃‍♀️ Quickstarts

Deepchecks Testing Quickstart

Jump right into the respective quickstart docs to have it up and running on your data.

Inside the quickstarts, you'll see how to create the relevant deepchecks object for holding your data and metadata (Dataset, TextData or VisionData, corresponding to the data type), and run a Suite or Check. The code snippet for running it will look something like the following, depending on the chosen Suite or Check.

from deepchecks.tabular.suites import model_evaluation
suite = model_evaluation()
suite_result = suite.run(train_dataset=train_dataset, test_dataset=test_dataset, model=model)
suite_result.save_as_html()  # or suite_result.show() / suite_result.show_in_window() to see results inline or in a window
# To process a check result's values in Python, use suite_result.results[0].value with the relevant check index

The output will be a report that enables you to inspect the status and results of the chosen checks.

Deepchecks Monitoring Quickstart

Jump right into the open source monitoring quickstart docs to have it up and running on your data. You'll then be able to see the check results over time, set alerts, and interact with the dynamic deepchecks UI.

Deepchecks CI & Testing Management Quickstart

Deepchecks managed CI & Testing management is currently in closed preview. Book a demo for more information about the offering.

For building and maintaining your own CI process while utilizing Deepchecks Testing for it, check out our docs for Using Deepchecks in CI/CD.
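
For a self-managed CI process, one common pattern is to turn the suite outcome into a process exit code so the pipeline step fails when the suite does. The following is a minimal sketch, not the documented integration: it assumes the suite result exposes a passed() method returning a boolean (check this against the installed deepchecks version) and that save_as_html accepts a file path. The function name ci_exit_code, the report file name, and the _StubResult class are hypothetical and exist only so the sketch runs without deepchecks installed.

```python
def ci_exit_code(suite_result, report_path="deepchecks_report.html"):
    """Save the HTML report, then map the suite outcome to an exit code."""
    # Keep the report so it can be uploaded as a CI artifact.
    suite_result.save_as_html(report_path)
    # Assumption: passed() is True when no condition failed.
    return 0 if suite_result.passed() else 1


class _StubResult:
    """Hypothetical stand-in for the object returned by suite.run(...)."""

    def __init__(self, ok):
        self.ok = ok
        self.saved_to = None

    def save_as_html(self, path):
        self.saved_to = path

    def passed(self):
        return self.ok


passing = _StubResult(ok=True)
failing = _StubResult(ok=False)
codes = (ci_exit_code(passing), ci_exit_code(failing))
```

In a real pipeline, the stub would be replaced by the result of suite.run(...), and the returned value would be passed to sys.exit so that a failing suite fails the job.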

🧮 How does it work?

At its core, deepchecks includes a wide variety of built-in Checks, for testing all types of data and model related issues. These checks are implemented for various models and data types (Tabular, NLP, Vision), and can easily be customized and expanded.

The check results can be used to automatically make informed decisions about your model's production-readiness, and for monitoring it over time in production. The check results can be examined with visual reports (by saving them to an HTML file, or seeing them in Jupyter), processed with code (using their pythonic / json output), and inspected and collaborated on with Deepchecks' dynamic UI (for examining test results and for production monitoring).

✅ Deepchecks' Core: The Checks

- All of the Checks and the framework for customizing them are implemented inside the Deepchecks Testing Python package (this repo).
- Each check tests for a specific potential problem. Deepchecks has many pre-implemented checks for finding issues with the model's performance (e.g. identifying weak segments), data distribution (e.g. detecting drift or leakage) and data integrity (e.g. finding conflicting labels).
- Customizable: each check has many configurable parameters, and custom checks can easily be implemented.
- Checks can be run manually (during research) or triggered automatically (in CI processes or production monitoring).
- Check results can be consumed by:
  - Visual output report: saving to HTML (result.save_as_html('output_report_name.html')) or viewing them in Jupyter (result.show()).
  - Processing with code: with Python using the check result's value attribute, or saving a JSON output.
  - Deepchecks' UI: for dynamic inspection and collaboration (of test results and production monitoring).
- Optional conditions can be added and customized to automatically validate check results, with a pass ✓, fail ✖ or warning ! status.
- An ordered list of checks (with optional conditions) can be run together in a "Suite" (the output is a concluding report of all checks that ran).
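
The relationship between checks, conditions and suites can be sketched in plain Python. This is a toy model for illustration only, not the deepchecks API: the names ToyCheck, Status, conflicting_label_ratio, ratio_condition and run_suite, as well as the thresholds, are all hypothetical. It shows a check computing a value, an optional condition mapping that value to a pass, warning or fail status, and a suite running an ordered list of checks.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional, Sequence, Tuple


class Status(Enum):
    PASS = "pass"
    WARN = "warning"
    FAIL = "fail"


Row = Tuple[Tuple[Any, ...], Any]  # (feature values, label)


@dataclass
class ToyCheck:
    """A check computes a value; an optional condition turns it into a status."""
    name: str
    compute: Callable[[Sequence[Row]], float]
    condition: Optional[Callable[[float], Status]] = None

    def run(self, rows):
        value = self.compute(rows)
        status = self.condition(value) if self.condition else None
        return {"check": self.name, "value": value, "status": status}


def conflicting_label_ratio(rows):
    """Fraction of rows whose feature values also appear with another label."""
    labels_by_features = defaultdict(set)
    for features, label in rows:
        labels_by_features[features].add(label)
    conflicting = sum(
        1 for features, _ in rows if len(labels_by_features[features]) > 1
    )
    return conflicting / len(rows) if rows else 0.0


def ratio_condition(warn_above, fail_above):
    """Build a condition with separate warning and failure thresholds."""
    def condition(value):
        if value > fail_above:
            return Status.FAIL
        if value > warn_above:
            return Status.WARN
        return Status.PASS
    return condition


def run_suite(checks, rows):
    """A suite is an ordered list of checks run on the same data."""
    return [check.run(rows) for check in checks]


rows = [((1, "a"), 0), ((1, "a"), 1), ((2, "b"), 0), ((3, "c"), 1)]
suite = [
    ToyCheck("conflicting labels", conflicting_label_ratio,
             ratio_condition(warn_above=0.0, fail_above=0.6)),
    ToyCheck("row count", lambda data: float(len(data))),  # no condition
]
report = run_suite(suite, rows)
```

Here the first two rows share the same features but carry different labels, so half of the rows conflict and the first check ends with a warning status, while the second check has no condition and only reports its value.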

📜 Open Source vs Paid

Deepchecks' projects (deepchecks/deepchecks & deepchecks/monitoring) are open source and are released under AGPL 3.0.

The only exception is the set of Deepchecks Monitoring components under the backend/deepchecks_monitoring/ee directory (in the deepchecks/monitoring repo), which are subject to a commercial license (see the license here). That directory isn't used by default, and is packaged as part of the deepchecks monitoring repository simply to support upgrading to the commercial edition without downtime.

Enabling premium features (contained in the backend/deepchecks_monitoring/ee directory) with a self-hosted instance requires a Deepchecks license. To learn more, book a demo or see our pricing page.

Looking for a 💯% open-source solution for deepchecks monitoring? Check out the Monitoring OSS repository, which is purged of all proprietary code and features.

👭 Community, Contributing, Docs & Support

Deepchecks is an open source solution. We are committed to a transparent development process and highly appreciate any contributions. Whether you are helping us fix bugs, propose new features, improve our documentation or spread the word, we would love to have you as part of our community.

- Give us a ⭐️ GitHub star ⭐️ at the top of this page to support what we're doing. It means a lot for open source projects!
- Read our docs for more info about how to use and customize deepchecks, and for step-by-step tutorials.
- Post a GitHub Issue to submit a bug report, request a feature, or suggest an improvement.
- To contribute to the package, check out our first good issues and contribution guidelines, and open a PR.

Join our Slack to give us feedback, connect with the maintainers and fellow users, ask questions, get help for package usage or contributions, or engage in discussions about ML testing!

✨ Contributors

Thanks goes to these wonderful people (emoji key):

Itay Gabbay 💻 📖 🤔	matanper 📖 🤔 💻	JKL98ISR 🤔 💻 📖	Yurii Romanyshyn 🤔 💻 📖	Noam Bressler 💻 📖 🤔	Nir Hutnik 💻 📖 🤔	Nadav-Barak 💻 📖 🤔
Sol 💻 📖 🤔	DanArlowski 💻 🚇	DBI 💻	OrlyShmorly 🎨	shir22 🤔 📖 📢	yaronzo1 🤔 🖋	ptannor 🤔 🖋
avitzd 📋 📹	DanBasson 📖 🐛 💡	S.Kishore 💻 📖 🐛	Shay Palachy-Affek 🔣 💡 📓	Cemal GURPINAR 📖 🐛	David de la Iglesia Castro 💻	Levi Bard 📖
Julien Schuermans 🐛	Nir Ben-Zvi 💻 🤔	Shiv Shankar Dayal 🚇	RonItay 🐛 💻	Jeroen Van Goey 🐛 📖	idow09 🐛 💡	Ikko Ashimine 📖
Jason Wohlgemuth 📖	Lokin Sethia 💻 🐛	Ingo Marquart 💻 🐛	Oscar 💻	Richard W 💻 📖 🤔	Bernardo 💻 📖	Olivier Binette 💻 📖 🤔
陈鼎彦 🐛	Andres Vargas 📖	Michael Marien 📖 🐛	OrdoAbChao 💻	Matt Chan 💻	Harsh Jain 💻 📖 🐛	arterm-sedov 📖
AIT ALI YAHIA Rayane 💻 🤔

This project follows the all-contributors specification. Contributions of any kind are welcome!

For Tasks:

validate models, monitor production, test data, manage CI, collaborate on results

For Jobs:

data scientist, machine learning engineer, AI researcher, data analyst, software developer

Alternative AI tools for deepchecks

Similar Open Source Tools

marimo

Marimo is a reactive Python notebook that ensures code and outputs consistency by automatically running dependent cells or marking them as stale. It replaces various tools like Jupyter, streamlit, and more, offering an interactive environment with features like binding UI elements to Python, reproducibility, executability as scripts or apps, shareability, and designed for data tasks. It is git-friendly, offers a modern editor with AI assistants, and comes with built-in package management. Marimo provides deterministic execution order, dynamic markdown and SQL capabilities, and a performant runtime. It is easy to get started with and suitable for both beginners and power users.

github: 12.1k

cognee

Cognee is an open-source framework designed for creating self-improving deterministic outputs for Large Language Models (LLMs) using graphs, LLMs, and vector retrieval. It provides a platform for AI engineers to enhance their models and generate more accurate results. Users can leverage Cognee to add new information, utilize LLMs for knowledge creation, and query the system for relevant knowledge. The tool supports various LLM providers and offers flexibility in adding different data types, such as text files or directories. Cognee aims to streamline the process of working with LLMs and improving AI models for better performance and efficiency.

github: 1.8k

llm-interface

LLM Interface is an npm module that streamlines interactions with various Large Language Model (LLM) providers in Node.js applications. It offers a unified interface for switching between providers and models, supporting 36 providers and hundreds of models. Features include chat completion, streaming, error handling, extensibility, response caching, retries, JSON output, and repair. The package relies on npm packages like axios, @google/generative-ai, dotenv, jsonrepair, and loglevel. Installation is done via npm, and usage involves sending prompts to LLM providers. Tests can be run using npm test. Contributions are welcome under the MIT License.

github: 92

AgentGPT

AgentGPT is a platform that allows users to configure and deploy autonomous AI agents. Users can name their own custom AI and set it on any goal. The AI will think of tasks, execute them, and learn from the results to reach the goal. The platform provides a demo experience, automatic setup CLI, and a tech stack including Next.js, FastAPI, Prisma, TailwindCSS, Zod, and more. AgentGPT is designed to help users easily create and deploy AI agents for various tasks.

github: 30.0k

db2rest

DB2Rest is a modern low-code REST data API platform that enables the rapid development of intelligent applications by combining databases, language models, and vector stores. It facilitates context-aware, reasoning applications without vendor lock-in. The tool accelerates application delivery, fosters faster innovation with AI, serves as a secure database gateway, and simplifies integration. It supports various databases like PostgreSQL, MySQL, MS SQL Server, Oracle, MongoDB, and more, with planned support for additional databases. Users can connect on Discord for support and contact the maintainers by email for inquiries.

github: 320

SuperCoder

SuperCoder is an open-source autonomous software development system that leverages advanced AI tools and agents to streamline and automate coding, testing, and deployment tasks, enhancing efficiency and reliability. It supports a variety of languages and frameworks for diverse development needs. Users can set up the environment variables, build and run the Go server, Asynq worker, and Postgres using Docker and Docker Compose. The project is under active development and may still have issues, but users can seek help and support from the Discord community or by creating new issues on GitHub.

github: 484

verl

veRL is a flexible and efficient reinforcement learning training framework designed for large language models (LLMs). It allows easy extension of diverse RL algorithms, seamless integration with existing LLM infrastructures, and flexible device mapping. The framework achieves state-of-the-art throughput and efficient actor model resharding with 3D-HybridEngine. It supports popular HuggingFace models and is suitable for users working with PyTorch FSDP, Megatron-LM, and vLLM backends.

github: 6.2k

GPTSwarm

GPTSwarm is a graph-based framework for LLM-based agents that enables the creation of LLM-based agents from graphs and facilitates the customized and automatic self-organization of agent swarms with self-improvement capabilities. The library includes components for domain-specific operations, graph-related functions, LLM backend selection, memory management, and optimization algorithms to enhance agent performance and swarm efficiency. Users can quickly run predefined swarms or utilize tools like the file analyzer. GPTSwarm supports local LM inference via LM Studio, allowing users to run with a local LLM model. The framework has been accepted by ICML2024 and offers advanced features for experimentation and customization.

github: 460

fast-llm-security-guardrails

ZenGuard AI enables AI developers to integrate production-level, low-code LLM (Large Language Model) guardrails into their generative AI applications effortlessly. With ZenGuard AI, ensure your application operates within trusted boundaries, is protected from prompt injections, and maintains user privacy without compromising on performance.

github: 93

Imagine_AI

IMAGINE - AI is a groundbreaking image generator tool that leverages the power of OpenAI's DALL-E 2 API library to create extraordinary visuals. Developed using Node.js and Express, this tool offers a transformative way to unleash artistic creativity and imagination by generating unique and captivating images through simple prompts or keywords.

github: 51

openkf

OpenKF (Open Knowledge Flow) is an online intelligent customer service system. It is an open-source customer service system based on OpenIM, supporting LLM (Local Knowledgebase) customer service and multi-channel customer service. It is easy to integrate with third-party systems, deploy, and perform secondary development. The system provides features like login page, config page, dashboard page, platform page, and session page. Users can quickly get started with OpenKF by following the installation and run instructions. The architecture follows MVC design with a standardized directory structure. The community encourages involvement through community meetings, contributions, and development. OpenKF is licensed under the Apache 2.0 license.

github: 111

AutoAudit

AutoAudit is an open-source large language model specifically designed for the field of network security. It aims to provide powerful natural language processing capabilities for security auditing and network defense, including analyzing malicious code, detecting network attacks, and predicting security vulnerabilities. By coupling AutoAudit with ClamAV, a security scanning platform has been created for practical security audit applications. The tool is intended to assist security professionals with accurate and fast analysis and predictions to combat evolving network threats.

github: 201

cua

Cua is a tool for creating and running high-performance macOS and Linux virtual machines on Apple Silicon, with built-in support for AI agents. It provides libraries like Lume for running VMs with near-native performance, Computer for interacting with sandboxes, and Agent for running agentic workflows. Users can refer to the documentation for onboarding, explore demos showcasing AI-Gradio and GitHub issue fixing, and utilize accessory libraries like Core, PyLume, Computer Server, and SOM. Contributions are welcome, and the tool is open-sourced under the MIT License.

github: 3.3k

autoflow

AutoFlow is an open source, Graph RAG-based knowledge base tool built on top of TiDB Vector, LlamaIndex, and DSPy. It features a Perplexity-style Conversational Search page and an Embeddable JavaScript Snippet for easy integration into websites. The tool allows for comprehensive coverage and streamlined search processes through sitemap URL scraping.

github: 2.4k

biniou

biniou is a self-hosted webui for various GenAI (generative artificial intelligence) tasks. It allows users to generate multimedia content using AI models and chatbots on their own computer, even without a dedicated GPU. The tool can work offline once deployed and required models are downloaded. It offers a wide range of features for text, image, audio, video, and 3D object generation and modification. Users can easily manage the tool through a control panel within the webui, with support for various operating systems and CUDA optimization. biniou is powered by Huggingface and Gradio, providing a cross-platform solution for AI content generation.

github: 569

For similar tasks

Model-References

The 'Model-References' repository contains examples for training and inference using Intel Gaudi AI Accelerator. It includes models for computer vision, natural language processing, audio, generative models, MLPerf™ training, and MLPerf™ inference. The repository provides performance data and model validation information for various frameworks like PyTorch. Users can find examples of popular models like ResNet, BERT, and Stable Diffusion optimized for Intel Gaudi AI accelerator.

github: 138

For similar jobs

weave

Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.

github: 855

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github: 1.5k

VisionCraft

The VisionCraft API is a free API for using over 100 different AI models. From images to sound.

github: 94

kaito

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.

github: 405

PyRIT

PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.

github: 2.3k

tabby

Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features:
- Self-contained, with no need for a DBMS or cloud service.
- OpenAPI interface, easy to integrate with existing infrastructure (e.g. Cloud IDE).
- Supports consumer-grade GPUs.

github: 30.6k

spear

SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.

github: 224

Magick

Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.

github: 675