clearml-serving

ClearML - Model-Serving Orchestration and Repository Solution

Stars: 132

Visit

ClearML Serving is a command line utility for model deployment and orchestration, enabling model deployment including serving and preprocessing code to a Kubernetes cluster or custom container based solution. It supports machine learning models like Scikit Learn, XGBoost, LightGBM, and deep learning models like TensorFlow, PyTorch, ONNX. It provides a customizable RestAPI for serving, online model deployment, scalable solutions, multi-model per container, automatic deployment, canary A/B deployment, model monitoring, usage metric reporting, metric dashboard, and model performance metrics. ClearML Serving is modular, scalable, flexible, customizable, and open source.

README:

ClearML Serving - Model deployment made easy

`clearml-serving v1.3.1` ✨ Model Serving (ML/DL) Made Easy 🎉
🔥 NEW version 1.3 🚀 20% faster !

🌟 ClearML is open-source - Leave a star to support the project! 🌟

clearml-serving is a command line utility for model deployment and orchestration.
It enables model deployment including serving and preprocessing code to a Kubernetes cluster or custom container based solution.

🔥 NEW 🎊 Take it for a spin with a simple `docker-compose` command 🪄 ✨

Features:

Easy to deploy & configure
- Support Machine Learning Models (Scikit Learn, XGBoost, LightGBM)
- Support Deep Learning Models (Tensorflow, PyTorch, ONNX)
- Customizable RestAPI for serving (i.e. allow per model pre/post-processing for easy integration)
Flexible
- On-line model deployment
- On-line endpoint model/version deployment (i.e. no need to take the service down)
- Per model standalone preprocessing and postprocessing python code
Scalable
- Multi model per container
- Multi models per serving service
- Multi-service support (fully seperated multiple serving service running independently)
- Multi cluster support
- Out-of-the-box node auto-scaling based on load/usage
Efficient
- Multi-container resource utilization
- Support for CPU & GPU nodes
- Auto-batching for DL models
Automatic deployment
- Automatic model upgrades w/ canary support
- Programmable API for model deployment
Canary A/B deployment
- Online Canary updates
Model Monitoring
- Usage Metric reporting
- Metric Dashboard
- Model performance metric
- Model performance Dashboard

ClearML Serving Design

ClearML Serving Design Principles

Modular , Scalable , Flexible , Customizable , Open Source

Installation

Prerequisites

ClearML-Server : Model repository, Service Health, Control plane
Kubernetes / Single-instance Machine : Deploying containers
CLI : Configuration & model deployment interface

💅 Initial Setup

Setup your ClearML Server or use the Free tier Hosting
Setup local access (if you haven't already), see instructions here
Install clearml-serving CLI:

pip3 install clearml-serving

Create the Serving Service Controller

clearml-serving create --name "serving example"
The new serving service UID should be printed New Serving Service created: id=aa11bb22aa11bb22

Write down the Serving Service UID
Clone clearml-serving repository

git clone https://github.com/allegroai/clearml-serving.git

Edit the environment variables file (docker/example.env) with your clearml-server credentials and Serving Service UID. For example, you should have something like

cat docker/example.env

  CLEARML_WEB_HOST="https://app.clear.ml"
  CLEARML_API_HOST="https://api.clear.ml"
  CLEARML_FILES_HOST="https://files.clear.ml"
  CLEARML_API_ACCESS_KEY="<access_key_here>"
  CLEARML_API_SECRET_KEY="<secret_key_here>"
  CLEARML_SERVING_TASK_ID="<serving_service_id_here>"

Spin the clearml-serving containers with docker-compose (or if running on Kubernetes use the helm chart)

cd docker && docker-compose --env-file example.env -f docker-compose.yml up

If you need Triton support (keras/pytorch/onnx etc.), use the triton docker-compose file

cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up

💪 If running on a GPU instance w/ Triton support (keras/pytorch/onnx etc.), use the triton gpu docker-compose file

cd docker && docker-compose --env-file example.env -f docker-compose-triton-gpu.yml up

Notice: Any model that registers with "Triton" engine, will run the pre/post processing code on the Inference service container, and the model inference itself will be executed on the Triton Engine container.

🌊 Optional: advanced setup - S3/GS/Azure access

To add access credentials and allow the inference containers to download models from your S3/GS/Azure object-storage, add the respective environment variables to your env files (example.env) See further details on configuring the storage access here

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_DEFAULT_REGION

GOOGLE_APPLICATION_CREDENTIALS

AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_KEY

💁 Concepts

CLI - Secure configuration interface for on-line model upgrade/deployment on running Serving Services

Serving Service Task - Control plane object storing configuration on all the endpoints. Support multiple separated instance, deployed on multiple clusters.

Inference Services - Inference containers, performing model serving pre/post processing. Also support CPU model inferencing.

Serving Engine Services - Inference engine containers (e.g. Nvidia Triton, TorchServe etc.) used by the Inference Services for heavier model inference.

Statistics Service - Single instance per Serving Service collecting and broadcasting model serving & performance statistics

Time-series DB - Statistics collection service used by the Statistics Service, e.g. Prometheus

Dashboards - Customizable dashboard-ing solution on top of the collected statistics, e.g. Grafana

👉 Toy model (scikit learn) deployment example

Train toy scikit-learn model

create new python virtual environment
pip3 install -r examples/sklearn/requirements.txt
python3 examples/sklearn/train_model.py
Model was automatically registered and uploaded into the model repository. For Manual model registration see here

clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --project "serving examples"
Notice the preprocessing python code is packaged and uploaded to the "Serving Service", to be used by any inference container, and downloaded in realtime when updated

Spin the Inference Container

Customize container Dockerfile if needed
Build container docker build --tag clearml-serving-inference:latest -f clearml_serving/serving/Dockerfile .
Spin the inference container: docker run -v ~/clearml.conf:/root/clearml.conf -p 8080:8080 -e CLEARML_SERVING_TASK_ID=<service_id> -e CLEARML_SERVING_POLL_FREQ=5 clearml-serving-inference:latest

Test new model inference endpoint

curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

Notice, now that we have an inference container running, we can add new model inference endpoints directly with the CLI. The inference container will automatically sync once every 5 minutes.

Notice On the first few requests the inference container needs to download the model file and preprocessing python code, this means the request might take a little longer, once everything is cached, it will return almost immediately.

Notes:

Review the model repository in the ClearML web UI, under the "serving examples" Project on your ClearML account/server (free hosted or self-deployed).

Inference services status, console outputs and machine metrics are available in the ClearML UI in the Serving Service project (default: "DevOps" project)

To learn more on training models and the ClearML model repository, see the ClearML documentation

🐢 Registering & Deploying new models manually

Uploading an existing model file into the model repository can be done via the clearml RestAPI, the python interface, or with the clearml-serving CLI.

To learn more on training models and the ClearML model repository, see the ClearML documentation

local model file on our laptop: 'examples/sklearn/sklearn-model.pkl'
Upload the model file to the clearml-server file storage and register it clearml-serving --id <service_id> model upload --name "manual sklearn model" --project "serving examples" --framework "scikit-learn" --path examples/sklearn/sklearn-model.pkl
We now have a new Model in the "serving examples" project, by the name of "manual sklearn model". The CLI output prints the UID of the newly created model, we will use it to register a new endpoint
In the clearml web UI we can see the new model listed under the Models tab in the associated project. we can also download the model file itself directly from the web UI
Register a new endpoint with the new model clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --model-id <newly_created_model_id_here>

Notice we can also provide a differnt storage destination for the model, such as S3/GS/Azure, by passing --destination="s3://bucket/folder", gs://bucket/folder, azure://bucket/folder. Yhere is no need to provide a unique path tp the destination argument, the location of the model will be a unique path based on the serving service ID and the model name

🐰 Automatic model deployment

The clearml Serving Service support automatic model deployment and upgrades, directly connected with the model repository and API. When the model auto-deploy is configured, a new model versions will be automatically deployed when you "publish" or "tag" a new model in the clearml model repository. This automation interface allows for simpler CI/CD model deployment process, as a single API automatically deploy (or remove) a model from the Serving Service.

💡 Automatic model deployment example

Configure the model auto-update on the Serving Service

clearml-serving --id <service_id> model auto-update --engine sklearn --endpoint "test_model_sklearn_auto" --preprocess "preprocess.py" --name "train sklearn model" --project "serving examples" --max-versions 2

Deploy the Inference container (if not already deployed)
Publish a new model the model repository

Go to the "serving examples" project in the ClearML web UI, click on the Models Tab, search for "train sklearn model" right click and select "Publish"
Use the RestAPI details
Use Python interface:

from clearml import Model
Model(model_id="unique_model_id_here").publish()

The new model is available on a new endpoint version (1), test with: curl -X POST "http://127.0.0.1:8080/serve/test_model_sklearn_auto/1" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

🐦 Canary endpoint setup

Canary endpoint deployment add a new endpoint where the actual request is sent to a preconfigured set of endpoints with pre-provided distribution. For example, let's create a new endpoint "test_model_sklearn_canary", we can provide a list of endpoints and probabilities (weights).

clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints test_model_sklearn/2 test_model_sklearn/1

This means that any request coming to /test_model_sklearn_canary/ will be routed with probability of 90% to /test_model_sklearn/1/ and with probability of 10% to /test_model_sklearn/2/.

Note:

As with any other Serving Service configuration, we can configure the Canary endpoint while the Inference containers are already running and deployed, they will get updated in their next update cycle (default: once every 5 minutes)

We can also prepare a "fixed" canary endpoint, always splitting the load between the last two deployed models:

clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints-prefix test_model_sklearn/

This means that is we have two model inference endpoints: /test_model_sklearn/1/ and /test_model_sklearn/2/. The 10% probability (weight 0.1) will match the last (order by version number) endpoint, i.e. /test_model_sklearn/2/ and the 90% will match /test_model_sklearn/2/. When we add a new model endpoint version, e.g. /test_model_sklearn/3/, the canary distribution will automatically match the 90% probability to /test_model_sklearn/2/ and the 10% to the new endpoint /test_model_sklearn/3/.

Example:

Add two endpoints:

clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --version 1 --project "serving examples"
clearml-serving --id <service_id> model add --engine sklearn --endpoint "test_model_sklearn" --preprocess "examples/sklearn/preprocess.py" --name "train sklearn model" --version 2 --project "serving examples"

Add Canary endpoint:

clearml-serving --id <service_id> model canary --endpoint "test_model_sklearn_canary" --weights 0.1 0.9 --input-endpoints test_model_sklearn/2 test_model_sklearn/1

Test Canary endpoint:

curl -X POST "http://127.0.0.1:8080/serve/test_model" -H "accept: application/json" -H "Content-Type: application/json" -d '{"x0": 1, "x1": 2}'

📊 Model monitoring and performance metrics 🔔

ClearML serving instances send serving statistics (count/latency) automatically to Prometheus and Grafana can be used to visualize and create live dashboards.

The default docker-compose installation is preconfigured with Prometheus and Grafana, do notice that by default data/ate of both containers is not persistent. To add persistence we do recommend adding a volume mount.

You can also add many custom metrics on the input/predictions of your models. Once a model endpoint is registered, adding custom metric can be done using the CLI. For example, assume we have our mock scikit-learn model deployed on endpoint test_model_sklearn, we can log the requests inputs and outputs (see examples/sklearn/preprocess.py example):

clearml-serving --id <serving_service_id_here> metrics add --endpoint test_model_sklearn --variable-scalar
x0=0,0.1,0.5,1,10 x1=0,0.1,0.5,1,10 y=0,0.1,0.5,0.75,1

This will create a distribution histogram (buckets specified via a list of less-equal values after = sign), that we will be able to visualize on Grafana. Notice we can also log time-series values with --variable-value x2 or discrete results (e.g. classifications strings) with --variable-enum animal=cat,dog,sheep. Additional custom variables can be in the preprocess and postprocess with a call to collect_custom_statistics_fn({'new_var': 1.337}) see clearml_serving/preprocess/preprocess_template.py

With the new metrics logged we can create a visualization dashboard over the latency of the calls, and the output distribution.

Grafana model performance example:

browse to http://localhost:3000
login with: admin/admin
create a new dashboard
select Prometheus as data source
Add a query: 100 * increase(test_model_sklearn:_latency_bucket[1m]) / increase(test_model_sklearn:_latency_sum[1m])
Change type to heatmap, and select on the right hand-side under "Data Format" select "Time series buckets"
You now have the latency distribution, over time.
Repeat the same process for x0, the query would be 100 * increase(test_model_sklearn:x0_bucket[1m]) / increase(test_model_sklearn:x0_sum[1m])

Notice: If not specified all serving requests will be logged, to change the default configure "CLEARML_DEFAULT_METRIC_LOG_FREQ", for example CLEARML_DEFAULT_METRIC_LOG_FREQ=0.2 means only 20% of all requests will be logged. You can also specify per endpoint log frequency with the clearml-serving CLI. Check the CLI documentation with clearml-serving metrics --help

🔥 Model Serving Examples

Scikit-Learn example - random data
Scikit-Learn Model Ensemble example - random data
XGBoost example - iris dataset
LightGBM example - iris dataset
PyTorch example - mnist dataset
TensorFlow/Keras example - mnist dataset
Multi-Model Pipeline example - multiple models
Multi-Model ASync Pipeline example - multiple models
Custom Model example - custom data

🙏 Status

[x] FastAPI integration for inference service
[x] multi-process Gunicorn for inference service
[x] Dynamic preprocess python code loading (no need for container/process restart)
[x] Model files download/caching (http/s3/gs/azure)
[x] Scikit-learn. XGBoost, LightGBM integration
[x] Custom inference, including dynamic code loading
[x] Manual model upload/registration to model repository (http/s3/gs/azure)
[x] Canary load balancing
[x] Auto model endpoint deployment based on model repository state
[x] Machine/Node health metrics
[x] Dynamic online configuration
[x] CLI configuration tool
[x] Nvidia Triton integration
[x] GZip request compression
[x] TorchServe engine integration
[x] Prebuilt Docker containers (dockerhub)
[x] Docker-compose deployment (CPU/GPU)
[x] Scikit-Learn example
[x] XGBoost example
[x] LightGBM example
[x] PyTorch example
[x] TensorFlow/Keras example
[x] Model ensemble example
[x] Model pipeline example
[x] Statistics Service
[x] Kafka install instructions
[x] Prometheus install instructions
[x] Grafana install instructions
[x] Kubernetes Helm Chart
[ ] Intel optimized container (python, numpy, daal, scikit-learn)

Contributing

PRs are always welcomed ❤️ See more details in the ClearML Guidelines for Contributing.

For Tasks:

Click tags to check more tools for each tasks

deploy models monitor model performance scale model serving automate model upgrades create canary deployments

For Jobs:

machine learning engineer data scientist devops engineer ai engineer software engineer

Alternative AI tools for clearml-serving

Similar Open Source Tools

clearml-serving

github

: 132

VLM-R1

VLM-R1 is a stable and generalizable R1-style Large Vision-Language Model proposed for Referring Expression Comprehension (REC) task. It compares R1 and SFT approaches, showing R1 model's steady improvement on out-of-domain test data. The project includes setup instructions, training steps for GRPO and SFT models, support for user data loading, and evaluation process. Acknowledgements to various open-source projects and resources are mentioned. The project aims to provide a reliable and versatile solution for vision-language tasks.

github

: 4.4k

py-llm-core

PyLLMCore is a light-weighted interface with Large Language Models with native support for llama.cpp, OpenAI API, and Azure deployments. It offers a Pythonic API that is simple to use, with structures provided by the standard library dataclasses module. The high-level API includes the assistants module for easy swapping between models. PyLLMCore supports various models including those compatible with llama.cpp, OpenAI, and Azure APIs. It covers use cases such as parsing, summarizing, question answering, hallucinations reduction, context size management, and tokenizing. The tool allows users to interact with language models for tasks like parsing text, summarizing content, answering questions, reducing hallucinations, managing context size, and tokenizing text.

github

: 118

catai

CatAI is a tool that allows users to run GGUF models on their computer with a chat UI. It serves as a local AI assistant inspired by Node-Llama-Cpp and Llama.cpp. The tool provides features such as auto-detecting programming language, showing original messages by clicking on user icons, real-time text streaming, and fast model downloads. Users can interact with the tool through a CLI that supports commands for installing, listing, setting, serving, updating, and removing models. CatAI is cross-platform and supports Windows, Linux, and Mac. It utilizes node-llama-cpp and offers a simple API for asking model questions. Additionally, developers can integrate the tool with node-llama-cpp@beta for model management and chatting. The configuration can be edited via the web UI, and contributions to the project are welcome. The tool is licensed under Llama.cpp's license.

github

: 410

scalene

Scalene is a high-performance CPU, GPU, and memory profiler for Python that provides detailed information and runs faster than many other profilers. It incorporates AI-powered proposed optimizations, allowing users to generate optimization suggestions by clicking on specific lines or regions of code. Scalene separates time spent in Python from native code, highlights hotspots, and identifies memory usage per line. It supports GPU profiling on NVIDIA-based systems and detects memory leaks. Users can generate reduced profiles, profile specific functions using decorators, and suspend/resume profiling for background processes. Scalene is available as a pip or conda package and works on various platforms. It offers features like profiling at the line level, memory trends, copy volume reporting, and leak detection.

github

: 12.5k

claude-code.nvim

Claude Code Neovim Plugin is a seamless integration between Claude Code AI assistant and Neovim. It allows users to toggle Claude Code in a terminal window with a single key press, automatically detect and reload files modified by Claude Code, provide real-time buffer updates when files are changed externally, offer customizable window position and size, integrate with which-key, use git project root as working directory, maintain a modular code structure, provide type annotations with LuaCATS for better IDE support, offer configuration validation, and include a testing framework for reliability. The plugin creates a terminal buffer running the Claude Code CLI, sets up autocommands to detect file changes on disk, automatically reloads files modified by Claude Code, provides keymaps and commands for toggling the terminal, and detects git repositories to set the working directory to the git root.

github

: 70

tgpt

tgpt is a cross-platform command-line interface (CLI) tool that allows users to interact with AI chatbots in the Terminal without needing API keys. It supports various AI providers such as KoboldAI, Phind, Llama2, Blackbox AI, and OpenAI. Users can generate text, code, and images using different flags and options. The tool can be installed on GNU/Linux, MacOS, FreeBSD, and Windows systems. It also supports proxy configurations and provides options for updating and uninstalling the tool.

github

: 2.4k

Curie

github

: 52

booster

Booster is a powerful inference accelerator designed for scaling large language models within production environments or for experimental purposes. It is built with performance and scaling in mind, supporting various CPUs and GPUs, including Nvidia CUDA, Apple Metal, and OpenCL cards. The tool can split large models across multiple GPUs, offering fast inference on machines with beefy GPUs. It supports both regular FP16/FP32 models and quantised versions, along with popular LLM architectures. Additionally, Booster features proprietary Janus Sampling for code generation and non-English languages.

github

: 133

starcoder2-self-align

StarCoder2-Instruct is an open-source pipeline that introduces StarCoder2-15B-Instruct-v0.1, a self-aligned code Large Language Model (LLM) trained with a fully permissive and transparent pipeline. It generates instruction-response pairs to fine-tune StarCoder-15B without human annotations or data from proprietary LLMs. The tool is primarily finetuned for Python code generation tasks that can be verified through execution, with potential biases and limitations. Users can provide response prefixes or one-shot examples to guide the model's output. The model may have limitations with other programming languages and out-of-domain coding tasks.

github

: 170

gpt-computer-assistant

GPT Computer Assistant (GCA) is an open-source framework designed to build vertical AI agents that can automate tasks on Windows, macOS, and Ubuntu systems. It leverages the Model Context Protocol (MCP) and its own modules to mimic human-like actions and achieve advanced capabilities. With GCA, users can empower themselves to accomplish more in less time by automating tasks like updating dependencies, analyzing databases, and configuring cloud security settings.

github

: 5.8k

effective_llm_alignment

This is a super customizable, concise, user-friendly, and efficient toolkit for training and aligning LLMs. It provides support for various methods such as SFT, Distillation, DPO, ORPO, CPO, SimPO, SMPO, Non-pair Reward Modeling, Special prompts basket format, Rejection Sampling, Scoring using RM, Effective FAISS Map-Reduce Deduplication, LLM scoring using RM, NER, CLIP, Classification, and STS. The toolkit offers key libraries like PyTorch, Transformers, TRL, Accelerate, FSDP, DeepSpeed, and tools for result logging with wandb or clearml. It allows mixing datasets, generation and logging in wandb/clearml, vLLM batched generation, and aligns models using the SMPO method.

github

: 105

node-llama-cpp

node-llama-cpp is a tool that allows users to run AI models locally on their machines. It provides pre-built bindings with the option to build from source using cmake. Users can interact with text generation models, chat with models using a chat wrapper, and force models to generate output in a parseable format like JSON. The tool supports Metal and CUDA, offers CLI functionality for chatting with models without coding, and ensures up-to-date compatibility with the latest version of llama.cpp. Installation includes pre-built binaries for macOS, Linux, and Windows, with the option to build from source if binaries are not available for the platform.

github

: 853

agents-starter

A starter template for building AI-powered chat agents using Cloudflare's Agent platform, powered by agents-sdk. It provides a foundation for creating interactive chat experiences with AI, complete with a modern UI and tool integration capabilities. Features include interactive chat interface with AI, built-in tool system with human-in-the-loop confirmation, advanced task scheduling, dark/light theme support, real-time streaming responses, state management, and chat history. Prerequisites include a Cloudflare account and OpenAI API key. The project structure includes components for chat UI implementation, chat agent logic, tool definitions, and helper functions. Customization guide covers adding new tools, modifying the UI, and example use cases for customer support, development assistant, data analysis assistant, personal productivity assistant, and scheduling assistant.

github

: 503

AGiXT

AGiXT is a dynamic Artificial Intelligence Automation Platform engineered to orchestrate efficient AI instruction management and task execution across a multitude of providers. Our solution infuses adaptive memory handling with a broad spectrum of commands to enhance AI's understanding and responsiveness, leading to improved task completion. The platform's smart features, like Smart Instruct and Smart Chat, seamlessly integrate web search, planning strategies, and conversation continuity, transforming the interaction between users and AI. By leveraging a powerful plugin system that includes web browsing and command execution, AGiXT stands as a versatile bridge between AI models and users. With an expanding roster of AI providers, code evaluation capabilities, comprehensive chain management, and platform interoperability, AGiXT is consistently evolving to drive a multitude of applications, affirming its place at the forefront of AI technology.

github

: 3.0k

RooFlow

RooFlow is a VS Code extension that enhances AI-assisted development by providing persistent project context and optimized mode interactions. It reduces token consumption and streamlines workflow by integrating Architect, Code, Test, Debug, and Ask modes. The tool simplifies setup, offers real-time updates, and provides clearer instructions through YAML-based rule files. It includes components like Memory Bank, System Prompts, VS Code Integration, and Real-time Updates. Users can install RooFlow by downloading specific files, placing them in the project structure, and running an insert-variables script. They can then start a chat, select a mode, interact with Roo, and use the 'Update Memory Bank' command for synchronization. The Memory Bank structure includes files for active context, decision log, product context, progress tracking, and system patterns. RooFlow features persistent context, real-time updates, mode collaboration, and reduced token consumption.

github

: 226

For similar tasks

clearml-serving

github

: 132

ai-on-gke

This repository contains assets related to AI/ML workloads on Google Kubernetes Engine (GKE). Run optimized AI/ML workloads with Google Kubernetes Engine (GKE) platform orchestration capabilities. A robust AI/ML platform considers the following layers: Infrastructure orchestration that support GPUs and TPUs for training and serving workloads at scale Flexible integration with distributed computing and data processing frameworks Support for multiple teams on the same infrastructure to maximize utilization of resources

github

: 280

ray

Ray is a unified framework for scaling AI and Python applications. It consists of a core distributed runtime and a set of AI libraries for simplifying ML compute, including Data, Train, Tune, RLlib, and Serve. Ray runs on any machine, cluster, cloud provider, and Kubernetes, and features a growing ecosystem of community integrations. With Ray, you can seamlessly scale the same code from a laptop to a cluster, making it easy to meet the compute-intensive demands of modern ML workloads.

github

: 36.4k

labelbox-python

Labelbox is a data-centric AI platform for enterprises to develop, optimize, and use AI to solve problems and power new products and services. Enterprises use Labelbox to curate data, generate high-quality human feedback data for computer vision and LLMs, evaluate model performance, and automate tasks by combining AI and human-centric workflows. The academic & research community uses Labelbox for cutting-edge AI research.

github

: 135

djl

Deep Java Library (DJL) is an open-source, high-level, engine-agnostic Java framework for deep learning. It is designed to be easy to get started with and simple to use for Java developers. DJL provides a native Java development experience and allows users to integrate machine learning and deep learning models with their Java applications. The framework is deep learning engine agnostic, enabling users to switch engines at any point for optimal performance. DJL's ergonomic API interface guides users with best practices to accomplish deep learning tasks, such as running inference and training neural networks.

github

: 4.1k

mlflow

MLflow is a platform to streamline machine learning development, including tracking experiments, packaging code into reproducible runs, and sharing and deploying models. MLflow offers a set of lightweight APIs that can be used with any existing machine learning application or library (TensorFlow, PyTorch, XGBoost, etc), wherever you currently run ML code (e.g. in notebooks, standalone applications or the cloud). MLflow's current components are: * `MLflow Tracking `_: An API to log parameters, code, and results in machine learning experiments and compare them using an interactive UI. * `MLflow Projects `_: A code packaging format for reproducible runs using Conda and Docker, so you can share your ML code with others. * `MLflow Models `_: A model packaging format and tools that let you easily deploy the same model (from any ML library) to batch and real-time scoring on platforms such as Docker, Apache Spark, Azure ML and AWS SageMaker. * `MLflow Model Registry `_: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.

github

: 19.9k

tt-metal

TT-NN is a python & C++ Neural Network OP library. It provides a low-level programming model, TT-Metalium, enabling kernel development for Tenstorrent hardware.

github

: 786

burn

Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.

github

: 10.2k

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k

clearml-serving

README:

clearml-serving v1.3.1 ✨ Model Serving (ML/DL) Made Easy 🎉 🔥 NEW version 1.3 🚀 20% faster !

🔥 NEW 🎊 Take it for a spin with a simple docker-compose command 🪄 ✨

ClearML Serving Design

ClearML Serving Design Principles

Installation

Prerequisites

💅 Initial Setup

🌊 Optional: advanced setup - S3/GS/Azure access

💁 Concepts

👉 Toy model (scikit learn) deployment example

🐢 Registering & Deploying new models manually

🐰 Automatic model deployment

💡 Automatic model deployment example

🐦 Canary endpoint setup

📊 Model monitoring and performance metrics 🔔

🔥 Model Serving Examples

🙏 Status

Contributing

For Tasks:

For Jobs:

Alternative AI tools for clearml-serving

Similar Open Source Tools

clearml-serving

VLM-R1

py-llm-core

catai

scalene

claude-code.nvim

tgpt

Curie

booster

starcoder2-self-align

gpt-computer-assistant

effective_llm_alignment

node-llama-cpp

agents-starter

AGiXT

RooFlow

For similar tasks

clearml-serving

ai-on-gke

ray

labelbox-python

djl

mlflow

tt-metal

burn

For similar jobs

sweep

teams-ai

ai-guide

classifai

chatbot-ui

BricksLLM

uAgents

griptape

`clearml-serving v1.3.1` ✨ Model Serving (ML/DL) Made Easy 🎉
🔥 NEW version 1.3 🚀 20% faster !

🔥 NEW 🎊 Take it for a spin with a simple `docker-compose` command 🪄 ✨