
clearml-fractional-gpu

ClearML Fractional GPU - Run multiple containers on the same GPU with driver level memory limitation ✨ and compute time-slicing


ClearML Fractional GPU is a tool designed to optimize GPU resource utilization by allowing multiple containers to run on the same GPU with driver-level memory limitation and compute time-slicing. It supports CUDA 11.x & CUDA 12.x, preventing greedy processes from grabbing the entire GPU memory. The tool offers options like Dynamic GPU Slicing, Container-based Memory Limits, and Kubernetes-based Static MIG Slicing to enhance hardware utilization and workload performance for AI development.

README:

🚀 🔥 Fractional GPU! ⚡ 📣

Run multiple containers on the same GPU with driver level memory limitation ✨ and compute time-slicing 🎊

🌟 Leave a star to support the project! 🌟

🔰 Introduction

Sharing high-end GPUs, or even prosumer & consumer GPUs, between multiple users is the most cost-effective way to accelerate AI development. Unfortunately, until now the only existing solution, MIG slicing, applied to high-end GPUs (A100+) and required Kubernetes.

🔥 🎉 Welcome To Container Based Fractional GPU For Any Nvidia Card! 🎉 🔥

We present pre-packaged containers supporting CUDA 11.x & CUDA 12.x with pre-built hard memory limitation! This means multiple containers can be launched on the same GPU, ensuring one user cannot allocate the entire host GPU memory! (No more greedy processes grabbing the entire GPU memory! Finally we have a driver level hard limiting memory option).

🚀 Offerings

ClearML offers several options to optimize GPU resource utilization by partitioning GPUs:

Dynamic GPU Slicing: On-demand GPU slicing per task for both MIG and non-MIG devices (available under the ClearML Enterprise plan):
- Bare Metal deployment
- Kubernetes deployment
Container-based Memory Limits (this repository): Use pre-packaged containers with built-in memory limits to run multiple containers on the same GPU (available as part of the ClearML open source offering).
Kubernetes-based Static MIG Slicing: Set up Kubernetes support for NVIDIA MIG (Multi-Instance GPU) to define GPU fractions for specific workloads (available as part of the ClearML open source offering).

With these options, ClearML enables running AI workloads with optimized hardware utilization and workload performance. This repository covers container-based fractional GPUs. For more information on ClearML's fractional GPU offerings, see the ClearML documentation.

⚡ Installation

Pick the container that works for you and launch it:

docker run -it --gpus 0 --ipc=host --pid=host clearml/fractional-gpu:u22-cu12.3-8gb bash

To verify that the fractional GPU memory limit is working correctly, run inside the container:

nvidia-smi

Here is example output from an A100 GPU:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.08              Driver Version: 545.23.08    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  A100-PCIE-40GB                Off  | 00000000:01:00.0 Off |                  N/A |
| 32%   33C    P0              66W / 250W |      0MiB /  8128MiB |      3%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
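The same check can be scripted. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH inside the container and that its CSV query mode reports the same limited total as the table output above. The helper names `parse_total_mib` and `query_total_mib` are hypothetical and not part of this repository.

```python
import subprocess


def parse_total_mib(csv_text: str) -> list[int]:
    """Parse one total-memory value (MiB) per GPU from nvidia-smi CSV output."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_total_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of each visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi exits with an error
    )
    return parse_total_mib(result.stdout)
```

Inside the 8 GiB container from the example, `query_total_mib()` would be expected to return a value at or below 8192, in line with the `8128MiB` total shown in the sample output.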

🐳 Containers

Memory Limit	CUDA Ver	Ubuntu Ver	Docker Image
12 GiB	12.3	22.04	`clearml/fractional-gpu:u22-cu12.3-12gb`
12 GiB	12.3	20.04	`clearml/fractional-gpu:u20-cu12.3-12gb`
12 GiB	11.7	22.04	`clearml/fractional-gpu:u22-cu11.7-12gb`
12 GiB	11.1	20.04	`clearml/fractional-gpu:u20-cu11.1-12gb`
8 GiB	12.3	22.04	`clearml/fractional-gpu:u22-cu12.3-8gb`
8 GiB	12.3	20.04	`clearml/fractional-gpu:u20-cu12.3-8gb`
8 GiB	11.7	22.04	`clearml/fractional-gpu:u22-cu11.7-8gb`
8 GiB	11.1	20.04	`clearml/fractional-gpu:u20-cu11.1-8gb`
4 GiB	12.3	22.04	`clearml/fractional-gpu:u22-cu12.3-4gb`
4 GiB	12.3	20.04	`clearml/fractional-gpu:u20-cu12.3-4gb`
4 GiB	11.7	22.04	`clearml/fractional-gpu:u22-cu11.7-4gb`
4 GiB	11.1	20.04	`clearml/fractional-gpu:u20-cu11.1-4gb`
2 GiB	12.3	22.04	`clearml/fractional-gpu:u22-cu12.3-2gb`
2 GiB	12.3	20.04	`clearml/fractional-gpu:u20-cu12.3-2gb`
2 GiB	11.7	22.04	`clearml/fractional-gpu:u22-cu11.7-2gb`
2 GiB	11.1	20.04	`clearml/fractional-gpu:u20-cu11.1-2gb`
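The image names in the table follow a regular pattern (Ubuntu major version, CUDA version, memory limit). As an illustration only, a small helper could build the name and reject combinations that are not listed. The function name `image_name` and the lookup tables are hypothetical and simply transcribe the table above.

```python
# Combinations listed in the table above: Ubuntu version -> CUDA versions.
AVAILABLE = {
    "22.04": ("12.3", "11.7"),
    "20.04": ("12.3", "11.1"),
}
MEMORY_LIMITS_GIB = (2, 4, 8, 12)


def image_name(ubuntu: str, cuda: str, limit_gib: int) -> str:
    """Return the Docker image name for a listed combination, else raise ValueError."""
    if cuda not in AVAILABLE.get(ubuntu, ()):
        raise ValueError(f"no listed image for Ubuntu {ubuntu} with CUDA {cuda}")
    if limit_gib not in MEMORY_LIMITS_GIB:
        raise ValueError(f"no listed image with a {limit_gib} GiB limit")
    major = ubuntu.split(".")[0]  # "22.04" -> "22"
    return f"clearml/fractional-gpu:u{major}-cu{cuda}-{limit_gib}gb"
```

For example, `image_name("22.04", "12.3", 8)` yields the image used in the Installation section.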

[!IMPORTANT]

You must execute the container with --pid=host!

[!NOTE]

--pid=host is required to allow the driver to differentiate between the container's processes and other host processes when limiting memory / utilization usage.

[!TIP]

ClearML-Agent users: add [--pid=host] to the agent.extra_docker_arguments section of your config file.

🔩 Customization

Build your own containers and inherit from the original containers.

You can find a few examples here.

☸ Kubernetes

Fractional GPU containers can be used for bare-metal execution as well as in Kubernetes Pods. Yes! By using one of the Fractional GPU containers, you can limit the memory consumption of your Job/Pod and easily share GPUs without fear that one workload will exhaust GPU memory and crash another!

Here's a simple Kubernetes POD template:

apiVersion: v1
kind: Pod
metadata:
  name: train-pod
  labels:
    app: trainme
spec:
  hostPID: true
  containers:
  - name: train-container
    image: clearml/fractional-gpu:u22-cu12.3-8gb
    command: ['python3', '-c', 'import torch; print(f"Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}")']

[!IMPORTANT]

You must execute the pod with hostPID: true!

[!NOTE]

hostPID: true is required to allow the driver to differentiate between the pod's processes and other host processes when limiting memory / utilization usage
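When Pod specs are generated from code, it is easy to forget the `hostPID` field. The sketch below builds the same manifest as the template above as a plain dictionary and always sets `hostPID`. The function name `fractional_gpu_pod` is hypothetical, and the command assumes PyTorch is available in the image, as the template does.

```python
import json


def fractional_gpu_pod(name: str, image: str, command: list[str]) -> dict:
    """Build a Pod manifest that always sets hostPID, as the template requires."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "trainme"}},
        "spec": {
            # Lets the driver tell the pod's processes apart from host processes.
            "hostPID": True,
            "containers": [
                {"name": f"{name}-container", "image": image, "command": command}
            ],
        },
    }


manifest = fractional_gpu_pod(
    "train-pod",
    "clearml/fractional-gpu:u22-cu12.3-8gb",
    ["python3", "-c", "import torch; print(torch.cuda.mem_get_info())"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, which accepts JSON manifests as well as YAML.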

🔌 Support & Limitations

The containers support Nvidia drivers <= 545.x.x. We will keep updating and supporting new drivers as they are released.

Supported GPUs: RTX series 10, 20, 30, 40, A series, and Data-Center P100, A100, A10/A40, L40/s, H100

Limitations: Windows host machines are currently not supported. If this is important for you, leave a request in the Issues section.

❓ FAQ

Q: Will running nvidia-smi inside the container report the local processes' GPU consumption?
A: Yes, nvidia-smi communicates directly with the low-level drivers and accurately reports both the container's GPU memory usage and the container's local memory limit.
Notice that GPU utilization will be the global (i.e. host side) GPU utilization and not the specific local container GPU utilization.
Q: How do I make sure my Python / PyTorch / TensorFlow code is actually memory limited?
A: For PyTorch you can run:

import torch
print(f'Free GPU Memory: (free, global) {torch.cuda.mem_get_info()}')

Numba example:

from numba import cuda
print(f'Free GPU Memory: {cuda.current_context().get_memory_info()}')
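To turn the printout into an automated check, the reported total can be compared against the expected limit. This is a sketch under the assumption that the reported total sits at or slightly below the nominal limit, as in the sample nvidia-smi output (8128MiB for the 8 GiB image). The helper names `within_limit` and `check_with_torch` are hypothetical.

```python
GIB = 1024 ** 3


def within_limit(total_bytes: int, limit_gib: float) -> bool:
    """True if the reported total GPU memory does not exceed the expected limit."""
    return total_bytes <= limit_gib * GIB


def check_with_torch(limit_gib: float) -> bool:
    """Compare the total reported by PyTorch against the expected limit."""
    import torch  # imported lazily so within_limit works without PyTorch

    _free, total = torch.cuda.mem_get_info()  # returns (free, total) in bytes
    return within_limit(total, limit_gib)
```

Inside an 8 GiB container, `check_with_torch(8)` would be expected to return `True`. On an unrestricted 40 GB card it would return `False`.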

Q: Can the limitation be broken by a user?
A: We are sure a malicious user will find a way. It was never our intention to protect against malicious users.
If you have a malicious user with access to your machines, fractional GPUs are not your number 1 problem 😃
Q: How can I programmatically detect the memory limitation?
A: You can check the OS environment variable GPU_MEM_LIMIT_GB.
Notice that changing it will not remove or reduce the limitation.
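A minimal Python sketch of that check follows. It assumes the variable holds a plain number of gigabytes; the exact value format is not documented here, so the helper (hypothetically named `detected_limit_gb`) returns `None` when the variable is missing or not numeric.

```python
import os
from typing import Optional


def detected_limit_gb(environ=os.environ) -> Optional[float]:
    """Return the limit from GPU_MEM_LIMIT_GB, or None if unset or not numeric."""
    raw = environ.get("GPU_MEM_LIMIT_GB")
    if raw is None:
        return None
    try:
        return float(raw)
    except ValueError:
        return None
```

Because the value is read-only information, code should use it to size batches or caches and not as a way to change the limit.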
Q: Is running the container with --pid=host secure / safe?
A: It should be both secure and safe. The main caveat from a security perspective is that a container process can see any command line running on the host system. If a process command line contains a "secret", this might become a potential data leak. Notice that passing "secrets" on the command line is ill-advised, and hence we do not consider it a security risk. That said, if security is key, the enterprise edition (see below) eliminates the need to run with --pid=host and is thus fully secure.
Q: Can you run the container without --pid=host ?
A: You can! But you will have to use the enterprise version of the clearml-fractional-gpu container (otherwise the memory limit is applied system wide instead of container wide). If this feature is important for you, please contact ClearML sales & support.

📄 License

The license to use ClearML is granted for research or development purposes only. ClearML may be used for educational, personal, or internal commercial use.

An expanded Commercial license for use within a product or service is available as part of the ClearML Scale or Enterprise solution.

🤖 Commercial & Enterprise version

ClearML offers enterprise and commercial licenses that add many features on top of fractional GPUs. These include orchestration, priority queues, quota management, a compute cluster dashboard, dataset management & experiment management, as well as enterprise-grade security and support. Learn more about ClearML Orchestration or talk to us directly at ClearML sales.

📡 How can I help?

Tell everyone about it! #ClearMLFractionalGPU

Join our Slack Channel

Tell us when things are not working, and help us debug them on the Issues Page

🌟 Credits

This product is brought to you by the ClearML team with ❤️

