beta9

Run GPU Workloads Across Multiple Clouds

Beta9 is an open-source platform for running scalable serverless GPU workloads across cloud providers. With it, you can scale out to thousands of GPU or CPU containers, get ultrafast cold starts for custom ML models, scale to zero automatically so you pay only for what you use, take advantage of flexible distributed storage, distribute workloads across multiple cloud providers, and deploy task queues and functions using simple Python abstractions. Under the hood, it is built to launch remote serverless containers quickly, featuring a custom lazy-loading image format backed by S3/FUSE, a fast Redis-based container scheduling engine, content-addressed storage for caching images and files, and a custom runc container runtime.
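The task queue and function abstractions mentioned above come from the Python SDK. As a minimal sketch — assuming the SDK exposes a task_queue decorator alongside the function decorator shown in the README below, and that work is enqueued with a .put() helper (both are assumptions to verify against the SDK README) — a queue-backed worker might look like this:

from beta9 import task_queue


# Hypothetical queue-backed worker: each task runs in its own remote container
@task_queue(cpu=1, memory=128)
def process(item: str):
    return item.upper()


# Enqueue a task from a client (assumed .put() convention)
process.put(item="hello")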

README:

Run GPU Workloads Across Multiple Clouds





What is Beta9?

Beta9 is an open-source container orchestrator, designed for running GPU workloads across different cloud environments and regions.

  • Connect VMs to your cluster with a single cURL command
  • Read large files at the edge using distributed, cross-region storage
  • Manage your fleet of GPUs using a Tailscale-powered service mesh
  • Securely run workloads with end-to-end encryption through WireGuard
  • Run workloads using a friendly Python interface

How does it work?

Provision GPUs Anywhere

Connect any GPU to your cluster with a single CLI command and a cURL script.

$ beta9 machine create --pool lambda-a100-40

=> Created machine with ID: '9541cbd2'. Use the following command to set up the node:

#!/bin/bash
sudo curl -L -o agent https://release.beam.cloud/agent/agent && \
sudo chmod +x agent && \
sudo ./agent --token "AUTH_TOKEN" \
  --machine-id "9541cbd2" \
  --tailscale-url "" \
  --tailscale-auth "AUTH_TOKEN" \
  --pool-name "lambda-a100-40" \
  --provider-name "lambda"

You can run this install script on your VM to connect it to your cluster.

Manage Your GPU Fleet

Manage your distributed cross-region GPU cluster using a centralized control plane.

$ beta9 machine list

| ID       | CPU     | Memory     | GPU     | Status     | Pool        |
|----------|---------|------------|---------|------------|-------------|
| edc9c2d2 | 30,000m | 222.16 GiB | A10G    | registered | lambda-a10g |
| d87ad026 | 30,000m | 216.25 GiB | A100-40 | registered | gcp-a100-40 |

Run Workloads in Python

Offload any workload to your remote machines by adding a Python decorator to your code.

from beta9 import function


# This will run on a remote A100-40 in your cluster
@function(cpu=1, memory=128, gpu="A100-40")
def square(i: int):
    return i**2
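
If the SDK follows the Beam-style calling convention — an assumption; check the SDK README for the exact API — a decorated function can be invoked remotely with .remote() for a single call, or fanned out across containers with .map():

# Assumed helpers: .remote() and .map() on @function-decorated callables
result = square.remote(4)                # runs one call in a remote container

results = list(square.map(range(10)))    # one container per item in the iterable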

Local Installation

You can run Beta9 locally, or in an existing Kubernetes cluster using our Helm chart.

Setting Up the Server

k3d is used for local development. You'll need Docker to get started.

To use our fully automated setup, run the setup make target.

[!NOTE] This will overwrite some of the tools you may already have installed. Review setup.sh to learn more.

make setup

Setting Up the SDK

The SDK is written in Python. You'll need Python 3.8 or higher. Use the setup-sdk make target to get started.

[!NOTE] This will install the Poetry package manager.

make setup-sdk

Using the SDK

After you've set up the server and SDK, check out the SDK README.

Contributing

We welcome contributions, big or small.

Community & Support

If you need support, you can reach out through any of these channels:

  • Slack (Chat live with maintainers and community members)
  • GitHub issues (Bug reports, feature requests, and anything roadmap related)
  • Twitter (Updates on releases and stuff)

Thanks to Our Contributors
