SREGym

An AI-Native Platform for Benchmarking SRE Agents

Stars: 117

SREGym is an AI-native platform for designing, developing, and evaluating AI agents for Site Reliability Engineering (SRE). It provides a comprehensive benchmark suite with 86 different SRE problems, including OS-level faults, metastable failures, and concurrent failures. Users can run agents in isolated Docker containers, specify models from multiple providers, and access detailed documentation on the platform's website.

README:

SREGym: A Benchmarking Platform for SRE Agents

🔍Overview | 📦Installation | 🚀Quick Start | ⚙️Usage | 🤝Contributing | 📖Docs | Slack

🔍 Overview

SREGym is an AI-native platform to enable the design, development, and evaluation of AI agents for Site Reliability Engineering (SRE). The core idea is to create live system environments for SRE agents to solve real-world SRE problems. SREGym provides a comprehensive SRE benchmark suite with a wide variety of problems for evaluating SRE agents and also for training next-generation AI agents.

SREGym is inspired by our prior work on AIOpsLab and ITBench. It is architected with AI-native usability and extensibility as first-class principles. The SREGym benchmark suite contains 86 different SRE problems. It supports all the problems from AIOpsLab and ITBench, and adds new problems such as OS-level faults, metastable failures, and concurrent failures. See our problem set for a complete list of problems.

📦 Installation

Requirements

Recommendations

git clone --recurse-submodules https://github.com/SREGym/SREGym
cd SREGym
uv sync
uv run pre-commit install

🚀 Quickstart

Set up your cluster

Choose either a) or b) to set up your cluster and then proceed to the next steps.

a) Kubernetes Cluster (Recommended)

SREGym supports any Kubernetes cluster that your kubectl context is set to, whether it is a cluster from a cloud provider or one you build yourself.

We have an Ansible playbook to set up clusters on providers like CloudLab and on our own machines. Follow this README to set up your own cluster.
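Since SREGym targets whatever cluster the current kubectl context points at, it can be worth confirming that context before a run. The sketch below is not part of SREGym; it only wraps the standard `kubectl config current-context` command, and the helper name `current_kube_context` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def current_kube_context() -> Optional[str]:
    """Return the active kubectl context name, or None if it cannot be determined."""
    # kubectl must be on PATH; otherwise there is nothing to query.
    if shutil.which("kubectl") is None:
        return None
    try:
        result = subprocess.run(
            ["kubectl", "config", "current-context"],
            capture_output=True,
            text=True,
            check=False,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    # A non-zero exit code usually means no context is set.
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    context = current_kube_context()
    print(context if context else "no kubectl context found")
```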

b) Emulated cluster

SREGym can be run on an emulated cluster using kind on your local machine. However, not all problems are supported.

Note: If you run into pod crashes or "too many open files" errors, see the kind README for required host kernel settings and troubleshooting.

# For x86 machines
kind create cluster --config kind/kind-config-x86.yaml

# For ARM machines
kind create cluster --config kind/kind-config-arm.yaml
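The choice between the two kind configs can be scripted. The following is a minimal sketch, assuming the machine strings reported by Python's `platform.machine()` on common hosts (`x86_64`/`amd64` and `arm64`/`aarch64`). The helper names are hypothetical; only the two config paths come from the commands above.

```python
import platform
from typing import List

# Config paths exactly as used in the kind commands above.
KIND_CONFIGS = {
    "x86": "kind/kind-config-x86.yaml",
    "arm": "kind/kind-config-arm.yaml",
}


def kind_config_for(machine: str) -> str:
    """Map a platform.machine() string to the matching kind config path."""
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return KIND_CONFIGS["x86"]
    if name in ("arm64", "aarch64"):
        return KIND_CONFIGS["arm"]
    raise ValueError(f"unrecognized architecture: {machine!r}")


def kind_create_command(machine: str) -> List[str]:
    """Build the `kind create cluster` argument list for the given architecture."""
    return ["kind", "create", "cluster", "--config", kind_config_for(machine)]
```

Calling `kind_create_command(platform.machine())` yields the argument list for the local host, which can then be passed to `subprocess.run` from the repository root.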

⚙️ Usage

Running an Agent

Quick Start

To get started with the included Stratus agent:

  1. Create your .env file:

mv .env.example .env

  2. Open the .env file and configure your model and API key.

  3. Run the benchmark:

python main.py --agent <agent-name> --model <model-id>

For example, to run the Stratus agent:

python main.py --agent stratus --model gpt-4o

Container Isolation

Agents always run in isolated Docker containers, preventing access to SREGym internals like problem definitions and grading logic. The image is built automatically on first run.

Use --force-build to rebuild the container image after updating dependencies or agent code:

python main.py --agent codex --model gpt-4o --force-build
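When launching runs from a script, the command line can be assembled programmatically. This is a small sketch that uses only the three flags shown in this README (`--agent`, `--model`, `--force-build`). The function name `build_run_command` is hypothetical and is not an SREGym API.

```python
import shlex
from typing import List, Optional


def build_run_command(
    agent: str,
    model: Optional[str] = None,
    force_build: bool = False,
) -> List[str]:
    """Assemble the argument list for `python main.py` from the documented flags."""
    cmd = ["python", "main.py", "--agent", agent]
    if model is not None:
        cmd += ["--model", model]
    if force_build:
        # Ask for the agent container image to be rebuilt before the run.
        cmd.append("--force-build")
    return cmd


if __name__ == "__main__":
    # Prints the same command as the --force-build example above.
    print(shlex.join(build_run_command("codex", "gpt-4o", force_build=True)))
```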

Model Selection

SREGym supports multiple LLM providers. Specify your model using the --model flag:

python main.py --agent <agent-name> --model <model-id>

Available Models

| Model ID | Provider | Model Name | Required Environment Variables |
|---|---|---|---|
| gpt-5 | OpenAI | GPT-5 | OPENAI_API_KEY |
| gemini-2.5-pro | Google | Gemini 2.5 Pro | GEMINI_API_KEY |
| claude-sonnet-4 | Anthropic | Claude Sonnet 4 | ANTHROPIC_API_KEY |
| bedrock-claude-sonnet-4.5 | AWS Bedrock | Claude Sonnet 4.5 | AWS_PROFILE, AWS_DEFAULT_REGION |

Default: If no model is specified, gpt-4o is used.
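A quick pre-flight check can catch a missing key before a run starts. The sketch below encodes only the four rows of the table above; gpt-4o is not listed there, so it is deliberately left out. It inspects the process environment and therefore assumes the variables have already been exported or loaded from .env. The names `REQUIRED_ENV` and `missing_env_vars` are hypothetical.

```python
import os
from typing import Dict, List, Mapping, Optional

# Model ID -> required environment variables, copied from the table above.
REQUIRED_ENV: Dict[str, List[str]] = {
    "gpt-5": ["OPENAI_API_KEY"],
    "gemini-2.5-pro": ["GEMINI_API_KEY"],
    "claude-sonnet-4": ["ANTHROPIC_API_KEY"],
    "bedrock-claude-sonnet-4.5": ["AWS_PROFILE", "AWS_DEFAULT_REGION"],
}


def missing_env_vars(
    model_id: str,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the required variables that are unset or empty for model_id."""
    if env is None:
        env = os.environ
    if model_id not in REQUIRED_ENV:
        raise KeyError(f"model not in the table: {model_id!r}")
    return [name for name in REQUIRED_ENV[model_id] if not env.get(name)]
```

An empty list from `missing_env_vars("claude-sonnet-4")` means every variable the table requires for that model is present in the environment.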

Provider Examples

OpenAI:

# In .env file
OPENAI_API_KEY="sk-proj-..."

# Run with GPT-4o
python main.py --agent stratus --model gpt-4o

Anthropic:

# In .env file
ANTHROPIC_API_KEY="sk-ant-api03-..."

# Run with Claude Sonnet 4
python main.py --agent stratus --model claude-sonnet-4

AWS Bedrock:

# In .env file
AWS_PROFILE="bedrock"
AWS_DEFAULT_REGION=us-east-2

# Run with Claude Sonnet 4.5 on Bedrock
python main.py --agent stratus --model bedrock-claude-sonnet-4.5

Note: For AWS Bedrock, ensure your AWS credentials are configured via ~/.aws/credentials and your profile has permissions to access Bedrock.
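To confirm that the profile named in AWS_PROFILE exists in the credentials file, something like the following can help. It is a sketch, assuming the standard INI layout of ~/.aws/credentials, where each profile is a section such as `[bedrock]`. It does not verify Bedrock permissions and does not cover profiles defined only in ~/.aws/config. The helper name `has_aws_profile` is hypothetical.

```python
import configparser
from pathlib import Path
from typing import Optional


def has_aws_profile(profile: str, credentials_path: Optional[str] = None) -> bool:
    """Return True if the credentials file contains a section for the profile."""
    path = (
        Path(credentials_path)
        if credentials_path
        else Path.home() / ".aws" / "credentials"
    )
    parser = configparser.ConfigParser()
    # ConfigParser.read silently skips files that do not exist.
    parser.read(path)
    return parser.has_section(profile)


if __name__ == "__main__":
    # "bedrock" matches the AWS_PROFILE value used in the example above.
    print(has_aws_profile("bedrock"))
```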

Acknowledgements

This project is generously supported by a Slingshot grant from the Laude Institute.

License

Licensed under the MIT license.
