EnvScaler

The official implementation of "EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis".


EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis

Arxiv     WeChat Blog     Hugging Face Paper   Hugging Face Models   Hugging Face Datasets   Python 3.10+  
中文 | English
If you like our project, please give us a star ⭐ on GitHub. We greatly appreciate your support.

🎬 Demo

Env-Agent-User Interaction

Env-Agent Interaction

Building Environment From Scratch

To locally run the demo for interacting with Envs:

cd interact_with_env
python app.py

To locally run the demo for building Envs from scratch:

cd skel_builder
python env_build_demo.py

📦 Dataset & Models

We provide EnvScaler’s data and models (after SFT+RL) as follows:

Data Link
191 Env Metadata 🤗 HuggingFace
4.7K SFT Scenario 🤗 HuggingFace
2.5K RL Scenario 🤗 HuggingFace
9K SFT Trajectory 🤗 HuggingFace
Model Link
EnvScaler-Qwen3-1.7B 🤗 HuggingFace
EnvScaler-Qwen3-4B 🤗 HuggingFace
EnvScaler-Qwen3-8B 🤗 HuggingFace

👀 Overview

EnvScaler is an automated, scalable framework that realizes executable, stateful, tool-interactive environments via programmatic synthesis, for training LLM agents.


Overview of EnvScaler.

SkelBuilder is the first stage of EnvScaler. It (1) mines potential Env descriptions from existing open-source textual tasks; (2) plans the corresponding state schema and business rules, and generates a fully-functional Python class whose methods expose tool interfaces; (3) performs a dual-agent loop for Env quality inspection (one agent invokes tools, the other checks code, return values, and state changes), guaranteeing quality and consistency.


Framework of SkelBuilder.
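To make the SkelBuilder stage concrete, here is a hypothetical sketch of the kind of Env skeleton it generates: a stateful Python class whose public methods expose tool interfaces and enforce business rules. The class name, state schema, and method names below are illustrative assumptions, not actual EnvScaler output.

```python
# Hypothetical sketch of a generated Env skeleton: explicit state plus
# tool-interface methods that enforce business rules. Names are illustrative.

class LibraryEnv:
    """A toy tool-interactive environment with explicit state and rules."""

    def __init__(self):
        # State schema: book id -> record; user -> borrowed book ids
        self.books = {"b1": {"title": "Dune", "available": True}}
        self.loans = {}

    def search_book(self, title: str) -> list:
        """Tool: return ids of books whose title contains the query."""
        return [bid for bid, b in self.books.items()
                if title.lower() in b["title"].lower()]

    def borrow_book(self, user: str, book_id: str) -> str:
        """Tool: borrow a book, enforcing availability rules."""
        book = self.books.get(book_id)
        if book is None:
            return f"Error: unknown book {book_id}"
        if not book["available"]:
            return f"Error: {book_id} is already on loan"
        book["available"] = False  # state change a dual-agent inspector can check
        self.loans.setdefault(user, set()).add(book_id)
        return f"{user} borrowed {book_id}"
```

In the dual-agent inspection loop described above, one agent would invoke methods like `borrow_book` while the other checks that return values and state changes (here, the `available` flag and `loans` map) stay consistent with the declared rules.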

ScenGenerator is the second stage for synthesizing multiple Env scenarios. Given an Env skeleton, it first prompts LLMs to generate an initial state/database, then creates a challenging task that can be solved from that state. Finally, it decomposes the task into checklists, and converts each checkpoint into a Python Boolean function over the final state of the Env, providing rule-based, verifiable reward signals.


Framework of ScenGenerator.
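The checkpoint idea can be sketched as follows: each checklist item becomes a Boolean function over the environment's final state, and the fraction of passing checkpoints yields a rule-based, verifiable reward. The state shape and function names are illustrative assumptions, not EnvScaler's actual format.

```python
# Hypothetical ScenGenerator-style checkpoint: a Boolean predicate over the
# final environment state. State layout and names are illustrative.

def checkpoint_book_borrowed(final_state: dict) -> bool:
    """Check that book 'b1' ended up on loan to user 'alice'."""
    return (
        final_state["books"]["b1"]["available"] is False
        and "b1" in final_state["loans"].get("alice", [])
    )

def scenario_reward(final_state: dict, checkpoints: list) -> float:
    """Fraction of checkpoints passed: a simple verifiable reward signal."""
    passed = [cp(final_state) for cp in checkpoints]
    return sum(passed) / len(passed)
```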

📊 Results

With EnvScaler, we synthesized 191 environments and about 7K scenarios, and applied them to Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on Qwen3-series models. Results on three benchmarks show that EnvScaler significantly improves LLMs' ability to solve tasks in complex environments involving multi-turn, multi-tool interactions.


Statistics of 191 synthesized environments.


Performance comparison.

📁 Project Structure

EnvScaler/
├── skel_builder/              # Stage 1: Env Skeleton Construction
├── scen_generator/            # Stage 2: Scenario Generation
├── interact_with_env/         # Agent-Env Interaction
├── sft/                       # Supervised Fine-Tuning (SFT)
├── rl/                        # Reinforcement Learning (RL)
└── evaluation/                # Evaluation Guide

Module Description

💡 Tip: We provide detailed documentation under each module.

  1. skel_builder/ – Env skeleton construction framework that automatically generates executable environment classes from existing tasks.
  2. scen_generator/ – Scenario generation framework that produces state data, task scenarios, and checkpoint functions for an Env skeleton.
  3. interact_with_env/ – Agent-Env interaction module supporting (1) collecting training data by interacting with synthesized Envs and (2) benchmark evaluation.
  4. sft/ – Supervised fine-tuning implementation based on LlamaFactory.
  5. rl/ – Reinforcement learning implementation based on the ROLL framework.
  6. evaluation/ – Evaluation guide including BFCL, TauBench, and ACEBench.

🚀 Quick Start

1. Clone the repository

git clone https://github.com/RUC-NLPIR/EnvScaler 
cd EnvScaler

2. Install dependencies

pip install -r requirements.txt

💡 Note: Basic dependencies are included in requirements.txt. If you need SFT or RL training, please install extra dependencies following the corresponding sub-project documentation:

  • SFT training: refer to sft/README.md to install LlamaFactory
  • RL training: refer to rl/README.md to install the ROLL framework

3. Configure LLM service

Option 1: Use OpenAI API

Create a .env file in the project root and configure your OpenAI API key:

# .env
OPENAI_API_KEY=your-openai-api-key-here
OPENAI_BASE_URL=https://api.openai.com/v1 
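For illustration, here is a minimal, stdlib-only sketch of how values like these could be loaded from a `.env` file into the process environment (the project itself may use python-dotenv or similar; the function name is a hypothetical helper).

```python
# Minimal, illustrative .env parser: reads KEY=VALUE lines, skipping blanks
# and comments, and exports them via os.environ without overwriting
# variables that are already set.
import os
from pathlib import Path

def load_env_file(path: str = ".env") -> dict:
    """Parse a .env file and merge its values into os.environ."""
    loaded = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        loaded[key.strip()] = value.strip()
        os.environ.setdefault(key.strip(), value.strip())
    return loaded
```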

Option 2: Use self-hosted model

You can deploy a local model with an OpenAI-compatible inference framework such as vLLM.

Deploy a model with vLLM:

vllm serve your-model-path \
    --host 0.0.0.0 \
    --port 8000 \
    --trust-remote-code

⚠️ Important: Ensure the deployed model service supports the Function Calling (FC) interface; see the vLLM OpenAI-Compatible Server docs for details.
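As an illustration of the FC requirement, the sketch below builds (but does not send) a chat-completions request that advertises a single tool, in the format OpenAI-compatible servers such as vLLM accept. The endpoint path, model name, and tool schema are illustrative assumptions, not EnvScaler's actual request format.

```python
# Illustrative sketch: construct a Function Calling request for an
# OpenAI-compatible endpoint. The tool schema here is a made-up example.
import json
import urllib.request

def build_fc_request(base_url: str, model: str) -> urllib.request.Request:
    """Build a /chat/completions request that advertises one tool."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
```

If the server supports FC, the response to such a request can contain `tool_calls` naming `get_weather` with a JSON-encoded `city` argument; a server without FC support will reject the `tools` field.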

4. Verify configuration

Run the demo to verify your setup:

# Environment interaction demo
cd interact_with_env
python app.py

# Environment interaction Debug
cd interact_with_env
python run_main_debug.py

# Environment building demo
cd skel_builder
python env_build_demo.py

5. Start using

Now you can use each module of EnvScaler independently; see the Project Structure section above for details on each module.

📚 Citation

If you find our work helpful, please consider citing it. We greatly appreciate your support.

@article{song2026envscaler,
  title={EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis},
  author={Song, Xiaoshuai and Chang, Haofei and Dong, Guanting and Zhu, Yutao and Dou, Zhicheng and Wen, Ji-Rong},
  journal={arXiv preprint arXiv:2601.05808},
  year={2026}
}

📞 Contact

For any questions or feedback, please reach out to us at [email protected].
