aira-dojo

AIRA-dojo: a framework for developing and evaluating AI research agents

Stars: 94


aira-dojo is a scalable and customizable framework for AI research agents, designed to accelerate hill-climbing on research capabilities toward a fully automated AI research scientist. The framework provides a general abstraction for tasks and agents, implements the MLE-bench task, and includes state-of-the-art agents. It features an isolated code execution environment that integrates smoothly with job schedulers like Slurm, enabling large-scale experiments and rapid iteration across a portfolio of tasks and solvers.

README:

aira-dojo: AI Research Agent DOJO

aira-dojo is a scalable and customizable framework for AI research agents, designed to accelerate hill-climbing on research capabilities toward a fully automated AI research scientist. The framework provides a general abstraction for tasks and agents, implements the MLE-bench task, and includes the state-of-the-art agents introduced in our paper, “AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench.” Additionally, it features an isolated code execution environment that integrates smoothly with job schedulers like Slurm. The framework enabled 1,000 agents to run in parallel for up to 120 hours, uncovering valuable insights and results detailed in the paper.

📚 Documentation

The following documentation is available to help you get started with aira-dojo:

Terminology

Task: A specific problem or challenge that the AI agent (solver) is designed to solve. Each task has a defined execution environment, solver action space, and evaluation function.

Solver: An AI agent that attempts to solve a given task. A solver is composed of:

  • Operators: Functions that are used to generate new solutions (e.g., a call to an LLM with a specific prompt and some context).
  • Search Policy: The method used to explore the solution space and orchestrate the execution of operators (e.g., greedy search, evolutionary search, Monte Carlo Tree Search).
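
As a rough illustration of how operators and a search policy fit together, the sketch below implements a toy greedy search in plain Python. None of the names (Node, draft, improve, greedy_search, toy_evaluate) come from the aira-dojo codebase; the operators are simple random functions standing in for LLM calls, and toy_evaluate stands in for a task's evaluation function.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    """A candidate solution and its score (hypothetical structure)."""
    solution: float
    score: float


# Operators: functions that generate new solutions.
# In a real solver these would be LLM calls with a prompt and context.
def draft(rng: random.Random) -> float:
    """Propose a fresh solution from scratch."""
    return rng.uniform(-10.0, 10.0)


def improve(parent: float, rng: random.Random) -> float:
    """Propose a small modification of an existing solution."""
    return parent + rng.uniform(-1.0, 1.0)


def greedy_search(
    evaluate: Callable[[float], float],
    steps: int,
    num_drafts: int = 3,
    seed: int = 0,
) -> Node:
    """Search policy: draft a few solutions, then keep improving the best one."""
    rng = random.Random(seed)
    drafts = [draft(rng) for _ in range(num_drafts)]
    best = max((Node(s, evaluate(s)) for s in drafts), key=lambda n: n.score)
    for _ in range(steps):
        candidate = improve(best.solution, rng)
        score = evaluate(candidate)
        if score > best.score:  # greedy: accept only strict improvements
            best = Node(candidate, score)
    return best


def toy_evaluate(x: float) -> float:
    """Stand-in evaluation function: higher is better, maximum at x = 3."""
    return -((x - 3.0) ** 2)


if __name__ == "__main__":
    result = greedy_search(toy_evaluate, steps=200)
    print(f"best solution={result.solution:.3f} score={result.score:.3f}")
```

Swapping greedy_search for an evolutionary or tree-based policy would leave the operators unchanged, which is the separation the terminology above describes.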

Run: A single execution in which a solver (an AI agent) attempts to solve a given task.

Runner: A component used to parallelize runs. It manages and orchestrates multiple solver-task pairs concurrently, allowing large-scale experiments and rapid iteration across a portfolio of tasks and solvers.
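
To make the relationship between runs and a runner concrete, here is a minimal sketch that executes every solver-task pair concurrently. It is an assumption-laden toy: run, runner, and the solver callables are hypothetical names, and a thread pool is used only to keep the example self-contained, whereas the framework itself is described as integrating with job schedulers such as Slurm.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical solver type: takes a task name and returns a score.
Solver = Callable[[str], float]


def run(solver_name: str, solver: Solver, task: str) -> Tuple[str, str, float]:
    """A single run: one solver attempts one task."""
    return solver_name, task, solver(task)


def runner(
    solvers: Dict[str, Solver],
    tasks: List[str],
    max_workers: int = 4,
) -> Dict[Tuple[str, str], float]:
    """Execute all solver-task pairs concurrently and collect their scores."""
    pairs = list(product(solvers.items(), tasks))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run, name, solver, task)
            for (name, solver), task in pairs
        ]
        results = [future.result() for future in futures]
    return {(name, task): score for name, task, score in results}


if __name__ == "__main__":
    # Toy solvers standing in for real agents.
    toy_solvers = {
        "length_solver": lambda task: float(len(task)),
        "zero_solver": lambda task: 0.0,
    }
    for key, score in runner(toy_solvers, ["task_a", "task_bb"]).items():
        print(key, score)
```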

The diagram below gives a high-level overview of the key components of the framework and how they interact.


[Figure: high-level overview of the framework's key components and their interactions]

Quick Start

1. Clone the Repository

git clone https://github.com/facebookresearch/aira-dojo
cd aira-dojo

2. Create the conda environment

conda env create -f environment.yaml
conda activate aira-dojo

3. Install aira-dojo via pip

pip install -e .

4. Set up Environment Variables

cp .env_default .env
# Edit .env with your specific configuration

Note that the .env file is ignored by git to avoid accidentally pushing tokens to GitHub.

5. Change LLM Client Configs

If you are using different endpoints, change them accordingly in dojo/configs/run/solver/client. Examples:

  • Changing Azure endpoint for 4o:

    Go to src/dojo/configs/run/solver/client/litellm_4o.yaml and change the base_url to your Azure endpoint:

      ...
      base_url: https://azure-services-endpoint-here.azure-api.net #<---- Set to your Azure endpoint
      ...
  • Changing to the OpenAI endpoint for 4o:

    Go to src/dojo/configs/run/solver/client/litellm_4o.yaml and change the base_url and use_azure_client to the following:

      ...
      base_url: null  # litellm will use the OpenAI endpoint by default
      use_azure_client: False
      ...

    Finally, in .env, set your primary key to your OpenAI key:

    PRIMARY_KEY="sk-..." # <---- Set to your OpenAI key

Note: To run the examples in the "Example Usage" section of this README, you must set up the following models:

6. Build a superimage with apptainer

Follow the steps in docs/BUILD_SUPERIMAGE.md to build your superimage. This is necessary to run tasks that use jupyter as the interpreter.

7. Install mle-bench and run your first task

Follow the steps in src/dojo/tasks/mlebench/README.md to install mle-bench and run your first task.

8. Setting up wandb

Log in with the following command:

  wandb login

It will ask for your API key, which you can find under "User settings" (click the top right of the screen) by scrolling down.

Example Usage

Single-Run Example

# Runs AIRA_GREEDY on a single MLE-bench task
python -m dojo.main_run +_exp=run_example logger.use_wandb=False

See the config run_example.yaml for details.

Parallel-Run (Runner) Example

# Runs AIRA_GREEDY on our quick-dev set of MLE-bench tasks
python -m dojo.main_runner_job_array +_exp=runner_example logger.use_wandb=False launcher.debug=True

See the config runner_example.yaml for details.

Hydra Multi Parallel-Run Example

# Runs AIRA_GREEDY on our quick-dev set of MLE-bench tasks
python -m dojo.main_runner_job_array +_exp=runner_multi_example logger.use_wandb=False launcher.debug=True

See the config runner_multi_example.yaml for details.

Running AIRA_GREEDY, AIDE_GREEDY, AIRA_MCTS and AIRA_EVO on MLE-bench lite

Note: Make sure you replace <<<DEFAULT_SLURM_ACCOUNT>>>, <<<DEFAULT_SLURM_QOS>>>, and <<<DEFAULT_SLURM_PARTITION>>> with your actual Slurm account, QoS, and partition settings in your .env before running these commands.
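
A quick way to catch a forgotten setting is to scan .env for placeholders that were never filled in. The sketch below assumes that unfilled values still contain the literal <<<NAME>>> markers mentioned in the note; the helper name find_unfilled_placeholders is made up for this example and is not part of aira-dojo.

```python
import re
from pathlib import Path
from typing import List

# Matches markers such as <<<DEFAULT_SLURM_ACCOUNT>>> (assumed placeholder format).
PLACEHOLDER = re.compile(r"<<<([A-Z0-9_]+)>>>")


def find_unfilled_placeholders(env_text: str) -> List[str]:
    """Return the names of <<<...>>> placeholders left on non-comment lines."""
    found: List[str] = []
    for line in env_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        found.extend(PLACEHOLDER.findall(stripped))
    return found


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env file found; copy .env_default to .env first.")
    else:
        leftover = find_unfilled_placeholders(env_path.read_text())
        if leftover:
            print("Unfilled placeholders:", ", ".join(leftover))
        else:
            print("No unfilled placeholders found.")
```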

# Runs AIDE_GREEDY on MLE-bench lite tasks
python -m dojo.main_runner_job_array +_exp=mlebench/aide_greedy_o3 logger.use_wandb=False launcher.debug=False

# Runs AIRA_GREEDY on MLE-bench lite tasks
python -m dojo.main_runner_job_array +_exp=mlebench/aira_greedy_o3 logger.use_wandb=False launcher.debug=False

# Runs AIRA_EVO on MLE-bench lite tasks
python -m dojo.main_runner_job_array +_exp=mlebench/aira_evo_o3 logger.use_wandb=False launcher.debug=False

# Runs AIRA_MCTS on MLE-bench lite tasks
python -m dojo.main_runner_job_array +_exp=mlebench/aira_mcts_o3 logger.use_wandb=False launcher.debug=False
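
If you prefer to launch all four experiments from one script, something along these lines may help. It only assembles the same command lines shown above and, by default, prints them without executing anything. The helper names build_command and launch_all are hypothetical and not part of aira-dojo.

```python
import subprocess
import sys
from typing import List

# Experiment config names taken from the commands above.
EXPERIMENTS = ["aide_greedy_o3", "aira_greedy_o3", "aira_evo_o3", "aira_mcts_o3"]


def build_command(experiment: str, use_wandb: bool = False, debug: bool = False) -> List[str]:
    """Assemble the runner command for one MLE-bench lite experiment."""
    return [
        sys.executable,
        "-m",
        "dojo.main_runner_job_array",
        f"+_exp=mlebench/{experiment}",
        f"logger.use_wandb={use_wandb}",
        f"launcher.debug={debug}",
    ]


def launch_all(dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it in sequence."""
    for experiment in EXPERIMENTS:
        command = build_command(experiment)
        print(" ".join(command))
        if not dry_run:
            # check=True stops at the first failing launch.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    launch_all(dry_run=True)  # switch to False to actually submit the jobs
```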

Analyse and Visualize Results

To visualize results, check out src/dojo/ui/README. To learn how to load and extract the best node of each experiment, check out notebooks/analyze_results.ipynb.

Citation

If you found this work useful, please consider citing:

@article{toledo2025airesearchagentsmachine,
    title={AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench}, 
    author={Edan Toledo and Karen Hambardzumyan and Martin Josifoski and Rishi Hazra and Nicolas Baldwin and Alexis Audran-Reiss and Michael Kuchnik and Despoina Magka and Minqi Jiang and Alisia Maria Lupidi and Andrei Lupu and Roberta Raileanu and Kelvin Niu and Tatiana Shavrina and Jean-Christophe Gagnon-Audet and Michael Shvartsman and Shagun Sodhani and Alexander H. Miller and Abhishek Charnalia and Derek Dunfield and Carole-Jean Wu and Pontus Stenetorp and Nicola Cancedda and Jakob Nicolaus Foerster and Yoram Bachrach},
    year={2025},
    journal={arXiv},
    url={https://arxiv.org/abs/2507.02554}
}

License

This code is made available under a CC BY-NC 4.0 license, as found in the LICENSE file. Some portions of the project are subject to separate license terms outlined in THIRD_PARTY_LICENSES.md.
