forecastbench

A dynamic forecasting benchmark for LLMs

Stars: 51

ForecastBench is a dynamic benchmark for evaluating LLM forecasting accuracy against human comparison groups. Because its questions concern events that have not yet resolved, the benchmark is contamination-free, and forecasting accuracy serves as a proxy for general intelligence. Leaderboards and datasets are updated nightly, and instructions are provided for submitting models to the benchmark. Detailed information is available on the wiki, and the project can be cited with the provided BibTeX entry. Developers can set up the tool locally, run the GCP Cloud Functions, and contribute by following the guidelines below.

README:

ForecastBench

ICLR 2025 · arXiv:2409.19839

A dynamic, contamination-free benchmark of LLM forecasting accuracy with human comparison groups, serving as a valuable proxy for general intelligence. More at www.forecastbench.org.

Datasets

Leaderboards and datasets are updated nightly and available at github.com/forecastingresearch/forecastbench-datasets.
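To work with the data locally, you can clone the datasets repository; a minimal sketch using the URL above:

git clone https://github.com/forecastingresearch/forecastbench-datasets.git
cd forecastbench-datasets
git pull   # re-run later to pick up the nightly updates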

Participate in the benchmark

Instructions for submitting your model to the benchmark: How-to-submit-to-ForecastBench.

Wiki

Dig into the details of ForecastBench on the wiki.

Citation

@inproceedings{karger2025forecastbench,
      title={ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities},
      author={Ezra Karger and Houtan Bastani and Chen Yueh-Han and Zachary Jacobs and Danny Halawi and Fred Zhang and Philip E. Tetlock},
      year={2025},
      booktitle={International Conference on Learning Representations (ICLR)},
      url={https://iclr.cc/virtual/2025/poster/28507}
}

Getting started for devs

Local setup

  1. git clone --recurse-submodules <repo-url>.git
  2. cd forecastbench
  3. cp variables.example.mk variables.mk and set the values accordingly
  4. Set up your Python virtual environment
    1. make setup-python-env
    2. source .venv/bin/activate
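The same steps as one copy-pasteable block (the repository URL is left as a placeholder, as above; the values to set in variables.mk are repo-specific):

git clone --recurse-submodules <repo-url>.git
cd forecastbench
cp variables.example.mk variables.mk   # then edit variables.mk and set the values
make setup-python-env                  # creates the .venv virtual environment
source .venv/bin/activate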

Run GCP Cloud Functions locally

  1. cd directory/containing/cloud/function
  2. eval $(cat path/to/variables.mk | xargs) python main.py
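The eval/xargs one-liner turns each KEY=value line of variables.mk into an environment variable for that single python invocation. A more explicit equivalent, assuming variables.mk contains only plain KEY=value lines (the key shown is hypothetical):

set -a                        # auto-export every variable defined while set
source path/to/variables.mk   # e.g. a line like GCP_PROJECT_ID=my-project
set +a
python main.py                # run the Cloud Function's entry point locally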

Contributions

Before creating a pull request (a sketch of this flow follows the list):

  • run make lint and fix any errors and warnings
  • ensure code has been deployed to Google Cloud Platform and tested (this applies only to our own devs; if you are an outside contributor, we're happy you're contributing and will test this on our end)
  • fork the repo
  • reference the issue number (if one exists) in the commit message
  • push to the fork on a branch other than main
  • create a pull request
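For outside contributors, the checklist above maps onto a flow like the following (the issue number and branch name are illustrative):

make lint                                      # fix any reported errors and warnings
git checkout -b fix-scoring-docs               # work on a branch other than main
git commit -am "Fix scoring docs (refs #123)"  # reference the issue number if one exists
git push origin fix-scoring-docs               # push to your fork, then open a pull request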
