AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery π§βπ¬
Stars: 6567
The AI Scientist is a comprehensive system for fully automatic scientific discovery, enabling Foundation Models to perform research independently. It aims to tackle the grand challenge of developing agents capable of conducting scientific research and discovering new knowledge. The tool generates papers on various topics using Large Language Models (LLMs) and provides a platform for exploring new research ideas. Users can create their own templates for specific areas of study and run experiments to generate papers. However, caution is advised as the codebase executes LLM-written code, which may pose risks such as the use of potentially dangerous packages and web access.
README:
π [Paper] | π [Blog Post] | π [Drive Folder]
One of the grand challenges of artificial intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used to aid human scientists, e.g. for brainstorming ideas or writing code, they still require extensive manual supervision or are heavily constrained to a specific task.
We're excited to introduce The AI Scientist, the first comprehensive system for fully automatic scientific discovery, enabling Foundation Models such as Large Language Models (LLMs) to perform research independently.
We further provide all runs and data from our paper here, where we run each base model on each template for ~50 ideas. We highly recommend reading through some of the Claude papers, (especially the diffusion ones), to get a sense of its strengths and weaknesses. Here are some example papers generated by The AI Scientist π:
- DualScale Diffusion: Adaptive Feature Balancing for Low-Dimensional Generative Models
- Multi-scale Grid Noise Adaptation: Enhancing Diffusion Models For Low-dimensional Data
- GAN-Enhanced Diffusion: Boosting Sample Quality and Diversity
- DualDiff: Enhancing Mode Capture in Low-dimensional Diffusion Models via Dual-expert Denoising
- StyleFusion: Adaptive Multi-style Generation in Character-Level Language Models
- Adaptive Learning Rates for Transformers via Q-Learning
- Unlocking Grokking: A Comparative Study of Weight Initialization Strategies in Transformer Models
- Grokking Accelerated: Layer-wise Learning Rates for Transformer Generalization
- Grokking Through Compression: Unveiling Sudden Generalization via Minimal Description Length
- Accelerating Mathematical Insight: Boosting Grokking Through Strategic Data Augmentation
Note: Caution! This codebase will execute LLM-written code. There are various risks and challenges associated with this autonomy. This includes e.g. the use of potentially dangerous packages, web access, and potential spawning of processes. Use at your own discretion. Please make sure to containerize and restrict web access appropriately.
- Requirements
- Run AI Scientist Paper Generation Experiments
- Getting an LLM Generated Paper Review
- Making your own Template
- Template Resources
- Citing The AI Scientist
- Frequently Asked Questions
- Containerization
conda create -n ai_scientist python=3.11
conda activate ai_scientist
# Install pdflatex
sudo apt-get install texlive-full
# Install pypi requirements
pip install -r requirements.txt
When installing texlive-full
, you may need to hold Enter.
By default, this uses the OPENAI_API_KEY
environment variable.
By default, this uses the ANTHROPIC_API_KEY
environment variable.
For Claude models provided by Amazon Bedrock, please install these additional packages:
pip install anthropic[bedrock]
Next, specify a set of valid AWS Credentials and the target AWS Region:
Set these environment variables: AWS_ACCESS_KEY_ID
, AWS_SECRET_ACCESS_KEY
, AWS_REGION_NAME
.
For Claude models provided by Vertex AI Model Garden, please install these additional packages:
pip install google-cloud-aiplatform
pip install anthropic[vertex]
Next, set up a valid authentication for a Google Cloud project, for example by providing region and project ID like so:
export CLOUD_ML_REGION="REGION" # for Model Garden call
export ANTHROPIC_VERTEX_PROJECT_ID="PROJECT_ID" # for Model Garden call
export VERTEXAI_LOCATION="REGION" # for Aider/LiteLLM call, as per https://docs.litellm.ai/docs/providers/vertex#set-vertex-project--vertex-location
export VERTEXAI_PROJECT="PROJECT_ID" # for Aider/LiteLLM call as per https://docs.litellm.ai/docs/providers/vertex#set-vertex-project--vertex-location
By default, this uses the DEEPSEEK_API_KEY
environment variable.
By default, this uses the OPENROUTER_API_KEY
environment variable.
Our code can also optionally use a Semantic Scholar API Key (S2_API_KEY
) for higher throughput if you have one, though in principle it should work without it.
Be sure to provide the key for the model used for your runs, e.g.
export OPENAI_API_KEY="YOUR KEY HERE"
export S2_API_KEY="YOUR KEY HERE"
# Prepare NanoGPT data
python data/enwik8/prepare.py
python data/shakespeare_char/prepare.py
python data/text8/prepare.py
# Set up NanoGPT baseline run
# NOTE: YOU MUST FIRST RUN THE PREPARE SCRIPTS ABOVE!
cd templates/nanoGPT && python experiment.py --out_dir run_0 && python plot.py
# NOTE: YOU MUST FIRST RUN THE PREPARE SCRIPTS ABOVE!
cd templates/nanoGPT_lite && python experiment.py --out_dir run_0 && python plot.py
# Set up 2D Diffusion
git clone https://github.com/gregversteeg/NPEET.git
cd NPEET
pip install .
pip install scikit-learn
# Set up 2D Diffusion baseline run
cd templates/2d_diffusion && python experiment.py --out_dir run_0 && python plot.py
# Set up Grokking
pip install einops
# Set up Grokking baseline run
cd templates/grokking && python experiment.py --out_dir run_0 && python plot.py
Note: please ensure the setup steps above are completed.
conda activate ai_scientist
# Run the paper generation.
python launch_scientist.py --model "gpt-4o-2024-05-13" --experiment nanoGPT_lite --num-ideas 2
python launch_scientist.py --model "claude-3-5-sonnet-20240620" --experiment nanoGPT_lite --num-ideas 2
import openai
from ai_scientist.perform_review import load_paper, perform_review
client = openai.OpenAI()
model = "gpt-4o-2024-05-13"
# Load paper from pdf file (raw text)
paper_txt = load_paper("report.pdf")
# Get the review dict of the review
review = perform_review(
paper_txt,
model,
client,
num_reflections=5,
num_fs_examples=1,
num_reviews_ensemble=5,
temperature=0.1,
)
# Inspect review results
review["Overall"] # overall score 1-10
review["Decision"] # ['Accept', 'Reject']
review["Weaknesses"] # List of weaknesses (str)
To run batch analysis:
cd review_iclr_bench
python iclr_analysis.py --num_reviews 500 --batch_size 100 --num_fs_examples 1 --num_reflections 5 --temperature 0.1 --num_reviews_ensemble 5
If there is an area of study you would like The AI Scientist to explore, it should be very easy to create your own templates. In general, follow the structure of the existing templates, which consists of:
-
experiment.py
-- This is a single file where the 'meat' of the content is. It takes in an argument forout_dir
, which is where it should create the folder and save the relevant information from the run. -
plot.py
-- This should take in the information from therun
folders and create plots. The code should be clear and easy to edit. -
prompt.json
-- Put information about your template here. -
seed_ideas.json
-- Put example ideas here. You can also try to generate ideas without any examples, and then pick the best one or two to put here. -
latex/template.tex
-- We recommend using our latex folder, but be sure to replace the pre-loaded citations with ones that you would expect to be more relevant.
We provide 3 templates, which heavily use code from other repositories, which we credit below. (Normally, we would do this in the files themselves, but it's unclear how this would affect The AI Scientist since it would be visible).
The NanoGPT template used code from NanoGPT and this PR.
The 2D Diffusion template used code from tiny-diffusion, ema-pytorch, and Datasaur.
The Grokking template used code from Sea-Snell/grokking and danielmamay/grokking.
We would like to thank the developers of the open-source models and packages for their contributions and for making their work available.
If you use The AI Scientist in your research, please cite it as follows:
@article{lu2024aiscientist,
title={The {AI} {S}cientist: Towards Fully Automated Open-Ended Scientific Discovery},
author={Lu, Chris and Lu, Cong and Lange, Robert Tjarko and Foerster, Jakob and Clune, Jeff and Ha, David},
journal={arXiv preprint arXiv:2408.06292},
year={2024}
}
We recommend reading our paper in the first instance for any questions you have on The AI Scientist.
Make sure you have completed all the setup and preparation steps before the main experiment script.
The AI Scientist finishes an idea with a success rate that depends on both the template, the base foundation model, and the complexity of the idea. We advise referring to our main paper. The highest success rates are observed with Claude Sonnet 3.5. Reviews are best done with GPT-4o, all other models have issues with positivity bias or failure to conform to required outputs.
Typically less than $15 per paper with Claude Sonnet 3.5. We recommend DeepSeek Coder V2 for a much more cost-effective approach. A good place to look for new models is the Aider leaderboard.
Change the base template.tex
files contained within each template.
Please refer to the instructions for different templates. In this current iteration, this is restricted to ideas that can be expressed in code. However, lifting this restriction would represent exciting future work! :)
Please see this PR for an example of how to add a new model, e.g. this time for Claude via Bedrock. We do not advise any model that is significantly weaker than GPT-4 level for The AI Scientist.
These appear as run_0
and should be run per machine you execute The AI Scientist on for accurate run-time comparisons due to hardware differences.
We include a community-contributed Docker image that may assist with your containerization efforts in experimental/Dockerfile
.
You can use this image like this:
# Endpoint Script
docker run -e OPENAI_API_KEY=$OPENAI_API_KEY -v `pwd`/templates:/app/AI-Scientist/templates <AI_SCIENTIST_IMAGE> \
--model gpt-4o-2024-05-13 \
--experiment 2d_diffusion \
--num-ideas 2
# Interactive
docker run -it -e OPENAI_API_KEY=$OPENAI_API_KEY \
--entrypoint /bin/bash \
<AI_SCIENTIST_IMAGE>
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for AI-Scientist
Similar Open Source Tools
AI-Scientist
The AI Scientist is a comprehensive system for fully automatic scientific discovery, enabling Foundation Models to perform research independently. It aims to tackle the grand challenge of developing agents capable of conducting scientific research and discovering new knowledge. The tool generates papers on various topics using Large Language Models (LLMs) and provides a platform for exploring new research ideas. Users can create their own templates for specific areas of study and run experiments to generate papers. However, caution is advised as the codebase executes LLM-written code, which may pose risks such as the use of potentially dangerous packages and web access.
ai-starter-kit
SambaNova AI Starter Kits is a collection of open-source examples and guides designed to facilitate the deployment of AI-driven use cases for developers and enterprises. The kits cover various categories such as Data Ingestion & Preparation, Model Development & Optimization, Intelligent Information Retrieval, and Advanced AI Capabilities. Users can obtain a free API key using SambaNova Cloud or deploy models using SambaStudio. Most examples are written in Python but can be applied to any programming language. The kits provide resources for tasks like text extraction, fine-tuning embeddings, prompt engineering, question-answering, image search, post-call analysis, and more.
aiexe
aiexe is a cutting-edge command-line interface (CLI) and graphical user interface (GUI) tool that integrates powerful AI capabilities directly into your terminal or desktop. It is designed for developers, tech enthusiasts, and anyone interested in AI-powered automation. aiexe provides an easy-to-use yet robust platform for executing complex tasks with just a few commands. Users can harness the power of various AI models from OpenAI, Anthropic, Ollama, Gemini, and GROQ to boost productivity and enhance decision-making processes.
crewAI-tools
This repository provides a guide for setting up tools for crewAI agents to enhance functionality. It offers steps to equip agents with ready-to-use tools and create custom ones. Tools are expected to return strings for generating responses. Users can create tools by subclassing BaseTool or using the tool decorator. Contributions are welcome to enrich the toolset, and guidelines are provided for contributing. The development setup includes installing dependencies, activating virtual environment, setting up pre-commit hooks, running tests, static type checking, packaging, and local installation. The goal is to empower AI solutions through advanced tooling.
llm-compressor
llm-compressor is an easy-to-use library for optimizing models for deployment with vllm. It provides a comprehensive set of quantization algorithms, seamless integration with Hugging Face models and repositories, and supports mixed precision, activation quantization, and sparsity. Supported algorithms include PTQ, GPTQ, SmoothQuant, and SparseGPT. Installation can be done via git clone and local pip install. Compression can be easily applied by selecting an algorithm and calling the oneshot API. The library also offers end-to-end examples for model compression. Contributions to the code, examples, integrations, and documentation are appreciated.
xlang
XLangβ’ is a cutting-edge language designed for AI and IoT applications, offering exceptional dynamic and high-performance capabilities. It excels in distributed computing and seamless integration with popular languages like C++, Python, and JavaScript. Notably efficient, running 3 to 5 times faster than Python in AI and deep learning contexts. Features optimized tensor computing architecture for constructing neural networks through tensor expressions. Automates tensor data flow graph generation and compilation for specific targets, enhancing GPU performance by 6 to 10 times in CUDA environments.
WindowsAgentArena
Windows Agent Arena (WAA) is a scalable Windows AI agent platform designed for testing and benchmarking multi-modal, desktop AI agents. It provides researchers and developers with a reproducible and realistic Windows OS environment for AI research, enabling testing of agentic AI workflows across various tasks. WAA supports deploying agents at scale using Azure ML cloud infrastructure, allowing parallel running of multiple agents and delivering quick benchmark results for hundreds of tasks in minutes.
humanoid-gym
Humanoid-Gym is a reinforcement learning framework designed for training locomotion skills for humanoid robots, focusing on zero-shot transfer from simulation to real-world environments. It integrates a sim-to-sim framework from Isaac Gym to Mujoco for verifying trained policies in different physical simulations. The codebase is verified with RobotEra's XBot-S and XBot-L humanoid robots. It offers comprehensive training guidelines, step-by-step configuration instructions, and execution scripts for easy deployment. The sim2sim support allows transferring trained policies to accurate simulated environments. The upcoming features include Denoising World Model Learning and Dexterous Hand Manipulation. Installation and usage guides are provided along with examples for training PPO policies and sim-to-sim transformations. The code structure includes environment and configuration files, with instructions on adding new environments. Troubleshooting tips are provided for common issues, along with a citation and acknowledgment section.
gemini-pro-bot
This Python Telegram bot utilizes Google's `gemini-pro` LLM API to generate creative text formats based on user input. It's designed to be an engaging and interactive way to explore the capabilities of large language models. Key features include generating various text formats like poems, code, scripts, and musical pieces. The bot supports real-time streaming of the generation process, allowing users to witness the text unfold. Additionally, it can respond to messages with Bard's creative output and handle image-based inputs for multimodal responses. User authentication is optional, and the bot can be easily integrated with Docker or installed via pipenv.
sandbox
Sandbox is an open-source cloud-based code editing environment with custom AI code autocompletion and real-time collaboration. It consists of a frontend built with Next.js, TailwindCSS, Shadcn UI, Clerk, Monaco, and Liveblocks, and a backend with Express, Socket.io, Cloudflare Workers, D1 database, R2 storage, Workers AI, and Drizzle ORM. The backend includes microservices for database, storage, and AI functionalities. Users can run the project locally by setting up environment variables and deploying the containers. Contributions are welcome following the commit convention and structure provided in the repository.
neural
Neural is a Vim and Neovim plugin that integrates various machine learning tools to assist users in writing code, generating text, and explaining code or paragraphs. It supports multiple machine learning models, focuses on privacy, and is compatible with Vim 8.0+ and Neovim 0.8+. Users can easily configure Neural to interact with third-party machine learning tools, such as OpenAI, to enhance code generation and completion. The plugin also provides commands like `:NeuralExplain` to explain code or text and `:NeuralStop` to stop Neural from working. Neural is maintained by the Dense Analysis team and comes with a disclaimer about sending input data to third-party servers for machine learning queries.
SillyTavern
SillyTavern is a user interface you can install on your computer (and Android phones) that allows you to interact with text generation AIs and chat/roleplay with characters you or the community create. SillyTavern is a fork of TavernAI 1.2.8 which is under more active development and has added many major features. At this point, they can be thought of as completely independent programs.
cortex
Nitro is a high-efficiency C++ inference engine for edge computing, powering Jan. It is lightweight and embeddable, ideal for product integration. The binary of nitro after zipped is only ~3mb in size with none to minimal dependencies (if you use a GPU need CUDA for example) make it desirable for any edge/server deployment.
TriForce
TriForce is a training-free tool designed to accelerate long sequence generation. It supports long-context Llama models and offers both on-chip and offloading capabilities. Users can achieve a 2.2x speedup on a single A100 GPU. TriForce also provides options for offloading with tensor parallelism or without it, catering to different hardware configurations. The tool includes a baseline for comparison and is optimized for performance on RTX 4090 GPUs. Users can cite the associated paper if they find TriForce useful for their projects.
deaddit
Deaddit is a project showcasing an AI-filled internet platform similar to Reddit. All content, including subdeaddits, posts, and comments, is generated by AI algorithms. Users can interact with AI-generated content and explore a simulated social media experience. The project provides a demonstration of how AI can be used to create online content and simulate user interactions in a virtual community.
morphic
Morphic is an AI-powered answer engine with a generative UI. It utilizes a stack of Next.js, Vercel AI SDK, OpenAI, Tavily AI, shadcn/ui, Radix UI, and Tailwind CSS. To get started, fork and clone the repo, install dependencies, fill out secrets in the .env.local file, and run the app locally using 'bun dev'. You can also deploy your own live version of Morphic with Vercel. Verified models that can be specified to writers include Groq, LLaMA3 8b, and LLaMA3 70b.
For similar tasks
AI-Scientist
The AI Scientist is a comprehensive system for fully automatic scientific discovery, enabling Foundation Models to perform research independently. It aims to tackle the grand challenge of developing agents capable of conducting scientific research and discovering new knowledge. The tool generates papers on various topics using Large Language Models (LLMs) and provides a platform for exploring new research ideas. Users can create their own templates for specific areas of study and run experiments to generate papers. However, caution is advised as the codebase executes LLM-written code, which may pose risks such as the use of potentially dangerous packages and web access.
peridyno
PeriDyno is a CUDA-based, highly parallel physics engine targeted at providing real-time simulation of physical environments for intelligent agents. It is designed to be easy to use and integrate into existing projects, and it provides a wide range of features for simulating a variety of physical phenomena. PeriDyno is open source and available under the Apache 2.0 license.
LLMinator
LLMinator is a Gradio-based tool with an integrated chatbot designed to locally run and test Language Model Models (LLMs) directly from HuggingFace. It provides an easy-to-use interface made with Gradio, LangChain, and Torch, offering features such as context-aware streaming chatbot, inbuilt code syntax highlighting, loading any LLM repo from HuggingFace, support for both CPU and CUDA modes, enabling LLM inference with llama.cpp, and model conversion capabilities.
generative-ai-application-builder-on-aws
The Generative AI Application Builder on AWS (GAAB) is a solution that provides a web-based management dashboard for deploying customizable Generative AI (Gen AI) use cases. Users can experiment with and compare different combinations of Large Language Model (LLM) use cases, configure and optimize their use cases, and integrate them into their applications for production. The solution is targeted at novice to experienced users who want to experiment and productionize different Gen AI use cases. It uses LangChain open-source software to configure connections to Large Language Models (LLMs) for various use cases, with the ability to deploy chat use cases that allow querying over users' enterprise data in a chatbot-style User Interface (UI) and support custom end-user implementations through an API.
For similar jobs
Perplexica
Perplexica is an open-source AI-powered search engine that utilizes advanced machine learning algorithms to provide clear answers with sources cited. It offers various modes like Copilot Mode, Normal Mode, and Focus Modes for specific types of questions. Perplexica ensures up-to-date information by using SearxNG metasearch engine. It also features image and video search capabilities and upcoming features include finalizing Copilot Mode and adding Discover and History Saving features.
KULLM
KULLM (ꡬλ¦) is a Korean Large Language Model developed by Korea University NLP & AI Lab and HIAI Research Institute. It is based on the upstage/SOLAR-10.7B-v1.0 model and has been fine-tuned for instruction. The model has been trained on 8ΓA100 GPUs and is capable of generating responses in Korean language. KULLM exhibits hallucination and repetition phenomena due to its decoding strategy. Users should be cautious as the model may produce inaccurate or harmful results. Performance may vary in benchmarks without a fixed system prompt.
MMMU
MMMU is a benchmark designed to evaluate multimodal models on college-level subject knowledge tasks, covering 30 subjects and 183 subfields with 11.5K questions. It focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of various models highlights substantial challenges, with room for improvement to stimulate the community towards expert artificial general intelligence (AGI).
1filellm
1filellm is a command-line data aggregation tool designed for LLM ingestion. It aggregates and preprocesses data from various sources into a single text file, facilitating the creation of information-dense prompts for large language models. The tool supports automatic source type detection, handling of multiple file formats, web crawling functionality, integration with Sci-Hub for research paper downloads, text preprocessing, and token count reporting. Users can input local files, directories, GitHub repositories, pull requests, issues, ArXiv papers, YouTube transcripts, web pages, Sci-Hub papers via DOI or PMID. The tool provides uncompressed and compressed text outputs, with the uncompressed text automatically copied to the clipboard for easy pasting into LLMs.
gpt-researcher
GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks. It can produce detailed, factual, and unbiased research reports with customization options. The tool addresses issues of speed, determinism, and reliability by leveraging parallelized agent work. The main idea involves running 'planner' and 'execution' agents to generate research questions, seek related information, and create research reports. GPT Researcher optimizes costs and completes tasks in around 3 minutes. Features include generating long research reports, aggregating web sources, an easy-to-use web interface, scraping web sources, and exporting reports to various formats.
ChatTTS
ChatTTS is a generative speech model optimized for dialogue scenarios, providing natural and expressive speech synthesis with fine-grained control over prosodic features. It supports multiple speakers and surpasses most open-source TTS models in terms of prosody. The model is trained with 100,000+ hours of Chinese and English audio data, and the open-source version on HuggingFace is a 40,000-hour pre-trained model without SFT. The roadmap includes open-sourcing additional features like VQ encoder, multi-emotion control, and streaming audio generation. The tool is intended for academic and research use only, with precautions taken to limit potential misuse.
HebTTS
HebTTS is a language modeling approach to diacritic-free Hebrew text-to-speech (TTS) system. It addresses the challenge of accurately mapping text to speech in Hebrew by proposing a language model that operates on discrete speech representations and is conditioned on a word-piece tokenizer. The system is optimized using weakly supervised recordings and outperforms diacritic-based Hebrew TTS systems in terms of content preservation and naturalness of generated speech.
do-research-in-AI
This repository is a collection of research lectures and experience sharing posts from frontline researchers in the field of AI. It aims to help individuals upgrade their research skills and knowledge through insightful talks and experiences shared by experts. The content covers various topics such as evaluating research papers, choosing research directions, research methodologies, and tips for writing high-quality scientific papers. The repository also includes discussions on academic career paths, research ethics, and the emotional aspects of research work. Overall, it serves as a valuable resource for individuals interested in advancing their research capabilities in the field of AI.