Curie

❓Curie: Automated and Rigorous Scientific Experimentation with AI Agents

Stars: 52

Curie: Automate Rigorous Scientific Experimentation

Curie is the first AI-agent framework designed for automated and rigorous scientific experimentation. Curie helps answer your curiosity through end-to-end experimentation automation, ensuring that every step—from hypothesis formulation to result interpretation—is conducted with precision, reliability, and reproducibility.

Key Features

🚀 Automated Experimentation – End-to-end workflow management: hypothesis formulation, experiment setup, experiment execution, result analysis, and reflection on findings.
📊 Rigor Enhancement – Built-in verification modules enforce methodical procedure, reliability, and interpretability.
🔬 Broad Applicability – Supports ML research, system analysis, and scientific discovery.
📖 Experimentation Benchmark – Provides 46 questions from 4 computer science domains, based on influential papers and open-source projects (benchmark/experimentation_bench).

Installation
Quick Start
Use Cases
Tutorial
Customize Your Experimentation Agents

Installation

Install docker: https://docs.docker.com/engine/install/ubuntu/.

Grant your user access to the Docker socket via sudo chmod 666 /var/run/docker.sock.
If you get an error saying that /var/run/docker.sock doesn’t exist, find the actual path to docker.sock and create a symbolic link to it. For example, Docker Desktop stores this file at ~/.docker/desktop/docker.sock, in which case you may use:
```
sudo chmod 666 ~/.docker/desktop/docker.sock
sudo ln -s ~/.docker/desktop/docker.sock /var/run/docker.sock
```
Run docker ps to check that permission has been granted with the Docker daemon.
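
The socket check described above can also be scripted. This is a minimal sketch, assuming only the two socket locations mentioned in this section; the helper name find_docker_socket is hypothetical and not part of Curie.

```python
import os
import stat
from typing import Iterable, Optional

# Candidate socket locations taken from the installation notes above.
DEFAULT_CANDIDATES = (
    "/var/run/docker.sock",
    os.path.expanduser("~/.docker/desktop/docker.sock"),
)


def find_docker_socket(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the first candidate that is a Unix socket the current user can read and write."""
    for path in candidates:
        try:
            mode = os.stat(path).st_mode  # follows symlinks, so a soft link works too
        except OSError:
            continue  # missing path or dangling link
        if stat.S_ISSOCK(mode) and os.access(path, os.R_OK | os.W_OK):
            return path
    return None


if __name__ == "__main__":
    sock = find_docker_socket()
    print(sock or "No usable Docker socket found; see the chmod and ln -s steps above.")
```

A returned path only shows that the socket file is accessible; docker ps remains the real test that the daemon answers.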

Clone the repository:

git clone https://github.com/Just-Curieous/Curie.git
cd Curie

Put your LLM API credentials under curie/setup/env.sh. Example:

export MODEL="gpt-4o" 
export OPENAI_API_KEY="sk-xxx"
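
Before building the image, it can help to confirm that the credentials are actually exported in the current shell. The sketch below assumes env.sh has been sourced and checks only the two variables from the example above; other model providers may need different variables, and the helper name missing_vars is hypothetical.

```python
import os
from typing import List, Mapping, Sequence

# Variable names taken from the env.sh example above (an assumption for other providers).
REQUIRED_VARS = ("MODEL", "OPENAI_API_KEY")


def missing_vars(env: Mapping[str, str], required: Sequence[str] = REQUIRED_VARS) -> List[str]:
    """Return the required variable names that are unset or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All example credentials are set.")
```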

Build the container image. This will take a few minutes. Note: you may need to set up a virtual environment before running pip install.

pip install -e .
docker images -q exp-agent-image | xargs -r docker rmi -f # remove any existing conflicting image
cd curie && docker build --no-cache --progress=plain -t exp-agent-image -f ExpDockerfile_default .. && cd -

Quick Start

Use the following command to input your research question or problem statement: python3 -m curie.main -q "<Your research question>".

Example 1: Understanding Sorting Algorithm Efficiency

python3 -m curie.main \
  -q "How does the choice of sorting algorithm impact runtime performance across different \
  input distributions (random, nearly sorted, reverse sorted)?" --report

Estimated runtime: ~5 minutes
Sample log file: Available here
Experiment report: Available here.
Log monitoring:
- Real-time logs are streamed to the console.
- Logs are also stored in:
  - logs/research_question_<ID>.log
  - logs/research_question_<ID>_verbose.log
- Experiment report details:
  - Stored in: logs/research_question_<ID>.md
  - Will only be produced when the --report flag is used.
Reproducibility: The full experimentation process is saved in workspace/research_<ID>/.
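
The file locations listed above follow a fixed pattern, so they can be derived from the run ID. This is a small sketch under that assumption; the format of the ID itself is not specified here, and artifact_paths is a hypothetical helper, not a Curie API.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(run_id: str, root: Path = Path(".")) -> Dict[str, Path]:
    """Build the log, report, and workspace paths described above for one run ID."""
    logs = root / "logs"
    return {
        "log": logs / f"research_question_{run_id}.log",
        "verbose_log": logs / f"research_question_{run_id}_verbose.log",
        "report": logs / f"research_question_{run_id}.md",  # only written with --report
        "workspace": root / "workspace" / f"research_{run_id}",
    }


if __name__ == "__main__":
    # "example-id" is a placeholder; substitute the ID printed by your own run.
    for name, path in artifact_paths("example-id").items():
        status = "found" if path.exists() else "missing"
        print(f"{name:12s} {path} ({status})")
```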

Example 2: How does the choice of activation function (e.g., ReLU, sigmoid, tanh) impact the model training convergence rate?

python3 -m curie.main -f benchmark/junior_ml_engineer_bench/q1_activation_func.txt --report

Detailed question: q1_activation_func.txt
Sample log file: Available here
Sample report file: Available here

More example questions can be found here.
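
Both invocation forms used in this section (an inline question with -q, or a question file with -f) can be driven from a script. The sketch below uses only the flags shown above (-q, -f, --report); the function names are hypothetical, and it assumes Curie is installed in the same Python environment that runs the script.

```python
import subprocess
import sys
from typing import List, Optional


def build_curie_command(question: Optional[str] = None,
                        question_file: Optional[str] = None,
                        report: bool = True) -> List[str]:
    """Build the argument list for python -m curie.main from exactly one question source."""
    if (question is None) == (question_file is None):
        raise ValueError("Pass exactly one of question or question_file")
    cmd = [sys.executable, "-m", "curie.main"]
    if question is not None:
        cmd += ["-q", question]
    else:
        cmd += ["-f", question_file]
    if report:
        cmd.append("--report")
    return cmd


def run_curie(**kwargs) -> int:
    """Run Curie with the given options and return its exit code."""
    return subprocess.run(build_curie_command(**kwargs)).returncode
```

For example, run_curie(question_file="benchmark/junior_ml_engineer_bench/q1_activation_func.txt") corresponds to the Example 2 command. Passing the question as a list element avoids shell quoting problems with long, multi-line questions.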

Tutorial

Use Cases

Curie is designed for scientific discovery across multiple domains:

🔬 Machine Learning & AI Research – Hyperparameter tuning and algorithm behavior
- How does the optimal learning rate change as model size increases?
- How does repeated sampling in LLM inference affect the quality of response?
💻 System Performance Analysis – Benchmarking systems, optimizing configurations, investigating system trade-offs.
- Which configurations affect the energy consumption of LLM serving?
- How does a bursty request arrival pattern affect the user experience in LLM serving?
🧪 Algorithmic & Scientific Discovery – Validating hypotheses, automating computational simulations.

Customize Your Experimentation Agents

Edit curie/configs/base_config.json to adapt Curie to your own tasks:

Add your domain-specific instructions by customizing supervisor_system_prompt_filename for the supervisor, control_worker_system_prompt_filename for the experimentation worker, and so on.
Human interruption in the experiment design phase can be activated by setting the is_user_interrupt_allowed key to true.
Configure timeouts and maximum number of steps (global, and coding agent specific).
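
The settings above can be changed by hand or with a short script. The following is a sketch only: it uses the key names mentioned in this section, refuses keys that are not already present in the base file (to catch typos), and writes the result to a separate file. The output filename and the prompt path in the example are hypothetical, and how Curie is pointed at a non-default config file is not covered here.

```python
import json
from pathlib import Path
from typing import Any, Dict


def write_custom_config(base_path: str, output_path: str,
                        overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a JSON config, replacing only keys that already exist in the base file."""
    config = json.loads(Path(base_path).read_text())
    unknown = sorted(key for key in overrides if key not in config)
    if unknown:
        raise KeyError(f"Keys not present in {base_path}: {unknown}")
    config.update(overrides)
    Path(output_path).write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    base = "curie/configs/base_config.json"
    if Path(base).exists():
        write_custom_config(
            base,
            "curie/configs/my_config.json",  # hypothetical output file
            {
                "is_user_interrupt_allowed": True,
                # hypothetical prompt file holding domain-specific instructions
                "supervisor_system_prompt_filename": "prompts/my_supervisor.txt",
            },
        )
```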

Community and Support

For any issues or feature requests, please open an issue on our GitHub Issues page.

License

Curie is released under the Apache 2.0 License. See LICENSE for more details.

Alternative AI tools for Curie

Similar Open Source Tools

well-architected-iac-analyzer

Well-Architected Infrastructure as Code (IaC) Analyzer is a project demonstrating how generative AI can evaluate infrastructure code for alignment with best practices. It features a modern web application allowing users to upload IaC documents, complete IaC projects, or architecture diagrams for assessment. The tool provides insights into infrastructure code alignment with AWS best practices, offers suggestions for improving cloud architecture designs, and can generate IaC templates from architecture diagrams. Users can analyze CloudFormation, Terraform, or AWS CDK templates, architecture diagrams in PNG or JPEG format, and complete IaC projects with supporting documents. Real-time analysis against Well-Architected best practices, integration with AWS Well-Architected Tool, and export of analysis results and recommendations are included.

GitHub stars: 196

RainbowGPT

RainbowGPT is a versatile tool that offers a range of functionalities, including Stock Analysis for financial decision-making, MySQL Management for database navigation, and integration of AI technologies like GPT-4 and ChatGlm3. It provides a user-friendly interface suitable for all skill levels, ensuring seamless information flow and continuous expansion of emerging technologies. The tool enhances adaptability, creativity, and insight, making it a valuable asset for various projects and tasks.

GitHub stars: 86

AutoAgent

AutoAgent is a fully-automated and zero-code framework that enables users to create and deploy LLM agents through natural language alone. It is a top performer on the GAIA Benchmark, equipped with a native self-managing vector database, and allows for easy creation of tools, agents, and workflows without any coding. AutoAgent seamlessly integrates with a wide range of LLMs and supports both function-calling and ReAct interaction modes. It is designed to be dynamic, extensible, customized, and lightweight, serving as a personal AI assistant.

GitHub stars: 1.9k

pentagi

PentAGI is an innovative tool for automated security testing that leverages cutting-edge artificial intelligence technologies. It is designed for information security professionals, researchers, and enthusiasts who need a powerful and flexible solution for conducting penetration tests. The tool provides secure and isolated operations in a sandboxed Docker environment, fully autonomous AI-powered agent for penetration testing steps, a suite of 20+ professional security tools, smart memory system for storing research results, web intelligence for gathering information, integration with external search systems, team delegation system, comprehensive monitoring and reporting, modern interface, API integration, persistent storage, scalable architecture, self-hosted solution, flexible authentication, and quick deployment through Docker Compose.

GitHub stars: 170

code2prompt

Code2Prompt is a powerful command-line tool that generates comprehensive prompts from codebases, designed to streamline interactions between developers and Large Language Models (LLMs) for code analysis, documentation, and improvement tasks. It bridges the gap between codebases and LLMs by converting projects into AI-friendly prompts, enabling users to leverage AI for various software development tasks. The tool offers features like holistic codebase representation, intelligent source tree generation, customizable prompt templates, smart token management, Gitignore integration, flexible file handling, clipboard-ready output, multiple output options, and enhanced code readability.

GitHub stars: 734

distillKitPlus

DistillKitPlus is an open-source toolkit designed for knowledge distillation (KLD) in low computation resource settings. It supports logit distillation, pre-computed logits for memory-efficient training, LoRA fine-tuning integration, and model quantization for faster inference. The toolkit utilizes a JSON configuration file for project, dataset, model, tokenizer, training, distillation, LoRA, and quantization settings. Users can contribute to the toolkit and contact the developers for technical questions or issues.

GitHub stars: 52

ppl.llm.serving

PPL LLM Serving is a serving framework based on ppl.nn for various Large Language Models (LLMs). It provides inference support for LLaMA. Key features include:
- High Performance: Optimized for fast and efficient inference on LLM models.
- Scalability: Supports distributed deployment across multiple GPUs or machines.
- Flexibility: Allows for customization of model configurations and inference pipelines.
- Ease of Use: Provides a user-friendly interface for deploying and managing LLM models.
This tool is suitable for various tasks, including:
- Text Generation: Generating text, stories, or code from scratch or based on a given prompt.
- Text Summarization: Condensing long pieces of text into concise summaries.
- Question Answering: Answering questions based on a given context or knowledge base.
- Language Translation: Translating text between different languages.
- Chatbot Development: Building conversational AI systems that can engage in natural language interactions.
Keywords: llm, large language model, natural language processing, text generation, question answering, language translation, chatbot development

GitHub stars: 114

openmeter

OpenMeter is a real-time and scalable usage metering tool for AI, usage-based billing, infrastructure, and IoT use cases. It provides a REST API for integrations and offers client SDKs in Node.js, Python, Go, and Web. OpenMeter is licensed under the Apache 2.0 License.

GitHub stars: 1.3k

extension-gen-ai

The Looker GenAI Extension provides code examples and resources for building a Looker Extension that integrates with Vertex AI Large Language Models (LLMs). Users can leverage the power of LLMs to enhance data exploration and analysis within Looker. The extension offers generative explore functionality to ask natural language questions about data and generative insights on dashboards to analyze data by asking questions. It leverages components like BQML Remote Models, BQML Remote UDF with Vertex AI, and Custom Fine Tune Model for different integration options. Deployment involves setting up infrastructure with Terraform and deploying the Looker Extension by creating a Looker project, copying extension files, configuring BigQuery connection, connecting to Git, and testing the extension. Users can save example prompts and configure user settings for the extension. Development of the Looker Extension environment includes installing dependencies, starting the development server, and building for production.

GitHub stars: 59

web-ui

WebUI is a user-friendly tool built on Gradio that enhances website accessibility for AI agents. It supports various Large Language Models (LLMs) and allows custom browser integration for seamless interaction. The tool eliminates the need for re-login and authentication challenges, offering high-definition screen recording capabilities.

GitHub stars: 10.4k

recommendarr

Recommendarr is a tool that generates personalized TV show and movie recommendations based on your Sonarr, Radarr, Plex, and Jellyfin libraries using AI. It offers AI-powered recommendations, media server integration, flexible AI support, watch history analysis, customization options, and dark/light mode toggle. Users can connect their media libraries and watch history services, configure AI service settings, and get personalized recommendations based on genre, language, and mood/vibe preferences. The tool works with any OpenAI-compatible API and offers various recommended models for different cost options and performance levels. It provides personalized suggestions, detailed information, filter options, watch history analysis, and one-click adding of recommended content to Sonarr/Radarr.

GitHub stars: 516

ps-fuzz

The Prompt Fuzzer is an open-source tool that helps you assess the security of your GenAI application's system prompt against various dynamic LLM-based attacks. It provides a security evaluation based on the outcome of these attack simulations, enabling you to strengthen your system prompt as needed. The Prompt Fuzzer dynamically tailors its tests to your application's unique configuration and domain. The Fuzzer also includes a Playground chat interface, giving you the chance to iteratively improve your system prompt, hardening it against a wide spectrum of generative AI attacks.

GitHub stars: 367

MetaGPT

MetaGPT is a multi-agent framework that enables GPT to work in a software company, collaborating to tackle more complex tasks. It assigns different roles to GPTs to form a collaborative entity for complex tasks. MetaGPT takes a one-line requirement as input and outputs user stories, competitive analysis, requirements, data structures, APIs, documents, etc. Internally, MetaGPT includes product managers, architects, project managers, and engineers. It provides the entire process of a software company along with carefully orchestrated SOPs. MetaGPT's core philosophy is "Code = SOP(Team)", materializing SOP and applying it to teams composed of LLMs.

GitHub stars: 51.4k

sec-parser

The `sec-parser` project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. It helps in parsing SEC filings for financial and regulatory analysis, analytics and data science, AI and machine learning, causal AI, and large language models. The tool is especially beneficial for AI, ML, and LLM applications by streamlining data pre-processing and feature extraction.

GitHub stars: 99

gpt-computer-assistant

GPT Computer Assistant (GCA) is an open-source framework designed to build vertical AI agents that can automate tasks on Windows, macOS, and Ubuntu systems. It leverages the Model Context Protocol (MCP) and its own modules to mimic human-like actions and achieve advanced capabilities. With GCA, users can empower themselves to accomplish more in less time by automating tasks like updating dependencies, analyzing databases, and configuring cloud security settings.

GitHub stars: 5.8k
