stata-mcp

Let LLM help you achieve your regression with Stata. Evolve from reg monkey to causal thinker.

Stata-MCP is a tool designed to help users with regression analysis in Stata. It includes a security guard that validates against dangerous commands, real-time RAM monitoring with automatic process termination, unified TOML-based configuration, and complete documentation. Users can use Stata-MCP for tasks like paper replication, quick hypothesis testing, learning econometrics, code organization, and result interpretation. The tool is suitable for researchers, data analysts, economists, statisticians, and social scientists.

README:

Stata-MCP

Let LLM help you achieve your regression analysis with Stata ✨
Evolve from reg monkey to causal thinker 🐒 -> 🧐

Notes: While we strive to make open source accessible to everyone, we regret that we can no longer maintain the Apache-2.0 License. Due to individuals directly copying this project and claiming to be its maintainers, we have decided to change the license to AGPL-3.0 to prevent misuse of the project in ways that go against our original vision.

Reason

Background: @jackdark425's repository directly copied this project and claimed to be the sole maintainer. We welcome open source collaboration based on forks, including but not limited to adding new features, fixing existing bugs, or providing valuable suggestions for the project, but we firmly oppose plagiarism and false attribution.

Update: The infringing project has been taken down via GitHub DMCA. Click here to learn more.

News:

✨ Claude Code Plugin Support: Official plugin package with MCP server and Stata LSP integration
✨ Security Guard System: Automatic validation against dangerous commands (shell execution, file deletion, etc.)
✨ RAM Monitoring System: Real-time monitoring with automatic process termination when memory limits exceeded
✨ Unified Configuration: TOML-based config file with environment variable overrides
📚 Complete Documentation: New Configuration, Security, and Monitoring guides
To use Stata-MCP in Claude Code, look here.
Want to use agent mode as a tool? It is now easier to set up, see here.
Want to evaluate your LLM? Look here.
Updated StataFinder: it can now locate your Stata executable automatically.

Looking for our newest research? Click here or visit the reports website.

Looking for others?

MCP or AI about Stata

A session-based MCP server for Stata: mcp-stata

A VS Code or Cursor integration: here. Not sure which one you need? 💡 Difference

Datasets and Information

STOP Dataset: StataMCP-Team Opendata Project 📊, we have open-sourced a comprehensive dataset collection for social science research, aiming to enable the future of AI-driven and data-powered research paradigms.

Trace DID: If you want the newest information about DID (Difference-in-Differences), click here. There is now a Chinese translation by Sepine Tam and StataMCP-Team 🎉

Jupyter Lab Usage (Important: Stata 17+) here and nbstata

💡 Quick Start

Use Stata-MCP in Claude Code

Stata-MCP works well in Claude Code thanks to its strong agentic ability.

Before using it, make sure Claude Code is installed. If you don't know how to install it, visit its page on GitHub.

You can open your terminal and cd to your working directory, and run:

claude mcp add stata-mcp --env STATA_MCP_CWD="$(pwd)" --scope project -- uvx --directory "$(pwd)" stata-mcp

In your working directory you will find a file named .mcp.json, which holds your MCP config.
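
To confirm what was registered, the file can be inspected with a few lines of Python. This is a minimal sketch, assuming the file uses the same `mcpServers` layout as the standard config shown in this README; the helper name `list_mcp_servers` is hypothetical and not part of Stata-MCP.

```python
import json
from pathlib import Path


def list_mcp_servers(config_path=".mcp.json"):
    """Return {server_name: command line} for each server in an MCP config file.

    Assumes a top-level "mcpServers" object; returns {} if the file is missing.
    """
    path = Path(config_path)
    if not path.exists():
        return {}
    config = json.loads(path.read_text(encoding="utf-8"))
    servers = config.get("mcpServers", {})
    return {
        # Join the launcher and its arguments into one readable command line.
        name: " ".join([spec.get("command", ""), *spec.get("args", [])]).strip()
        for name, spec in servers.items()
    }


if __name__ == "__main__":
    for name, command in list_mcp_servers().items():
        print(f"{name}: {command}")
```

Run it from the working directory; an empty result means no project-scoped server has been written yet.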

If you want to install Stata-MCP globally, you can run:

claude mcp add stata-mcp --scope user -- uvx stata-mcp

Then, you can use Stata-MCP in Claude Code. Here are some scenarios for using it:

Paper Replication: Replicate empirical studies from economics papers
Quick Hypothesis Testing: Validate economic hypotheses through regression analysis
Stata Learning Assistant: Learn econometrics with step-by-step Stata explanations
Code Organization: Review and optimize existing Stata do-files
Result Interpretation: Understand complex statistical outputs and regression results

Install Claude Code Plugin

We provide an official native plugin that integrates Stata-MCP, maintained by @sepinetam, and Stata LSP, maintained by @euglevi. Installation commands:

# Install stata-mcp marketplace first
claude plugin marketplace add sepinetam/stata-mcp

# Install plugin to local, project or user scope
claude plugin install stata-toolbox -s local

Agent Mode

For details of agent mode, see here.

git clone https://github.com/sepinetam/stata-mcp.git
cd stata-mcp

uv sync
uv pip install -e .

stata-mcp --version  # check that stata-mcp was installed successfully
stata-mcp agent run  # start stata-mcp in agent mode

Or you can run it directly with uvx:

uvx stata-mcp --version  # check that it can be used on your computer
uvx stata-mcp agent run

To change the task, edit the variables [model_instructions](source/agent_examples/openai/main.py#L37) and [task_message](source/agent_examples/openai/main.py#L68) in agent_examples/openai/main.py.

Agent as Tool

If you want to use a Stata-Agent in another agent, here is a simple example:

import asyncio

from agents import Agent, Runner
from stata_mcp.agent_as.agent_as_tool import StataAgent

# init stata agent and set as tool
stata_agent = StataAgent()
sa_tool = stata_agent.as_tool()

# Create main Agent
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[sa_tool],
)


# Then run the agent as usual.
async def main(task: str, max_turns: int = 30):
    result = await Runner.run(agent, input=task, max_turns=max_turns)
    return result


if __name__ == "__main__":
    econ_task = "Use Stata default data to find out the relationship between mpg and price."
    asyncio.run(main(econ_task))

AI Chat-Bot Client Mode

The standard config requires that Stata is installed at the default path and, on macOS and Linux, that the Stata CLI exists.

The standard JSON config is as follows. You can customize it by adding environment variables.

{
  "mcpServers": {
    "stata-mcp": {
      "command": "uvx",
      "args": [
        "stata-mcp"
      ]
    }
  }
}
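
As a minimal sketch of adding environment variables, the snippet below builds the same config and attaches an `env` object to the server entry. The `env` key is the per-server field commonly read by MCP clients such as Claude Desktop, so treat it as an assumption for your client; `STATA_MCP_CWD` is the variable used in the Claude Code command earlier, and the path and the helper name `build_client_config` are placeholders.

```python
import json


def build_client_config(env=None):
    """Build the standard stata-mcp client config, optionally with env vars."""
    server = {"command": "uvx", "args": ["stata-mcp"]}
    if env:
        # Assumption: the client reads per-server variables from an "env" object.
        server["env"] = dict(env)
    return {"mcpServers": {"stata-mcp": server}}


if __name__ == "__main__":
    # Placeholder path; point it at your own project directory.
    config = build_client_config({"STATA_MCP_CWD": "/path/to/your/project"})
    print(json.dumps(config, indent=2))
```

Paste the printed JSON into your client's MCP settings file.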

For more detailed usage information, visit the Usage guide.

For advanced usage, visit the Advanced guide.

Prerequisites

uv - Package installer and virtual environment manager
Claude, Cline, ChatWise, or other LLM service
Stata License
An API key from your LLM provider

Notes:

If you are located in China, you can find a short uv usage document here.

Claude is the best choice for Stata-MCP. For users in China, I recommend DeepSeek as the model provider: it is cheap and powerful, and it scored highest among Chinese providers. If you are interested, see the report How to use StataMCP improve your social science research.

Installation

With the new version, you don't need to install the stata-mcp package separately. Run the following commands to check whether stata-mcp can be used on your computer.

uvx stata-mcp --usable
uvx stata-mcp --version
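
If you want to run both checks from a script, the following sketch wraps them with `subprocess`. It assumes a zero exit code means the check passed, which this README does not state; the function names are hypothetical.

```python
import shutil
import subprocess

# The two flags shown above.
CHECK_FLAGS = ("--usable", "--version")


def build_check_commands(launcher="uvx"):
    """Return one argument list per check, e.g. ["uvx", "stata-mcp", "--usable"]."""
    return [[launcher, "stata-mcp", flag] for flag in CHECK_FLAGS]


def run_checks(launcher="uvx"):
    """Run each check and return {flag: exit code}.

    Returns None when the launcher is not on PATH, so callers can tell
    "uv is not installed" apart from "a check failed".
    """
    if shutil.which(launcher) is None:
        return None
    results = {}
    for command in build_check_commands(launcher):
        completed = subprocess.run(command, capture_output=True, text=True)
        results[command[-1]] = completed.returncode
    return results
```

Calling `run_checks()` returns `None` if `uvx` is missing; otherwise inspect the exit code recorded for each flag.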

If you want to use it locally, you can install it via pip or download the source code.

Install via pip

pip install stata-mcp

Download the source code and build

git clone https://github.com/sepinetam/stata-mcp.git
cd stata-mcp

uv build

Then you can find the built stata-mcp wheel in the dist directory. You can run it directly with uvx.

For example:

uvx /path/to/your/whl/stata_mcp-1.13.0-py3-none-any.whl  # this is the wheel file name; change it to match your version
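
Because the wheel name changes with every version, a small helper can look it up instead of hard-coding it. This is a sketch that assumes the wheel name starts with `stata_mcp-`, as in the example above; `newest_wheel` is a hypothetical name.

```python
from pathlib import Path


def newest_wheel(dist_dir="dist"):
    """Return the most recently modified stata_mcp wheel in dist_dir, or None."""
    directory = Path(dist_dir)
    if not directory.is_dir():
        return None
    # Sort by modification time so the latest build comes last.
    wheels = sorted(
        directory.glob("stata_mcp-*.whl"),
        key=lambda wheel: wheel.stat().st_mtime,
    )
    return wheels[-1] if wheels else None


if __name__ == "__main__":
    wheel = newest_wheel()
    print(wheel if wheel else "no wheel found; run `uv build` first")
```

Pass the printed path to `uvx` in place of the hard-coded file name.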

📝 Documentation

Core Documentation

Complete Documentation: Full documentation site with all features
Configuration Guide: Unified TOML-based configuration system
Security Guard: Security validation for dangerous commands
Monitoring System: RAM monitoring and resource limits
Architecture Overview: System design and integration patterns

Usage Guides

For more detailed usage information, visit the Usage guide
For advanced usage, visit the Advanced guide
For common questions, visit the Questions page
For the differences from Stata-MCP@hanlulong, visit the Difference page

Key Features

Security Guard: Blocks dangerous commands (!, shell, erase, etc.)
RAM Monitoring: Prevents memory exhaustion with configurable limits
Unified Configuration: TOML config + environment variables
Cross-platform support (macOS, Windows, Linux)
Automatic log capture and error reporting
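
To illustrate the idea behind the Security Guard, here is a minimal sketch of a line-based check for the commands named above (`!`, `shell`, `erase`). It is not the project's implementation: the blocklist, the function name, and the matching rule (first token on each line) are assumptions, and it does not handle prefixes such as `quietly` or command abbreviations.

```python
# Illustrative blocklist only; the project's real rules are described in its Security guide.
BLOCKED_COMMANDS = {"shell", "erase"}


def find_dangerous_lines(do_code):
    """Return (line_number, line) pairs whose leading command is blocked."""
    flagged = []
    for number, raw in enumerate(do_code.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and whole-line Stata comments.
        if not line or line.startswith("*") or line.startswith("//"):
            continue
        # "!" is Stata's shorthand for shell and may be glued to the command.
        if line.startswith("!"):
            flagged.append((number, line))
            continue
        first_token = line.split(maxsplit=1)[0].lower()
        if first_token in BLOCKED_COMMANDS:
            flagged.append((number, line))
    return flagged


if __name__ == "__main__":
    sample = "sysuse auto\nregress price mpg\nshell ls\nerase old.dta"
    for number, line in find_dangerous_lines(sample):
        print(f"line {number}: {line}")
```

A validator like this would run before the do-file is handed to Stata, rejecting the job if any line is flagged.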

💡 Questions

🚀 Roadmap

[x] macOS support
[x] Windows support
[ ] Additional LLM integrations (With a new webUI)
[ ] Performance optimizations (Via prompt and context engineering)

For more information, refer to the Statement.

🐛 Report Issues

If you encounter any bugs or have feature requests, please open an issue.

📄 License

GNU Affero General Public License v3.0

📚 Citation

If you use Stata-MCP in your research, please cite this repository using one of the following formats:

BibTeX

@software{sepinetam2025stata,
  author = {Song Tan},
  title = {Stata-MCP: Let LLM help you achieve your regression analysis with Stata},
  year = {2025},
  url = {https://github.com/sepinetam/stata-mcp},
  version = {1.13.0}
}

APA

Song Tan. (2025). Stata-MCP: Let LLM help you achieve your regression analysis with Stata (Version 1.13.0) [Computer software]. https://github.com/sepinetam/stata-mcp

Chicago

Song Tan. 2025. "Stata-MCP: Let LLM help you achieve your regression analysis with Stata." Version 1.13.0. https://github.com/sepinetam/stata-mcp.

📬 Contact

Email: [email protected]

Or contribute directly by submitting a Pull Request! We welcome contributions of all kinds, from bug fixes to new features.

❤️ Acknowledgements

The author sincerely thanks the official Stata team for their support and for authorizing a Stata license for testing and development.

📃 Statement

The Stata referred to in this project is the commercial software Stata developed by StataCorp LLC. This project is not affiliated with, endorsed by, or sponsored by StataCorp LLC. This project does not include the Stata software or any installation packages; users must obtain and install a validly licensed copy of Stata from StataCorp. This project is licensed under AGPL-3.0. The project maintainers accept no liability for any loss or damage arising from the use of this project or from actions related to Stata.

More information: refer to the Chinese version at [source/docs/README/cn/README.md]; in case of any conflict, the Chinese version shall prevail.
