stata-mcp
Let LLM help you achieve your regression with Stata. Evolve from reg monkey to causal thinker.
Stata-MCP is a tool designed to help users with regression analysis in Stata. It includes features such as a security guard system for validation against dangerous commands, RAM monitoring for real-time monitoring and automatic process termination, unified configuration with TOML-based config files, and complete documentation. Users can use Stata-MCP for tasks like paper replication, quick hypothesis testing, learning econometrics, code organization, and result interpretation. The tool is suitable for researchers, data analysts, economists, statisticians, and social scientists.
README:
Let LLM help you achieve your regression analysis with Stata ✨
Evolve from reg monkey to causal thinker 🐒 -> 🧐
Note: While we strive to make open source accessible to everyone, we regret that we can no longer maintain the Apache-2.0 License. Because individuals copied this project directly and claimed to be its maintainers, we have changed the license to AGPL-3.0 to prevent misuse of the project in ways that go against our original vision.
Reason
Background: @jackdark425's repository directly copied this project and claimed to be its sole maintainer. We welcome open source collaboration based on forks, including but not limited to adding new features, fixing existing bugs, or offering valuable suggestions for the project, but we firmly oppose plagiarism and false attribution.
Update: The infringing project has been taken down via GitHub DMCA. Click here for details.
News:
- ✨ Claude Code Plugin Support: Official plugin package with MCP server and Stata LSP integration
- ✨ Security Guard System: Automatic validation against dangerous commands (shell execution, file deletion, etc.)
- ✨ RAM Monitoring System: Real-time monitoring with automatic process termination when memory limits are exceeded
- ✨ Unified Configuration: TOML-based config file with environment variable overrides
- 📚 Complete Documentation: New Configuration, Security, and Monitoring guides
- To use Stata-MCP in Claude Code, look here.
- Want to use agent mode as a tool? It is now easier to do so; see here.
- Want to evaluate your LLM? Look here.
- Updated StataFinder, which can locate your Stata executable automatically.
Looking for our newest research? Click here or visit the reports website.
Looking for other resources?
MCP or AI tools for Stata
- A session-based MCP server for Stata, mcp-stata
- A VS Code or Cursor integration, here. Confused about the difference? 💡 Difference
Datasets and Information
- STOP Dataset: StataMCP-Team Opendata Project 📊. We have open-sourced a comprehensive dataset collection for social science research, aiming to enable the future of AI-driven and data-powered research paradigms.
- Trace DID: If you want to fetch the newest information about DID (Difference-in-Differences), click here. There is now a Chinese translation by Sepine Tam and StataMCP-Team 🎉
- Jupyter Lab Usage (Important: Stata 17+) here and nbstata
Stata-MCP works well in Claude Code, given its excellent agentic abilities.
Before using it, make sure Claude Code is installed. If you don't know how to install it, visit it on GitHub.
Open your terminal, cd to your working directory, and run:
claude mcp add stata-mcp --env STATA_MCP_CWD=$(pwd) --scope project -- uvx --directory $(pwd) stata-mcp
In your working directory, you will find a file named .mcp.json; your MCP config is stored there.
If you want to install Stata-MCP globally, run:
claude mcp add stata-mcp --scope user -- uvx stata-mcp
Then you can use Stata-MCP in Claude Code. Here are some scenarios for using it:
- Paper Replication: Replicate empirical studies from economics papers
- Quick Hypothesis Testing: Validate economic hypotheses through regression analysis
- Stata Learning Assistant: Learn econometrics with step-by-step Stata explanations
- Code Organization: Review and optimize existing Stata do-files
- Result Interpretation: Understand complex statistical outputs and regression results
We provide an official native plugin that integrates Stata-MCP, maintained by @sepinetam, with Stata LSP, maintained by @euglevi. Installation commands:
# Install the stata-mcp marketplace first
claude plugin marketplace add sepinetam/stata-mcp
# Install the plugin to local, project, or user scope
claude plugin install stata-toolbox -s local
Details of agent mode can be found here.
git clone https://github.com/sepinetam/stata-mcp.git
cd stata-mcp
uv sync
uv pip install -e .
stata-mcp --version  # check that stata-mcp is installed successfully
stata-mcp agent run  # start stata-mcp agent mode
Or you can use it directly with uvx:
uvx stata-mcp --version  # check that it can be used on your computer
uvx stata-mcp agent run
You can edit the task in agent_examples/openai/main.py through the variables [model_instructions](source/agent_examples/openai/main.py#L37) and [task_message](source/agent_examples/openai/main.py#L68).
If you want to use a Stata-Agent in another agent, here is a simple example:
import asyncio

from agents import Agent, Runner
from stata_mcp.agent_as.agent_as_tool import StataAgent

# Initialize the Stata agent and expose it as a tool
stata_agent = StataAgent()
sa_tool = stata_agent.as_tool()

# Create the main agent and register the Stata agent as one of its tools
agent = Agent(
    name="Assistant",
    instructions="You are a helpful assistant",
    tools=[sa_tool],
)


# Then run the agent as usual.
async def main(task: str, max_turns: int = 30):
    result = await Runner.run(agent, input=task, max_turns=max_turns)
    return result


if __name__ == "__main__":
    econ_task = "Use Stata default data to find out the relationship between mpg and price."
    asyncio.run(main(econ_task))
The standard config requires that Stata is installed at its default path and, on macOS and Linux, that the Stata CLI exists.
The standard JSON config is as follows; you can customize it by adding environment variables.
{
  "mcpServers": {
    "stata-mcp": {
      "command": "uvx",
      "args": [
        "stata-mcp"
      ]
    }
  }
}
For more detailed usage information, visit the Usage guide.
For advanced usage, visit the Advanced guide.
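As a rough illustration of adding environment variables to the standard config, the sketch below builds the same server entry in Python and attaches `STATA_MCP_CWD`, the variable used in the Claude Code command earlier. It assumes your MCP client reads an `env` field on the server entry, which is common but should be checked against your client's documentation. The helper name `build_mcp_config` is hypothetical and not part of Stata-MCP.

```python
import json
from pathlib import Path


def build_mcp_config(cwd: str) -> dict:
    """Return the standard stata-mcp server entry plus one env override."""
    return {
        "mcpServers": {
            "stata-mcp": {
                "command": "uvx",
                "args": ["stata-mcp"],
                # Assumption: the MCP client passes "env" entries to the server process.
                "env": {"STATA_MCP_CWD": cwd},
            }
        }
    }


if __name__ == "__main__":
    # Print the config for the current directory; paste it into your client's config file.
    config = build_mcp_config(str(Path.cwd()))
    print(json.dumps(config, indent=2))
```

Printing rather than writing the file keeps the sketch from overwriting an existing client config.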
- uv - Package installer and virtual environment manager
- Claude, Cline, ChatWise, or other LLM service
- Stata License
- An API key from your LLM provider
Notes:
- If you are located in China, you can find a short uv usage guide here.
- Claude is the best choice for Stata-MCP. For users in China, I recommend DeepSeek as your model provider: it is cheap and powerful, and it has the highest score among Chinese providers. If you are interested, see the report How to use StataMCP improve your social science research.
With the new version, you don't need to install the stata-mcp package again. Just run the following commands to check whether your computer can use stata-mcp.
uvx stata-mcp --usable
uvx stata-mcp --version
If you want to use it locally, you can install it via pip or download the source code.
Download via pip
pip install stata-mcp
Download source code and build
git clone https://github.com/sepinetam/stata-mcp.git
cd stata-mcp
uv build
Then you can find the built stata-mcp package (a wheel file) in the dist directory. You can use it directly or add it to your PATH.
For example:
uvx /path/to/your/whl/stata_mcp-1.13.0-py3-none-any.whl  # this is the wheel file name; change it to match your version
- Complete Documentation: Full documentation site with all features
- Configuration Guide: Unified TOML-based configuration system
- Security Guard: Security validation for dangerous commands
- Monitoring System: RAM monitoring and resource limits
- Architecture Overview: System design and integration patterns
- For more detailed usage information, visit the Usage guide
- For advanced usage, visit the Advanced guide
- For common questions, visit the Questions page
- For the differences from Stata-MCP@hanlulong, visit the Difference page
- Security Guard: Blocks dangerous commands (!, shell, erase, etc.)
- RAM Monitoring: Prevents memory exhaustion with configurable limits
- Unified Configuration: TOML config + environment variables
- Cross-platform support (macOS, Windows, Linux)
- Automatic log capture and error reporting
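The Security Guard idea in the list above can be sketched as a simple pre-execution screen over do-file text. This is not the project's implementation: it only checks the three commands named in the list (`!`, `shell`, `erase`) at the start of a line, and the function name `find_dangerous_lines` is hypothetical.

```python
import re

# Only the three commands named above; the real guard's rule set may differ.
BLOCKED_PATTERN = re.compile(r"^\s*(!|shell\b|erase\b)", re.IGNORECASE)


def find_dangerous_lines(do_code: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for each line that starts with a blocked command."""
    hits = []
    for number, line in enumerate(do_code.splitlines(), start=1):
        if BLOCKED_PATTERN.match(line):
            hits.append((number, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "sysuse auto, clear\nregress price mpg\n!rm -rf results\nerase old.dta\n"
    for number, text in find_dangerous_lines(sample):
        print(f"line {number}: blocked -> {text}")
```

A line-start check like this would miss commands hidden behind prefixes such as `quietly` or `capture`, so a production guard would need a more thorough parser.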
- Cherry Studio 32000 wrong
- Cherry Studio 32000 error
- Windows Support
- Network Errors When Running Stata-MCP
- [x] macOS support
- [x] Windows support
- [ ] Additional LLM integrations (With a new webUI)
- [ ] Performance optimizations (Via prompt and context engineering)
For more information, refer to the Statement.
If you encounter any bugs or have feature requests, please open an issue.
GNU Affero General Public License v3.0
If you use Stata-MCP in your research, please cite this repository using one of the following formats:
@software{sepinetam2025stata,
  author = {Song Tan},
  title = {Stata-MCP: Let LLM help you achieve your regression analysis with Stata},
  year = {2025},
  url = {https://github.com/sepinetam/stata-mcp},
  version = {1.13.0}
}
Song Tan. (2025). Stata-MCP: Let LLM help you achieve your regression analysis with Stata (Version 1.13.0) [Computer software]. https://github.com/sepinetam/stata-mcp
Song Tan. 2025. "Stata-MCP: Let LLM help you achieve your regression analysis with Stata." Version 1.13.0. https://github.com/sepinetam/stata-mcp.
Email: [email protected]
Or contribute directly by submitting a Pull Request! We welcome contributions of all kinds, from bug fixes to new features.
The author sincerely thanks the official Stata team for their support and for authorizing a Stata license for test development.
The Stata referred to in this project is the commercial software Stata developed by StataCorp LLC. This project is not affiliated with, endorsed by, or sponsored by StataCorp LLC. This project does not include the Stata software or any installation packages; users must obtain and install a validly licensed copy of Stata from StataCorp. This project is licensed under AGPL-3.0. The project maintainers accept no liability for any loss or damage arising from the use of this project or from actions related to Stata.
More information: refer to the Chinese version at [source/docs/README/cn/README.md]; in case of any conflict, the Chinese version shall prevail.