
mcp-memory-service
Universal MCP memory service with semantic search, multi-client support, and autonomous consolidation for Claude Desktop, VS Code, and 13+ AI applications
Stars: 724

The MCP Memory Service is a universal memory service designed for AI assistants, providing semantic memory search and persistent storage. It works with various AI applications and offers fast local search using SQLite-vec and global distribution through Cloudflare. The service supports intelligent memory management, universal compatibility with AI tools, flexible storage options, and is production-ready with cross-platform support and secure connections. Users can store and recall memories, search by tags, check system health, and configure the service for Claude Desktop integration and environment variables.
README:
Universal MCP memory service with OAuth 2.1 team collaboration and semantic memory search for AI assistants. Features Claude Code HTTP transport, zero-configuration authentication, and enterprise security. Works with Claude Desktop, VS Code, Cursor, Continue, and 13+ AI applications with SQLite-vec for fast local search and Cloudflare for global distribution.
Claude Code Team Collaboration (Zero Configuration):
# 1. Start OAuth-enabled server
export MCP_OAUTH_ENABLED=true
uv run memory server --http
# 2. Add HTTP transport to Claude Code
claude mcp add --transport http memory-service http://localhost:8000/mcp
# ✅ Done! Claude Code automatically handles OAuth registration and team collaboration
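To confirm the server came up correctly before pointing clients at it, you can probe the RFC 8414 discovery document the README advertises. A minimal sketch in Python; the /.well-known path is the RFC 8414 standard location (this project claims RFC 8414 compliance), but the exact metadata fields returned may vary by deployment:

import json
import urllib.request

# RFC 8414 authorization-server metadata lives at this well-known path.
# Adjust the host/port if you changed MCP_HTTP_PORT.
URL = "http://localhost:8000/.well-known/oauth-authorization-server"

with urllib.request.urlopen(URL, timeout=5) as resp:
    metadata = json.load(resp)

# "issuer" is required by RFC 8414; the endpoint fields vary by server.
print("issuer:", metadata.get("issuer"))
print("authorization_endpoint:", metadata.get("authorization_endpoint"))
print("registration_endpoint:", metadata.get("registration_endpoint"))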
Complete Setup Guide: OAuth 2.1 Setup Guide
Universal Installer (Most Compatible):
# Clone and install with automatic platform detection
git clone https://github.com/doobidoo/mcp-memory-service.git
cd mcp-memory-service
python install.py
Docker (Fastest):
# For MCP protocol (Claude Desktop)
docker-compose up -d
# For HTTP API + OAuth (Team Collaboration)
docker-compose -f docker-compose.http.yml up -d
Smithery (Claude Desktop):
# Auto-install for Claude Desktop
npx -y @smithery/cli install @doobidoo/mcp-memory-service --client claude
Updating from an older version? Scripts have been reorganized for better maintainability:
- Recommended: Use python -m mcp_memory_service.server in your Claude Desktop config (no path dependencies!)
- Alternative 1: Use uv run memory server with UV tooling
- Alternative 2: Update the path from scripts/run_memory_server.py to scripts/server/run_memory_server.py
- Backward compatible: The old path still works, with a migration notice
On your first run, you'll see some warnings that are completely normal:
- "WARNING: Failed to load from cache: No snapshots directory" - The service is checking for cached models (first-time setup)
- "WARNING: Using TRANSFORMERS_CACHE is deprecated" - Informational warning, doesn't affect functionality
- Model download in progress - The service automatically downloads a ~25MB embedding model (takes 1-2 minutes)
These warnings disappear after the first successful run. The service is working correctly! For details, see our First-Time Setup Guide.
sqlite-vec may not have pre-built wheels for Python 3.13 yet. If installation fails:
- The installer will automatically try multiple installation methods
- Consider using Python 3.12 for the smoothest experience: brew install [email protected]
- Alternative: Use the ChromaDB backend with --storage-backend chromadb
- See Troubleshooting Guide for details
macOS users may encounter enable_load_extension errors with sqlite-vec:
- System Python on macOS lacks SQLite extension support by default
- Solution: Use Homebrew Python: brew install python && rehash
- Alternative: Use pyenv: PYTHON_CONFIGURE_OPTS='--enable-loadable-sqlite-extensions' pyenv install 3.12.0
- Fallback: Use the ChromaDB backend: export MCP_MEMORY_STORAGE_BACKEND=chromadb
- See Troubleshooting Guide for details
Visit our comprehensive Wiki for detailed guides:
- OAuth 2.1 Setup Guide - NEW! Complete OAuth 2.1 Dynamic Client Registration guide
- Integration Guide - Claude Desktop, Claude Code HTTP transport, VS Code, and more
- Advanced Configuration - Updated! OAuth security, enterprise features
- Installation Guide - Complete installation for all platforms and use cases
- Platform Setup Guide - Windows, macOS, and Linux optimizations
- Performance Optimization - Speed up queries, optimize resources, scaling
- Development Reference - Claude Code hooks, API reference, debugging
- Troubleshooting Guide - Updated! OAuth troubleshooting + common issues
- FAQ - Frequently asked questions
- Examples - Practical code examples and workflows
- OAuth 2.1 Dynamic Client Registration - RFC 7591 & RFC 8414 compliant
- Claude Code HTTP Transport - Zero-configuration team collaboration
- JWT Authentication - Enterprise-grade security with scope validation
- Auto-Discovery Endpoints - Seamless client registration and authorization
- Multi-Auth Support - OAuth + API keys + optional anonymous access
- Semantic search with vector embeddings
- Natural language time queries ("yesterday", "last week")
- Tag-based organization with smart categorization
- Memory consolidation with dream-inspired algorithms
- Claude Desktop - Native MCP integration
- Claude Code - HTTP transport + Memory-aware development with hooks
- VS Code, Cursor, Continue - IDE extensions
- 13+ AI applications - REST API compatibility
- SQLite-vec - Fast local storage (recommended; illustrative sketch after these lists)
- ChromaDB - Multi-client collaboration
- Cloudflare - Global edge distribution
- Automatic backups and synchronization
- Cross-platform - Windows, macOS, Linux
- Service installation - Auto-start background operation
- HTTPS/SSL - Secure connections with OAuth 2.1
- Docker support - Easy deployment with team collaboration
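To make the SQLite-vec option concrete, here is an illustrative sketch of the idea behind that backend: store embedding vectors in a vec0 virtual table and rank memories by vector distance. This is not the service's internal code; the toy 4-dimensional vectors stand in for a real sentence-embedding model.

import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)   # fails on stock macOS Python; see the macOS note above
sqlite_vec.load(db)              # load the sqlite-vec extension
db.enable_load_extension(False)

# One table for memory text, one vec0 virtual table for the vectors.
db.execute("CREATE TABLE notes(id INTEGER PRIMARY KEY, text TEXT)")
db.execute("CREATE VIRTUAL TABLE memories USING vec0(embedding float[4])")

toy_data = [  # toy vectors in place of a real embedding model
    (1, "Fixed race condition in authentication", [0.9, 0.1, 0.0, 0.0]),
    (2, "Chose SQLite-vec as the default backend", [0.1, 0.9, 0.0, 0.0]),
]
for rowid, text, vec in toy_data:
    db.execute("INSERT INTO notes VALUES (?, ?)", (rowid, text))
    db.execute("INSERT INTO memories(rowid, embedding) VALUES (?, ?)",
               (rowid, serialize_float32(vec)))

# Semantic recall: nearest stored vectors to the query vector.
query = serialize_float32([0.8, 0.2, 0.0, 0.0])
rows = db.execute(
    """SELECT notes.text, knn.distance
       FROM (SELECT rowid, distance FROM memories
             WHERE embedding MATCH ? ORDER BY distance LIMIT 2) AS knn
       JOIN notes ON notes.id = knn.rowid""",
    (query,),
).fetchall()
for text, dist in rows:
    print(f"{dist:.3f}  {text}")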
# Start OAuth-enabled server for team collaboration
export MCP_OAUTH_ENABLED=true
uv run memory server --http
# Claude Code team members connect via HTTP transport
claude mcp add --transport http memory-service http://your-server:8000/mcp
# ✅ Automatic OAuth discovery, registration, and authentication
# Store a memory
uv run memory store "Fixed race condition in authentication by adding mutex locks"
# Search for relevant memories
uv run memory recall "authentication race condition"
# Search by tags
uv run memory search --tags python debugging
# Check system health (shows OAuth status)
uv run memory health
Recommended approach - Add to your Claude Desktop config (~/.claude/config.json):
{
"mcpServers": {
"memory": {
"command": "python",
"args": ["-m", "mcp_memory_service.server"],
"env": {
"MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec"
}
}
}
}
Alternative approaches:
// Option 1: UV tooling (if using UV)
{
"mcpServers": {
"memory": {
"command": "uv",
"args": ["--directory", "/path/to/mcp-memory-service", "run", "memory", "server"],
"env": {
"MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec"
}
}
}
}
// Option 2: Direct script path (v6.17.0+)
{
"mcpServers": {
"memory": {
"command": "python",
"args": ["/path/to/mcp-memory-service/scripts/server/run_memory_server.py"],
"env": {
"MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec"
}
}
}
}
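If you prefer to script the setup, the recommended config above can be written programmatically. A minimal sketch that merges the "memory" entry into an existing ~/.claude/config.json; the path and JSON shape are exactly those shown above:

import json
from pathlib import Path

config_path = Path.home() / ".claude" / "config.json"
config = json.loads(config_path.read_text()) if config_path.exists() else {}

# The recommended server entry from above (module invocation, sqlite_vec backend).
config.setdefault("mcpServers", {})["memory"] = {
    "command": "python",
    "args": ["-m", "mcp_memory_service.server"],
    "env": {"MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec"},
}

config_path.parent.mkdir(parents=True, exist_ok=True)
config_path.write_text(json.dumps(config, indent=2))
print(f"Updated {config_path}")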
# Storage backend (sqlite_vec recommended)
export MCP_MEMORY_STORAGE_BACKEND=sqlite_vec
# Enable HTTP API
export MCP_HTTP_ENABLED=true
export MCP_HTTP_PORT=8000
# Security
export MCP_API_KEY="your-secure-key"
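The same variables can also be set per-process rather than in your shell profile. A minimal launcher sketch that applies the settings above and starts the HTTP server with the same command used in the quick start:

import os
import subprocess

env = dict(os.environ)
env.update({
    "MCP_MEMORY_STORAGE_BACKEND": "sqlite_vec",  # recommended backend
    "MCP_HTTP_ENABLED": "true",
    "MCP_HTTP_PORT": "8000",
    "MCP_API_KEY": "your-secure-key",  # replace with a real secret
})

# Same invocation as the quick start: uv run memory server --http
subprocess.run(["uv", "run", "memory", "server", "--http"], env=env, check=True)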
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   AI Clients    │    │   MCP Memory    │    │ Storage Backend │
│                 │    │  Service v7.0   │    │                 │
│ • Claude Desktop│◄──►│ • MCP Protocol  │◄──►│ • SQLite-vec    │
│ • Claude Code   │    │ • HTTP Transport│    │ • ChromaDB      │
│   (HTTP/OAuth)  │    │ • OAuth 2.1 Auth│    │ • Cloudflare    │
│ • VS Code       │    │ • Memory Store  │    │ • Hybrid        │
│ • Cursor        │    │ • Semantic      │    │                 │
│ • 13+ AI Apps   │    │   Search        │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
mcp-memory-service/
├── src/mcp_memory_service/   # Core application
│   ├── models/               # Data models
│   ├── storage/              # Storage backends
│   ├── web/                  # HTTP API & dashboard
│   └── server.py             # MCP server
├── scripts/                  # Utilities & installation
├── tests/                    # Test suite
└── tools/docker/             # Docker configuration
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request
See CONTRIBUTING.md for detailed guidelines.
- Documentation: Wiki - Comprehensive guides
- Bug Reports: GitHub Issues
- Discussions: GitHub Discussions
- Troubleshooting: Troubleshooting Guide
- Configuration Validator: Run python scripts/validation/validate_configuration_complete.py to check your setup
- Backend Sync Tools: See scripts/README.md for Cloudflare↔SQLite sync
Real-world metrics from active deployments:
- 750+ memories stored and actively used across teams
- <500ms response time for semantic search (local & HTTP transport)
- 65% token reduction in Claude Code sessions with OAuth collaboration
- 96.7% faster context setup (15min → 30sec)
- 100% knowledge retention across sessions and team members
- Zero-configuration OAuth setup success rate: 98.5%
- Verified MCP Server
- Featured AI Tool
- Production-tested across 13+ AI applications
- Community-driven with real-world feedback and improvements
Apache License 2.0 - see LICENSE for details.
Ready to supercharge your AI workflow?
Start with our Installation Guide or explore the Wiki for comprehensive documentation.
Transform your AI conversations into persistent, searchable knowledge that grows with you.
Alternative AI tools for mcp-memory-service
Similar Open Source Tools

aegra
Aegra is a self-hosted AI agent backend platform that provides LangGraph power without vendor lock-in. Built with FastAPI + PostgreSQL, it offers complete control over agent orchestration for teams looking to escape vendor lock-in, meet data sovereignty requirements, enable custom deployments, and optimize costs. Aegra is Agent Protocol compliant and perfect for teams seeking a free, self-hosted alternative to LangGraph Platform with zero lock-in, full control, and compatibility with existing LangGraph Client SDK.

claude-flow
Claude-Flow is a workflow automation tool designed to streamline and optimize business processes. It provides a user-friendly interface for creating and managing workflows, allowing users to automate repetitive tasks and improve efficiency. With features such as drag-and-drop workflow builder, customizable templates, and integration with popular business tools, Claude-Flow empowers users to automate their workflows without the need for extensive coding knowledge. Whether you are a small business owner looking to streamline your operations or a project manager seeking to automate task assignments, Claude-Flow offers a flexible and scalable solution to meet your workflow automation needs.

opcode
opcode is a powerful desktop application built with Tauri 2 that serves as a command center for interacting with Claude Code. It offers a visual GUI for managing Claude Code sessions, creating custom agents, tracking usage, and more. Users can navigate projects, create specialized AI agents, monitor usage analytics, manage MCP servers, create session checkpoints, edit CLAUDE.md files, and more. The tool bridges the gap between command-line tools and visual experiences, making AI-assisted development more intuitive and productive.

evi-run
evi-run is a powerful, production-ready multi-agent AI system built on Python using the OpenAI Agents SDK. It offers instant deployment, ultimate flexibility, built-in analytics, Telegram integration, and scalable architecture. The system features memory management, knowledge integration, task scheduling, multi-agent orchestration, custom agent creation, deep research, web intelligence, document processing, image generation, DEX analytics, and Solana token swap. It supports flexible usage modes like private, free, and pay mode, with upcoming features including NSFW mode, task scheduler, and automatic limit orders. The technology stack includes Python 3.11, OpenAI Agents SDK, Telegram Bot API, PostgreSQL, Redis, and Docker & Docker Compose for deployment.

shimmy
Shimmy is a 5.1MB single-binary local inference server providing OpenAI-compatible endpoints for GGUF models. It offers fast, reliable AI inference with sub-second responses, zero configuration, and automatic port management. Perfect for developers seeking privacy, cost-effectiveness, speed, and easy integration with popular tools like VSCode and Cursor. Shimmy is designed to be invisible infrastructure that simplifies local AI development and deployment.

ToolUniverse
ToolUniverse is a collection of 211 biomedical tools designed for Agentic AI, providing access to biomedical knowledge for solving therapeutic reasoning tasks. The tools cover various aspects of drugs and diseases, linked to trusted sources like US FDA-approved drugs since 1939, Open Targets, and Monarch Initiative.

pluely
Pluely is a versatile and user-friendly tool for managing tasks and projects. It provides a simple interface for creating, organizing, and tracking tasks, making it easy to stay on top of your work. With features like task prioritization, due date reminders, and collaboration options, Pluely helps individuals and teams streamline their workflow and boost productivity. Whether you're a student juggling assignments, a professional managing multiple projects, or a team coordinating tasks, Pluely is the perfect solution to keep you organized and efficient.

ito
Ito is an intelligent voice assistant that provides seamless voice dictation to any application on your computer. It works in any app, offers global keyboard shortcuts, real-time transcription, and instant text insertion. It is smart and adaptive with features like custom dictionary, context awareness, multi-language support, and intelligent punctuation. Users can customize trigger keys, audio preferences, and privacy controls. It also offers data management features like a notes system, interaction history, cloud sync, and export capabilities. Ito is built as a modern Electron application with a multi-process architecture and utilizes technologies like React, TypeScript, Rust, gRPC, and AWS CDK.

ChordMiniApp
ChordMini is an advanced music analysis platform with AI-powered chord recognition, beat detection, and synchronized lyrics. It features a clean and intuitive interface for YouTube search, chord progression visualization, interactive guitar diagrams with accurate fingering patterns, lead sheet with AI assistant for synchronized lyrics transcription, and various add-on features like Roman Numeral Analysis, Key Modulation Signals, Simplified Chord Notation, and Enhanced Chord Correction. The tool requires Node.js, Python 3.9+, and a Firebase account for setup. It offers a hybrid backend architecture for local development and production deployments, with features like beat detection, chord recognition, lyrics processing, rate limiting, and audio processing supporting MP3, WAV, and FLAC formats. ChordMini provides a comprehensive music analysis workflow from user input to visualization, including dual input support, environment-aware processing, intelligent caching, advanced ML pipeline, and rich visualization options.

claude-007-agents
Claude Code Agents is an open-source AI agent system designed to enhance development workflows by providing specialized AI agents for orchestration, resilience engineering, and organizational memory. These agents offer specialized expertise across technologies, AI system with organizational memory, and an agent orchestration system. The system includes features such as engineering excellence by design, advanced orchestration system, Task Master integration, live MCP integrations, professional-grade workflows, and organizational intelligence. It is suitable for solo developers, small teams, enterprise teams, and open-source projects. The system requires a one-time bootstrap setup for each project to analyze the tech stack, select optimal agents, create configuration files, set up Task Master integration, and validate system readiness.

AutoAgents
AutoAgents is a cutting-edge multi-agent framework built in Rust that enables the creation of intelligent, autonomous agents powered by Large Language Models (LLMs) and Ractor. Designed for performance, safety, and scalability. AutoAgents provides a robust foundation for building complex AI systems that can reason, act, and collaborate. With AutoAgents you can create Cloud Native Agents, Edge Native Agents and Hybrid Models as well. It is so extensible that other ML Models can be used to create complex pipelines using Actor Framework.

finite-monkey-engine
FiniteMonkey is an advanced vulnerability mining engine powered purely by GPT, requiring no prior knowledge base or fine-tuning. Its effectiveness significantly surpasses most current related research approaches. The tool is task-driven, prompt-driven, and focuses on prompt design, leveraging 'deception' and hallucination as key mechanics. It has helped identify vulnerabilities worth over $60,000 in bounties. The tool requires PostgreSQL database, OpenAI API access, and Python environment for setup. It supports various languages like Solidity, Rust, Python, Move, Cairo, Tact, Func, Java, and Fake Solidity for scanning. FiniteMonkey is best suited for logic vulnerability mining in real projects, not recommended for academic vulnerability testing. GPT-4-turbo is recommended for optimal results with an average scan time of 2-3 hours for medium projects. The tool provides detailed scanning results guide and implementation tips for users.

J.A.R.V.I.S.2.0
J.A.R.V.I.S. 2.0 is an AI-powered assistant designed for voice commands, capable of tasks like providing weather reports, summarizing news, sending emails, and more. It features voice activation, speech recognition, AI responses, and handles multiple tasks including email sending, weather reports, news reading, image generation, database functions, phone call automation, AI-based task execution, website & application automation, and knowledge-based interactions. The assistant also includes timeout handling, automatic input processing, and the ability to call multiple functions simultaneously. It requires Python 3.9 or later and specific API keys for weather, news, email, and AI access. The tool integrates Gemini AI for function execution and Ollama as a fallback mechanism. It utilizes a RAG-based knowledge system and ADB integration for phone automation. Future enhancements include deeper mobile integration, advanced AI-driven automation, improved NLP-based command execution, and multi-modal interactions.

presenton
Presenton is an open-source AI presentation generator and API that allows users to create professional presentations locally on their devices. It offers complete control over the presentation workflow, including custom templates, AI template generation, flexible generation options, and export capabilities. Users can use their own API keys for various models, integrate with Ollama for local model running, and connect to OpenAI-compatible endpoints. The tool supports multiple providers for text and image generation, runs locally without cloud dependencies, and can be deployed as a Docker container with GPU support.

hugging-llm
HuggingLLM is a project that aims to introduce ChatGPT to a wider audience, particularly those interested in using the technology to create new products or applications. The project focuses on providing practical guidance on how to use ChatGPT-related APIs to create new features and applications. It also includes detailed background information and system design introductions for relevant tasks, as well as example code and implementation processes. The project is designed for individuals with some programming experience who are interested in using ChatGPT for practical applications, and it encourages users to experiment and create their own applications and demos.
For similar tasks

mem0
Mem0 is a tool that provides a smart, self-improving memory layer for Large Language Models, enabling personalized AI experiences across applications. It offers persistent memory for users, sessions, and agents, self-improving personalization, a simple API for easy integration, and cross-platform consistency. Users can store memories, retrieve memories, search for related memories, update memories, get the history of a memory, and delete memories using Mem0. It is designed to enhance AI experiences by enabling long-term memory storage and retrieval.

redcache-ai
RedCache-ai is a memory framework designed for Large Language Models and Agents. It provides a dynamic memory framework for developers to build various applications, from AI-powered dating apps to healthcare diagnostics platforms. Users can store, retrieve, search, update, and delete memories using RedCache-ai. The tool also supports integration with OpenAI for enhancing memories. RedCache-ai aims to expand its functionality by integrating with more LLM providers, adding support for AI Agents, and providing a hosted version.
For similar jobs

promptflow
**Prompt flow** is a suite of development tools designed to streamline the end-to-end development cycle of LLM-based AI applications, from ideation, prototyping, testing, evaluation to production deployment and monitoring. It makes prompt engineering much easier and enables you to build LLM apps with production quality.

deepeval
DeepEval is a simple-to-use, open-source LLM evaluation framework specialized for unit testing LLM outputs. It incorporates various metrics such as G-Eval, hallucination, answer relevancy, RAGAS, etc., and runs locally on your machine for evaluation. It provides a wide range of ready-to-use evaluation metrics, allows for creating custom metrics, integrates with any CI/CD environment, and enables benchmarking LLMs on popular benchmarks. DeepEval is designed for evaluating RAG and fine-tuning applications, helping users optimize hyperparameters, prevent prompt drifting, and transition from OpenAI to hosting their own Llama2 with confidence.

MegaDetector
MegaDetector is an AI model that identifies animals, people, and vehicles in camera trap images (which also makes it useful for eliminating blank images). This model is trained on several million images from a variety of ecosystems. MegaDetector is just one of many tools that aims to make conservation biologists more efficient with AI. If you want to learn about other ways to use AI to accelerate camera trap workflows, check out our overview of the field, affectionately titled "Everything I know about machine learning and camera traps".

leapfrogai
LeapfrogAI is a self-hosted AI platform designed to be deployed in air-gapped resource-constrained environments. It brings sophisticated AI solutions to these environments by hosting all the necessary components of an AI stack, including vector databases, model backends, API, and UI. LeapfrogAI's API closely matches that of OpenAI, allowing tools built for OpenAI/ChatGPT to function seamlessly with a LeapfrogAI backend. It provides several backends for various use cases, including llama-cpp-python, whisper, text-embeddings, and vllm. LeapfrogAI leverages Chainguard's apko to harden base python images, ensuring the latest supported Python versions are used by the other components of the stack. The LeapfrogAI SDK provides a standard set of protobuffs and python utilities for implementing backends and gRPC. LeapfrogAI offers UI options for common use-cases like chat, summarization, and transcription. It can be deployed and run locally via UDS and Kubernetes, built out using Zarf packages. LeapfrogAI is supported by a community of users and contributors, including Defense Unicorns, Beast Code, Chainguard, Exovera, Hypergiant, Pulze, SOSi, United States Navy, United States Air Force, and United States Space Force.

llava-docker
This Docker image for LLaVA (Large Language and Vision Assistant) provides a convenient way to run LLaVA locally or on RunPod. LLaVA is a powerful AI tool that combines natural language processing and computer vision capabilities. With this Docker image, you can easily access LLaVA's functionalities for various tasks, including image captioning, visual question answering, text summarization, and more. The image comes pre-installed with LLaVA v1.2.0, Torch 2.1.2, xformers 0.0.23.post1, and other necessary dependencies. You can customize the model used by setting the MODEL environment variable. The image also includes a Jupyter Lab environment for interactive development and exploration. Overall, this Docker image offers a comprehensive and user-friendly platform for leveraging LLaVA's capabilities.

carrot
The 'carrot' repository on GitHub provides a list of free and user-friendly ChatGPT mirror sites for easy access. The repository includes sponsored sites offering various GPT models and services. Users can find and share sites, report errors, and access stable and recommended sites for ChatGPT usage. The repository also includes a detailed list of ChatGPT sites, their features, and accessibility options, making it a valuable resource for ChatGPT users seeking free and unlimited GPT services.

TrustLLM
TrustLLM is a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. The document explains how to use the trustllm python package to help you assess the performance of your LLM in trustworthiness more quickly. For more details about TrustLLM, please refer to project website.

AI-YinMei is an AI virtual anchor (VTuber) development tool (NVIDIA GPU version). It supports fastgpt knowledge-base chat with a complete LLM stack ([fastgpt] + [one-api] + [Xinference]), bilibili live-stream danmaku replies and welcome messages for viewers entering the stream, speech synthesis via Microsoft edge-tts, Bert-VITS2, and GPT-SoVITS, expression control through Vtuber Studio, stable-diffusion-webui image output to an OBS live room, NSFW image filtering (public-NSFW-y-distinguish), image search via duckduckgo (requires proxy Internet access) and Baidu image search (no proxy required), an AI reply chat box [html plug-in], AI singing via Auto-Convert-Music, a playlist [html plug-in], dancing, expression video playback, head-patting and gift-smashing reactions, automatic dancing when singing starts, cyclic swaying during chat and song, multi-scene switching, background-music switching, automatic day/night scene switching, and open-ended singing and painting in which the AI automatically judges the content.
AI-YinMei is an AI virtual anchor Vtuber development tool (N card version). It supports fastgpt knowledge base chat dialogue, a complete set of solutions for LLM large language models: [fastgpt] + [one-api] + [Xinference], supports docking bilibili live broadcast barrage reply and entering live broadcast welcome speech, supports Microsoft edge-tts speech synthesis, supports Bert-VITS2 speech synthesis, supports GPT-SoVITS speech synthesis, supports expression control Vtuber Studio, supports painting stable-diffusion-webui output OBS live broadcast room, supports painting picture pornography public-NSFW-y-distinguish, supports search and image search service duckduckgo (requires magic Internet access), supports image search service Baidu image search (no magic Internet access), supports AI reply chat box [html plug-in], supports AI singing Auto-Convert-Music, supports playlist [html plug-in], supports dancing function, supports expression video playback, supports head touching action, supports gift smashing action, supports singing automatic start dancing function, chat and singing automatic cycle swing action, supports multi scene switching, background music switching, day and night automatic switching scene, supports open singing and painting, let AI automatically judge the content.