EverMemOS
Long-term memory OS for your agents across LLMs and platforms.
Stars: 2006
EverMemOS is an AI memory system that enables AI to not only remember past events but also understand the meaning behind memories and use them to guide decisions. It achieves 93% reasoning accuracy on the LoCoMo benchmark, providing long-term memory capabilities for conversational AI agents through structured extraction, intelligent retrieval, and progressive profile building. The tool is production-ready with support for Milvus vector DB, Elasticsearch, MongoDB, and Redis, and offers easy integration via a simple REST API. Users can store and retrieve memories with simple Python code and benefit from features like multi-modal memory storage, smart retrieval mechanisms, and advanced techniques for memory management.
README:
[!IMPORTANT]
Join our AI Memory Competition! Build innovative applications, plugins, or infrastructure improvements powered by EverMemOS.
Tracks:
- Agent + Memory - Build intelligent agents with long-term, evolving memories
- Platform Plugins - Integrate EverMemOS with VSCode, Chrome, Slack, Notion, LangChain, and more
- OS Infrastructure - Optimize core functionality and performance
Get Started with the Competition Starter Kit
Join our Discord to find teammates and brainstorm ideas!
Table of Contents
- Welcome to EverMemOS
- Introduction
- Star and stay tuned with us
- Why EverMemOS
- Quick Start
- API Usage
- Demo
- Evaluation
- Documentation
- GitHub Codespaces
- Questions
- Contributing
Welcome to EverMemOS! Join our community to help improve the project and collaborate with talented developers worldwide.
More than memory: it's foresight.
EverMemOS enables AI to not only remember what happened, but understand the meaning behind memories and use them to guide decisions. Achieving 93% reasoning accuracy on the LoCoMo benchmark, EverMemOS provides long-term memory capabilities for conversational AI agents through structured extraction, intelligent retrieval, and progressive profile building.
How it works: EverMemOS extracts structured memories from conversations (Encoding), organizes them into episodes and profiles (Consolidation), and intelligently retrieves relevant context when needed (Retrieval).
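For intuition, here is a minimal sketch of that three-stage flow in plain Python. All class and function names below are hypothetical; they only mirror the Encoding, Consolidation, and Retrieval stages described above, while the real pipeline runs behind the REST API shown later in this README.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """A structured memory extracted from one conversation turn (hypothetical shape)."""
    user_id: str
    content: str
    kind: str  # e.g. "episode", "fact", "preference", "relation"

@dataclass
class Profile:
    """A progressively built per-user profile (hypothetical shape)."""
    user_id: str
    memories: list = field(default_factory=list)

def encode(user_id: str, utterance: str) -> Memory:
    # Encoding: extract a structured memory from raw conversation text.
    return Memory(user_id=user_id, content=utterance, kind="episode")

def consolidate(profile: Profile, memory: Memory) -> None:
    # Consolidation: fold the new memory into episodes and the user's profile.
    profile.memories.append(memory)

def retrieve(profile: Profile, query: str) -> list:
    # Retrieval: return memories relevant to the query. Toy substring match here;
    # EverMemOS uses BM25, embeddings, or agentic search instead.
    return [m for m in profile.memories if any(w in m.content for w in query.split())]

profile = Profile(user_id="user_001")
consolidate(profile, encode("user_001", "I love playing soccer on weekends"))
print(retrieve(profile, "soccer"))
```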
Paper • Vision & Overview • Architecture • Full Documentation
Latest: v1.2.0 with API enhancements + DB efficiency improvements (Changelog)
- 93% Accuracy - Best-in-class performance on the LoCoMo benchmark
- Production Ready - Enterprise-grade with Milvus vector DB, Elasticsearch, MongoDB, and Redis
- Easy Integration - Simple REST API, works with any LLM
- Multi-Modal Memory - Episodes, facts, preferences, relations
- Smart Retrieval - BM25, embeddings, or agentic search (see the sketch below)
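As a rough illustration of switching retrieval strategies, the sketch below wraps the search endpoint from the API Usage example further down. Only "hybrid" appears in this README's own example; "bm25" and "agentic" are assumed names for the other modes, so check the API Usage Guide for the exact accepted values.

```python
import requests

API_BASE = "http://localhost:8001/api/v1"

def search_memories(query: str, user_id: str, retrieve_method: str) -> list:
    """Query the memory search endpoint with a chosen retrieval strategy.

    NOTE: "hybrid" is the value shown in the official example; "bm25" and
    "agentic" are assumed names for the other strategies this README mentions.
    """
    response = requests.get(f"{API_BASE}/memories/search", json={
        "query": query,
        "user_id": user_id,
        "memory_types": ["episodic_memory"],
        "retrieve_method": retrieve_method,
    })
    response.raise_for_status()
    return response.json().get("result", {}).get("memories", [])

# Compare a cheap lexical pass against the heavier (presumed) agentic mode.
for method in ("bm25", "hybrid", "agentic"):
    memories = search_memories("What sports does the user like?", "user_001", method)
    print(f"{method}: {len(memories)} memory group(s)")
```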
EverMemOS outperforms existing memory systems across all major benchmarks
- Python 3.10+ • Docker 20.10+ • uv package manager • 4GB RAM
Verify Prerequisites:
# Verify you have the required versions
python --version # Should be 3.10+
docker --version # Should be 20.10+
# 1. Clone and navigate
git clone https://github.com/EverMind-AI/EverMemOS.git
cd EverMemOS
# 2. Start Docker services
docker compose up -d
# 3. Install uv and dependencies
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
# 4. Configure API keys
cp env.template .env
# Edit .env and set:
# - LLM_API_KEY (for memory extraction)
# - VECTORIZE_API_KEY (for embedding/rerank)
# 5. Start server
uv run python src/run.py --port 8001
# 6. Verify installation
curl http://localhost:8001/health
# Expected response: {"status": "healthy", ...}
✅ Server running at http://localhost:8001 • Full Setup Guide
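For reference, the two keys set in step 4 are the only secrets called out above; a minimal .env might look like this sketch (placeholder values, with any other variables from env.template omitted):

```bash
# .env (placeholder values - copy from env.template and fill in your own keys)
LLM_API_KEY=sk-your-llm-key            # used for memory extraction
VECTORIZE_API_KEY=your-vectorize-key   # used for embedding/rerank
```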
Store and retrieve memories with simple Python code:
import requests
API_BASE = "http://localhost:8001/api/v1"
# 1. Store a conversation memory
requests.post(f"{API_BASE}/memories", json={
"message_id": "msg_001",
"create_time": "2025-02-01T10:00:00+00:00",
"sender": "user_001",
"content": "I love playing soccer on weekends"
})
# 2. Search for relevant memories
response = requests.get(f"{API_BASE}/memories/search", json={
"query": "What sports does the user like?",
"user_id": "user_001",
"memory_types": ["episodic_memory"],
"retrieve_method": "hybrid"
})
result = response.json().get("result", {})
for memory_group in result.get("memories", []):
print(f"Memory: {memory_group}")
More Examples • API Reference • Interactive Demos
# Terminal 1: Start the API server
uv run python src/run.py --port 8001
# Terminal 2: Run the simple demo
uv run python src/bootstrap.py demo/simple_demo.py
Try it now: Follow the Demo Guide for step-by-step instructions.
# Extract memories from sample data
uv run python src/bootstrap.py demo/extract_memory.py
# Start interactive chat with memory
uv run python src/bootstrap.py demo/chat_with_memory.py
See the Demo Guide for details.
- Group Chat Conversations - Combine messages from multiple speakers
- Conversation Metadata Control - Fine-grained control over conversation context
- Memory Retrieval Strategies - Lightweight vs Agentic retrieval modes
- Batch Operations - Process multiple messages efficiently (see the sketch below)
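This README doesn't show a dedicated batch endpoint, so as a stand-in the sketch below loops the single-message endpoint from the API Usage example over a small group-chat transcript, using a requests.Session for connection reuse. Treat the approach as an assumption until you confirm a true batch call in the API Usage Guide.

```python
import requests

API_BASE = "http://localhost:8001/api/v1"

# Two speakers in one conversation, as in the Group Chat feature above.
messages = [
    {"message_id": "msg_010", "create_time": "2025-02-01T11:00:00+00:00",
     "sender": "user_001", "content": "Booked a soccer field for Saturday"},
    {"message_id": "msg_011", "create_time": "2025-02-01T11:01:00+00:00",
     "sender": "user_002", "content": "I'll bring the cones and bibs"},
]

# A Session reuses the TCP connection across requests, which is the cheap
# approximation of batching when only a per-message endpoint is available.
with requests.Session() as session:
    for msg in messages:
        resp = session.post(f"{API_BASE}/memories", json=msg)
        resp.raise_for_status()
```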
| Guide | Description |
|---|---|
| Quick Start | Installation and configuration |
| Configuration Guide | Environment variables and services |
| API Usage Guide | Endpoints and data formats |
| Development Guide | Architecture and best practices |
| Memory API | Complete API reference |
| Demo Guide | Interactive examples |
| Evaluation Guide | Benchmark testing |
EverMemOS achieves 93% overall accuracy on the LoCoMo benchmark, outperforming comparable memory systems.
- LoCoMo - Long-context memory benchmark with single/multi-hop reasoning
- LongMemEval - Multi-session conversation evaluation
- PersonaMem - Persona-based memory evaluation
# Install evaluation dependencies
uv sync --group evaluation
# Run smoke test (quick verification)
uv run python -m evaluation.cli --dataset locomo --system evermemos --smoke
# Run full evaluation
uv run python -m evaluation.cli --dataset locomo --system evermemos
# View results
cat evaluation/results/locomo-evermemos/report.txt
Full Evaluation Guide • Complete Results
EverMemOS supports GitHub Codespaces for cloud-based development. This eliminates the need to set up Docker, manage local network configurations, or worry about environment compatibility issues.
| Machine Type | Status | Notes |
|---|---|---|
| 2-core (Free tier) | ❌ Not supported | Insufficient resources for infrastructure services |
| 4-core | ✅ Minimum | Works but may be slow under load |
| 8-core | ✅ Recommended | Good performance with all services |
| 16-core+ | ✅ Optimal | Best for heavy development workloads |
Note: If your company provides GitHub Codespaces, hardware limitations typically won't be an issue since enterprise plans often include access to larger machine types.
- Click the "Open in GitHub Codespaces" button above
- Select a 4-core or larger machine when prompted
- Wait for the container to build and services to start
- Update API keys in `.env` (LLM_API_KEY, VECTORIZE_API_KEY, etc.)
- Run `make run` to start the server
All infrastructure services (MongoDB, Elasticsearch, Milvus, Redis) start automatically and are pre-configured to work together.
EverMemOS is available on these AI-powered Q&A platforms. They can help you find answers quickly and accurately in multiple languages, covering everything from basic setup to advanced implementation details.
| Service | Link |
|---|---|
| DeepWiki | |
We love open-source energy! Whether you're squashing bugs, shipping features, sharpening docs, or just tossing in wild ideas, every PR moves EverMemOS forward. Browse Issues to find your perfect entry point, then show us what you've got. Let's build the future of memory together.
[!TIP]
We welcome all kinds of contributions!
Join us in building EverMemOS better! Every contribution makes a difference, from code to documentation. Share your projects on social media to inspire others!
Connect with one of the EverMemOS maintainers, @elliotchen200 on X (Twitter) or @cyfyifanchen on GitHub, for project updates, discussions, and collaboration opportunities.
Read our Contribution Guidelines for code standards and Git workflow.
Apache 2.0 • Citation • Acknowledgments
Similar Open Source Tools
pro-chat
ProChat is a components library focused on quickly building large language model chat interfaces. It empowers developers to create rich, dynamic, and intuitive chat interfaces with features like automatic chat caching, streamlined conversations, message editing tools, auto-rendered Markdown, and programmatic controls. The tool also includes design evolution plans such as customized dialogue rendering, enhanced request parameters, personalized error handling, expanded documentation, and atomic component design.
agentscope
AgentScope is a multi-agent platform designed to empower developers to build multi-agent applications with large-scale models. It features three high-level capabilities: Easy-to-Use, High Robustness, and Actor-Based Distribution. AgentScope provides a list of `ModelWrapper` to support both local model services and third-party model APIs, including OpenAI API, DashScope API, Gemini API, and ollama. It also enables developers to rapidly deploy local model services using libraries such as ollama (CPU inference), Flask + Transformers, Flask + ModelScope, FastChat, and vllm. AgentScope supports various services, including Web Search, Data Query, Retrieval, Code Execution, File Operation, and Text Processing. Example applications include Conversation, Game, and Distribution. AgentScope is released under Apache License 2.0 and welcomes contributions.
monoscope
Monoscope is an open-source monitoring and observability platform that uses artificial intelligence to understand and monitor systems automatically. It allows users to ingest and explore logs, traces, and metrics in S3 buckets, query in natural language via LLMs, and create AI agents to detect anomalies. Key capabilities include universal data ingestion, AI-powered understanding, natural language interface, cost-effective storage, and zero configuration. Monoscope is designed to reduce alert fatigue, catch issues before they impact users, and provide visibility across complex systems.
optscale
OptScale is an open-source FinOps and MLOps platform that provides cloud cost optimization for all types of organizations and MLOps capabilities like experiment tracking, model versioning, ML leaderboards.
InternLM
InternLM is a powerful language model series with features such as 200K context window for long-context tasks, outstanding comprehensive performance in reasoning, math, code, chat experience, instruction following, and creative writing, code interpreter & data analysis capabilities, and stronger tool utilization capabilities. It offers models in sizes of 7B and 20B, suitable for research and complex scenarios. The models are recommended for various applications and exhibit better performance than previous generations. InternLM models may match or surpass other open-source models like ChatGPT. The tool has been evaluated on various datasets and has shown superior performance in multiple tasks. It requires Python >= 3.8, PyTorch >= 1.12.0, and Transformers >= 4.34 for usage. InternLM can be used for tasks like chat, agent applications, fine-tuning, deployment, and long-context inference.
GraphGen
GraphGen is a framework for synthetic data generation guided by knowledge graphs. It enhances supervised fine-tuning for large language models (LLMs) by generating synthetic data based on a fine-grained knowledge graph. The tool identifies knowledge gaps in LLMs, prioritizes generating QA pairs targeting high-value knowledge, incorporates multi-hop neighborhood sampling, and employs style-controlled generation to diversify QA data. Users can use LLaMA-Factory and xtuner for fine-tuning LLMs after data generation.
Q-Bench
Q-Bench is a benchmark for general-purpose foundation models on low-level vision, focusing on multi-modality LLMs performance. It includes three realms for low-level vision: perception, description, and assessment. The benchmark datasets LLVisionQA and LLDescribe are collected for perception and description tasks, with open submission-based evaluation. An abstract evaluation code is provided for assessment using public datasets. The tool can be used with the datasets API for single images and image pairs, allowing for automatic download and usage. Various tasks and evaluations are available for testing MLLMs on low-level vision tasks.
HyperChat
HyperChat is an open Chat client that utilizes various LLM APIs to enhance the Chat experience and offer productivity tools through the MCP protocol. It supports multiple LLMs like OpenAI, Claude, Qwen, Deepseek, GLM, Ollama. The platform includes a built-in MCP plugin market for easy installation and also allows manual installation of third-party MCPs. Features include Windows and MacOS support, resource support, tools support, English and Chinese language support, built-in MCP client 'hypertools', 'fetch' + 'search', Bot support, Artifacts rendering, KaTeX for mathematical formulas, WebDAV synchronization, and a MCP plugin market. Future plans include permission pop-up, scheduled tasks support, Projects + RAG support, tools implementation by LLM, and a local shell + nodejs + js on web runtime environment.
LocalAI
LocalAI is a free and open-source OpenAI alternative that acts as a drop-in replacement REST API compatible with OpenAI (Elevenlabs, Anthropic, etc.) API specifications for local AI inferencing. It allows users to run LLMs, generate images, audio, and more locally or on-premises with consumer-grade hardware, supporting multiple model families and not requiring a GPU. LocalAI offers features such as text generation with GPTs, text-to-audio, audio-to-text transcription, image generation with stable diffusion, OpenAI functions, embeddings generation for vector databases, constrained grammars, downloading models directly from Huggingface, and a Vision API. It provides a detailed step-by-step introduction in its Getting Started guide and supports community integrations such as custom containers, WebUIs, model galleries, and various bots for Discord, Slack, and Telegram. LocalAI also offers resources like an LLM fine-tuning guide, instructions for local building and Kubernetes installation, projects integrating LocalAI, and a how-tos section curated by the community. It encourages users to cite the repository when utilizing it in downstream projects and acknowledges the contributions of various software from the community.
Pallaidium
Pallaidium is a generative AI movie studio integrated into the Blender video editor. It allows users to AI-generate video, image, and audio from text prompts or existing media files. The tool provides various features such as text to video, text to audio, text to speech, text to image, image to image, image to video, video to video, image to text, and more. It requires a Windows system with a CUDA-supported Nvidia card and at least 6 GB VRAM. Pallaidium offers batch processing capabilities, text to audio conversion using Bark, and various performance optimization tips. Users can install the tool by downloading the add-on and following the installation instructions provided. The tool comes with a set of restrictions on usage, prohibiting the generation of harmful, pornographic, violent, or false content.
everything-claude-code
The 'Everything Claude Code' repository is a comprehensive collection of production-ready agents, skills, hooks, commands, rules, and MCP configurations developed over 10+ months. It includes guides for setup, foundations, and philosophy, as well as detailed explanations of various topics such as token optimization, memory persistence, continuous learning, verification loops, parallelization, and subagent orchestration. The repository also provides updates on bug fixes, multi-language rules, installation wizard, PM2 support, OpenCode plugin integration, unified commands and skills, and cross-platform support. It offers a quick start guide for installation, ecosystem tools like Skill Creator and Continuous Learning v2, requirements for CLI version compatibility, key concepts like agents, skills, hooks, and rules, running tests, contributing guidelines, OpenCode support, background information, important notes on context window management and customization, star history chart, and relevant links.
eko
Eko is a lightweight and flexible command-line tool for managing environment variables in your projects. It allows you to easily set, get, and delete environment variables for different environments, making it simple to manage configurations across development, staging, and production environments. With Eko, you can streamline your workflow and ensure consistency in your application settings without the need for complex setup or configuration files.
ASTRA.ai
ASTRA is an open-source platform designed for developing applications utilizing large language models. It merges the ideas of Backend-as-a-Service and LLM operations, allowing developers to swiftly create production-ready generative AI applications. Additionally, it empowers non-technical users to engage in defining and managing data operations for AI applications. With ASTRA, you can easily create real-time, multi-modal AI applications with low latency, even without any coding knowledge.
VideoRefer
VideoRefer Suite is a tool designed to enhance the fine-grained spatial-temporal understanding capabilities of Video Large Language Models (Video LLMs). It consists of three primary components: Model (VideoRefer) for perceiving, reasoning, and retrieval for user-defined regions at any specified timestamps, Dataset (VideoRefer-700K) for high-quality object-level video instruction data, and Benchmark (VideoRefer-Bench) to evaluate object-level video understanding capabilities. The tool can understand any object within a video.
For similar tasks
MiniAgents
MiniAgents is an open-source Python framework designed to simplify the creation of multi-agent AI systems. It offers a parallelism and async-first design, allowing users to focus on building intelligent agents while handling concurrency challenges. The framework, built on asyncio, supports LLM-based applications with immutable messages and seamless asynchronous token and message streaming between agents.
AutoAgents
AutoAgents is a cutting-edge multi-agent framework built in Rust that enables the creation of intelligent, autonomous agents powered by Large Language Models (LLMs) and Ractor. Designed for performance, safety, and scalability. AutoAgents provides a robust foundation for building complex AI systems that can reason, act, and collaborate. With AutoAgents you can create Cloud Native Agents, Edge Native Agents and Hybrid Models as well. It is so extensible that other ML Models can be used to create complex pipelines using Actor Framework.
trpc-agent-go
A powerful Go framework for building intelligent agent systems with large language models (LLMs), hierarchical planners, memory, telemetry, and a rich tool ecosystem. tRPC-Agent-Go enables the creation of autonomous or semi-autonomous agents that reason, call tools, collaborate with sub-agents, and maintain long-term state. The framework provides detailed documentation, examples, and tools for accelerating the development of AI applications.
hello-agents
Hello-Agents is a comprehensive tutorial on building intelligent agent systems, covering both theoretical foundations and practical applications. The tutorial aims to guide users in understanding and building AI-native agents, diving deep into core principles, architectures, and paradigms of intelligent agents. Users will learn to develop their own multi-agent applications from scratch, gaining hands-on experience with popular low-code platforms and agent frameworks. The tutorial also covers advanced topics such as memory systems, context engineering, communication protocols, and model training. By the end of the tutorial, users will have the skills to develop real-world projects like intelligent travel assistants and cyber towns.
llama-github
Llama-github is a powerful tool that helps retrieve relevant code snippets, issues, and repository information from GitHub based on queries. It empowers AI agents and developers to solve coding tasks efficiently. With features like intelligent GitHub retrieval, repository pool caching, LLM-powered question analysis, and comprehensive context generation, llama-github excels at providing valuable knowledge context for development needs. It supports asynchronous processing, flexible LLM integration, robust authentication options, and logging/error handling for smooth operations and troubleshooting. The vision is to seamlessly integrate with GitHub for AI-driven development solutions, while the roadmap focuses on empowering LLMs to automatically resolve complex coding tasks.
opensearch-ai
OpenSearch GPT is a personalized AI search engine that adapts to user interests while browsing the web. It utilizes advanced technologies like Mem0 for automatic memory collection, Vercel AI ADK for AI applications, Next.js for React framework, Tailwind CSS for styling, Shadcn UI for UI components, Cobe for globe animation, GPT-4o-mini for AI capabilities, and Cloudflare Pages for web application deployment. Developed by Supermemory.ai team.
chrome-extension
Mem0 Chrome Extension lets you own your memory and preferences across any Gen AI apps like ChatGPT, Claude, Perplexity, etc and get personalized, relevant responses. It allows users to store memories from conversations, retrieve relevant memories during chats, manage and organize stored information, and seamlessly integrate with the Claude AI interface. The extension requires an API key and user ID for connecting to the Mem0 API, and it stores this information locally in the browser. Users can troubleshoot common issues, and contributions to improve the extension are welcome under the MIT License.
For similar jobs
weave
Weave is a toolkit for developing Generative AI applications, built by Weights & Biases. With Weave, you can log and debug language model inputs, outputs, and traces; build rigorous, apples-to-apples evaluations for language model use cases; and organize all the information generated across the LLM workflow, from experimentation to evaluations to production. Weave aims to bring rigor, best-practices, and composability to the inherently experimental process of developing Generative AI software, without introducing cognitive overhead.
LLMStack
LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.
VisionCraft
The VisionCraft API is a free API for using over 100 different AI models. From images to sound.
kaito
Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster. It manages large model files using container images, avoids tuning deployment parameters to fit GPU hardware by providing preset configurations, auto-provisions GPU nodes based on model requirements, and hosts large model images in the public Microsoft Container Registry (MCR) if the license allows. Using Kaito, the workflow of onboarding large AI inference models in Kubernetes is largely simplified.
PyRIT
PyRIT is an open access automation framework designed to empower security professionals and ML engineers to red team foundation models and their applications. It automates AI Red Teaming tasks to allow operators to focus on more complicated and time-consuming tasks and can also identify security harms such as misuse (e.g., malware generation, jailbreaking), and privacy harms (e.g., identity theft). The goal is to allow researchers to have a baseline of how well their model and entire inference pipeline is doing against different harm categories and to be able to compare that baseline to future iterations of their model. This allows them to have empirical data on how well their model is doing today, and detect any degradation of performance based on future improvements.
tabby
Tabby is a self-hosted AI coding assistant, offering an open-source and on-premises alternative to GitHub Copilot. It boasts several key features: * Self-contained, with no need for a DBMS or cloud service. * OpenAPI interface, easy to integrate with existing infrastructure (e.g Cloud IDE). * Supports consumer-grade GPUs.
spear
SPEAR (Simulator for Photorealistic Embodied AI Research) is a powerful tool for training embodied agents. It features 300 unique virtual indoor environments with 2,566 unique rooms and 17,234 unique objects that can be manipulated individually. Each environment is designed by a professional artist and features detailed geometry, photorealistic materials, and a unique floor plan and object layout. SPEAR is implemented as Unreal Engine assets and provides an OpenAI Gym interface for interacting with the environments via Python.
Magick
Magick is a groundbreaking visual AIDE (Artificial Intelligence Development Environment) for no-code data pipelines and multimodal agents. Magick can connect to other services and comes with nodes and templates well-suited for intelligent agents, chatbots, complex reasoning systems and realistic characters.