agent-memory-server
Fast and flexible memory for agents and AI applications using Redis
Stars: 188
The agent-memory-server is a memory layer for AI agents that exposes a dual interface: a REST API and a Model Context Protocol (MCP) server. It offers two-tier memory management, configurable memory strategies, semantic search, and pluggable vector store backends. The tool supports multi-provider LLM integration and AI features such as topic extraction, entity recognition, and conversation summarization, and it includes a Python SDK for storing and searching memories from AI applications. The server suits AI assistants, customer support, personal AI, research assistants, and chatbots, enabling persistent memory across conversations and context from previous interactions.
README:
- Dual Interface: REST API and Model Context Protocol (MCP) server
- Two-Tier Memory: Working memory (session-scoped) and long-term memory (persistent)
- Configurable Memory Strategies: Customize how memories are extracted (discrete, summary, preferences, custom)
- Semantic Search: Vector-based similarity search with metadata filtering
- Flexible Backends: Pluggable vector store factory system
- Multi-Provider LLM Support: OpenAI, Anthropic, AWS Bedrock, Ollama, Azure, Gemini via LiteLLM
- AI Integration: Automatic topic extraction, entity recognition, and conversation summarization
- Python SDK: Easy integration with AI applications
Pre-built Docker images are available from:
- Docker Hub: redislabs/agent-memory-server
- GitHub Packages: ghcr.io/redis/agent-memory-server
Quick Start (Development Mode):
# Start with docker-compose
# Note: Both 'api' and 'api-for-task-worker' services use port 8000
# Choose one depending on your needs:
# Option 1: Development mode (no worker, immediate task execution)
docker compose up api redis
# Option 2: Production-like mode (with background worker)
docker compose up api-for-task-worker task-worker redis mcp
# Or run just the API server (requires separate Redis)
docker run -p 8000:8000 \
-e REDIS_URL=redis://your-redis:6379 \
-e OPENAI_API_KEY=your-key \
redislabs/agent-memory-server:latest \
agent-memory api --host 0.0.0.0 --port 8000 --task-backend=asyncio
By default, the image runs the API with the Docket task backend, which expects a separate agent-memory task-worker process for non-blocking background tasks. The example above shows how to override this to use the asyncio backend for a single-container development setup.
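Once the API is up, a quick smoke test from Python exercises the same SDK calls shown later in this README (requires pip install agent-memory-client, covered below); note that newly created memories may take a moment to be extracted and indexed, so an immediate search can come back empty.
# Smoke test against a locally running server, using only client calls that
# appear later in this README (pip install agent-memory-client).
import asyncio
from agent_memory_client import MemoryAPIClient

async def main():
    client = MemoryAPIClient(base_url="http://localhost:8000")
    await client.create_long_term_memories([
        {"text": "Smoke-test memory", "user_id": "smoke", "memory_type": "preference"}
    ])
    # Indexing may lag briefly; rerun the search if this prints no results.
    results = await client.search_long_term_memory(text="smoke test", user_id="smoke")
    print(results)

asyncio.run(main())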
Production Deployment:
For production, run separate containers for the API and background workers:
# API Server (without background worker)
docker run -p 8000:8000 \
-e REDIS_URL=redis://your-redis:6379 \
-e OPENAI_API_KEY=your-key \
-e DISABLE_AUTH=false \
redislabs/agent-memory-server:latest \
agent-memory api --host 0.0.0.0 --port 8000
# Background Worker (separate container)
docker run \
-e REDIS_URL=redis://your-redis:6379 \
-e OPENAI_API_KEY=your-key \
redislabs/agent-memory-server:latest \
agent-memory task-worker --concurrency 10
# MCP Server (if needed)
docker run -p 9000:9000 \
-e REDIS_URL=redis://your-redis:6379 \
-e OPENAI_API_KEY=your-key \
redislabs/agent-memory-server:latest \
agent-memory mcp --mode sse --port 9000
Running from Source:
# Install dependencies
pip install uv
uv sync --all-extras
# Start Redis
docker compose up redis
# Start the server (development mode, asyncio task backend)
uv run agent-memory api --task-backend=asyncio
Letting the server extract memories from working memory is the easiest approach. However, you can also manually create memories:
# Install the client
pip install agent-memory-client
# For LangChain integration
pip install agent-memory-client langchain-core
from agent_memory_client import MemoryAPIClient
# Connect to server
client = MemoryAPIClient(base_url="http://localhost:8000")
# Store memories
await client.create_long_term_memories([
{
"text": "User prefers morning meetings",
"user_id": "user123",
"memory_type": "preference"
}
])
# Search memories
results = await client.search_long_term_memory(
text="What time does the user like meetings?",
user_id="user123"
)
Note: While you can call client functions directly as shown above, using MCP or SDK-provided tool calls is recommended for AI agents because it provides better integration and automatic context management, and follows AI-native patterns. For the best performance, you can add messages to working memory and allow the server to extract memories in the background; see Memory Integration Patterns for guidance on when to use each approach.
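Since the recommended pattern is to add messages to working memory and let the server extract long-term memories, here is a minimal sketch of that flow. The put_working_memory method and the WorkingMemory/MemoryMessage models are assumptions based on the SDK docs, not calls shown in this README; verify them against the Python SDK reference before use.
# Hypothetical sketch: append conversation turns to working memory and let the
# server promote important facts to long-term memory in the background.
# NOTE: put_working_memory, WorkingMemory, and MemoryMessage are ASSUMED names;
# check the Python SDK reference before relying on them.
from agent_memory_client import MemoryAPIClient
from agent_memory_client.models import WorkingMemory, MemoryMessage  # assumed import path

client = MemoryAPIClient(base_url="http://localhost:8000")
memory = WorkingMemory(
    session_id="session-42",
    user_id="user123",
    messages=[
        MemoryMessage(role="user", content="I prefer morning meetings."),
        MemoryMessage(role="assistant", content="Noted, mornings it is."),
    ],
)
await client.put_working_memory("session-42", memory)  # assumed method signature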
For LangChain users, the SDK provides automatic conversion of memory client tools to LangChain-compatible tools, eliminating the need for manual wrapping with @tool decorators.
from agent_memory_client import create_memory_client
from agent_memory_client.integrations.langchain import get_memory_tools
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI
# Get LangChain-compatible tools automatically
memory_client = await create_memory_client("http://localhost:8000")
tools = get_memory_tools(
memory_client=memory_client,
session_id="my_session",
user_id="alice"
)
# Create prompt and agent
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant with memory."),
("human", "{input}"),
MessagesPlaceholder("agent_scratchpad"),
])
llm = ChatOpenAI(model="gpt-4o")
agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
# Use the agent
result = await executor.ainvoke({"input": "Remember that I love pizza"})
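To confirm the round trip, a follow-up turn can ask the agent to recall the stored fact through its memory tools; this assumes extraction and search behave as described above.
# Follow-up turn: the agent should surface the stored preference via its memory tools.
result = await executor.ainvoke({"input": "What food do I love?"})
print(result["output"])
Running the MCP Server:
# Start MCP server (stdio mode - recommended for Claude Desktop)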
uv run agent-memory mcp
# Or with SSE mode (development mode, default asyncio backend)
uv run agent-memory mcp --mode sse --port 9000
Use this in your MCP tool configuration (e.g., Claude Desktop mcp.json):
{
"mcpServers": {
"memory": {
"command": "uvx",
"args": ["--from", "agent-memory-server", "agent-memory", "mcp"],
"env": {
"DISABLE_AUTH": "true",
"REDIS_URL": "redis://localhost:6379",
"OPENAI_API_KEY": "<your-openai-key>"
}
}
}
}
Notes:
- API keys: Set either OPENAI_API_KEY (the default models use OpenAI) or switch to Anthropic by setting ANTHROPIC_API_KEY and GENERATION_MODEL to an Anthropic model (e.g., claude-3-5-haiku-20241022).
- Make sure your MCP host can find uvx (on its PATH or by using an absolute command path).
  - macOS: brew install uv
  - If uvx is not on PATH, set "command" to its absolute path (e.g., /opt/homebrew/bin/uvx on Apple Silicon, /usr/local/bin/uvx on Intel macOS). On Linux, ~/.local/bin/uvx is common. See https://docs.astral.sh/uv/getting-started/
- For production, remove DISABLE_AUTH and configure proper authentication.
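For programmatic access outside Claude Desktop, the server can be driven with the official MCP Python SDK (pip install mcp); the sketch below launches the stdio server with the same command and env as the JSON config above and lists the tools it exposes. The set of tool names is not documented in this README, so inspect the printed listing.
# Connect to the agent-memory MCP server over stdio using the official
# MCP Python SDK (pip install mcp) and list the tools it exposes.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="uvx",
    args=["--from", "agent-memory-server", "agent-memory", "mcp"],
    # Passing env replaces the child's environment; you may need to merge in
    # os.environ (e.g., PATH) depending on your setup.
    env={
        "DISABLE_AUTH": "true",  # development only
        "REDIS_URL": "redis://localhost:6379",
        "OPENAI_API_KEY": "<your-openai-key>",
    },
)

async def main():
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())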
The server uses LiteLLM to support 100+ LLM providers. Configure via environment variables:
# OpenAI (default)
export OPENAI_API_KEY=sk-...
export GENERATION_MODEL=gpt-4o
export EMBEDDING_MODEL=text-embedding-3-small
# Anthropic
export ANTHROPIC_API_KEY=sk-ant-...
export GENERATION_MODEL=claude-3-5-sonnet-20241022
export EMBEDDING_MODEL=text-embedding-3-small # Use OpenAI for embeddings
# AWS Bedrock
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_REGION_NAME=us-east-1
export GENERATION_MODEL=anthropic.claude-sonnet-4-5-20250929-v1:0
export EMBEDDING_MODEL=bedrock/amazon.titan-embed-text-v2:0 # Note: bedrock/ prefix required
# Ollama (local)
export OLLAMA_API_BASE=http://localhost:11434
export GENERATION_MODEL=ollama/llama2
export EMBEDDING_MODEL=ollama/nomic-embed-text
export REDISVL_VECTOR_DIMENSIONS=768 # Required for Ollama
See LLM Providers for complete configuration options.
📚 Full Documentation - Complete guides, API reference, and examples
- Quick Start Guide - Get up and running in minutes
- Python SDK - Complete SDK reference with examples
- LangChain Integration - Automatic tool conversion for LangChain
- LLM Providers - Configure OpenAI, Anthropic, AWS Bedrock, Ollama, and more
- Embedding Providers - Configure embedding models for semantic search
- Vector Store Backends - Configure different vector databases
- Authentication - OAuth2/JWT setup for production
- Memory Types - Understanding semantic vs episodic memory
- API Reference - REST API endpoints
- MCP Protocol - Model Context Protocol integration
Memory Architecture:
Working Memory (Session-scoped) → Long-term Memory (Persistent)
Working memory holds:
- Messages
- Structured memories
- Summary of past messages
- Metadata
Long-term memory provides:
- Semantic search
- Topic modeling
- Entity recognition
- Deduplication
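To make the long-term tier concrete, the sketch below layers metadata filtering on top of the vector search. Only text= and user_id= appear in the examples earlier in this README; the topics= and limit= parameters are assumptions to verify against the SDK/API reference.
# Hypothetical sketch of semantic search with metadata filtering.
# text= and user_id= are shown earlier in this README; topics= and limit=
# are ASSUMED parameters -- verify against the SDK/API reference.
results = await client.search_long_term_memory(
    text="How does the user like to schedule meetings?",
    user_id="user123",
    topics=["meetings"],  # assumed metadata filter
    limit=5,              # assumed result cap
)
Use Cases: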
- AI Assistants: Persistent memory across conversations
- Customer Support: Context from previous interactions
- Personal AI: Learning user preferences and history
- Research Assistants: Accumulating knowledge over time
- Chatbots: Maintaining context and personalization
Development:
# Install dependencies
uv sync --all-extras
# Run tests
uv run pytest
# Format code
uv run ruff format
uv run ruff check
# Start development stack (choose one based on your needs)
docker compose up api redis # Development mode
docker compose up api-for-task-worker task-worker redis # Production-like mode
Apache License 2.0 - see LICENSE file for details.
We welcome contributions! Please see the development documentation for guidelines.
Alternative AI tools for agent-memory-server
Similar Open Source Tools
inference-gateway
The Inference Gateway is an open-source proxy server designed to simplify access to various language model APIs. It allows users to interact with different language models through a unified interface, stream tokens in real-time, process images alongside text, and use Docker or Kubernetes for deployment. The gateway supports Model Context Protocol integration, provides metrics and observability features, and is production-ready with minimal resource consumption. It offers middleware control and bypass mechanisms, enabling users to manage capabilities like MCP and vision support. The CLI tool provides status monitoring, interactive chat, configuration management, project initialization, and tool execution functionalities. The project aims to provide a flexible solution for AI Agents, supporting self-hosted LLMs and avoiding vendor lock-in.
CyberStrikeAI
CyberStrikeAI is an AI-native security testing platform built in Go that integrates 100+ security tools, an intelligent orchestration engine, role-based testing with predefined security roles, a skills system with specialized testing skills, and comprehensive lifecycle management capabilities. It enables end-to-end automation from conversational commands to vulnerability discovery, attack-chain analysis, knowledge retrieval, and result visualization, delivering an auditable, traceable, and collaborative testing environment for security teams. The platform features an AI decision engine with OpenAI-compatible models, native MCP implementation with various transports, prebuilt tool recipes, large-result pagination, attack-chain graph, password-protected web UI, knowledge base with vector search, vulnerability management, batch task management, role-based testing, and skills system.
MassGen
MassGen is a cutting-edge multi-agent system that leverages the power of collaborative AI to solve complex tasks. It assigns a task to multiple AI agents who work in parallel, observe each other's progress, and refine their approaches to converge on the best solution to deliver a comprehensive and high-quality result. The system operates through an architecture designed for seamless multi-agent collaboration, with key features including cross-model/agent synergy, parallel processing, intelligence sharing, consensus building, and live visualization. Users can install the system, configure API settings, and run MassGen for various tasks such as question answering, creative writing, research, development & coding tasks, and web automation & browser tasks. The roadmap includes plans for advanced agent collaboration, expanded model, tool & agent integration, improved performance & scalability, enhanced developer experience, and a web interface.
pentagi
PentAGI is an innovative tool for automated security testing that leverages cutting-edge artificial intelligence technologies. It is designed for information security professionals, researchers, and enthusiasts who need a powerful and flexible solution for conducting penetration tests. The tool provides secure and isolated operations in a sandboxed Docker environment, fully autonomous AI-powered agent for penetration testing steps, a suite of 20+ professional security tools, smart memory system for storing research results, web intelligence for gathering information, integration with external search systems, team delegation system, comprehensive monitoring and reporting, modern interface, API integration, persistent storage, scalable architecture, self-hosted solution, flexible authentication, and quick deployment through Docker Compose.
pgedge-postgres-mcp
The pgedge-postgres-mcp repository contains a set of tools and scripts for managing and monitoring PostgreSQL databases in an edge computing environment. It provides functionalities for automating database tasks, monitoring database performance, and ensuring data integrity in edge computing scenarios. The tools are designed to be lightweight and efficient, making them suitable for resource-constrained edge devices. With pgedge-postgres-mcp, users can easily deploy and manage PostgreSQL databases in edge computing environments with minimal overhead.
R2R
R2R (RAG to Riches) is a fast and efficient framework for serving high-quality Retrieval-Augmented Generation (RAG) to end users. The framework is designed with customizable pipelines and a feature-rich FastAPI implementation, enabling developers to quickly deploy and scale RAG-based applications. R2R was conceived to bridge the gap between local LLM experimentation and scalable production solutions. R2R is to LangChain/LlamaIndex what NextJS is to React. A JavaScript client for R2R deployments is also available. Key features: 🚀 Deploy: instantly launch production-ready RAG pipelines with streaming capabilities. 🧩 Customize: tailor your pipeline with intuitive configuration files. 🔌 Extend: enhance your pipeline with custom code integrations. ⚖️ Autoscale: scale your pipeline effortlessly in the cloud using SciPhi. 🤖 OSS: benefit from a framework developed by the open-source community, designed to simplify RAG deployment.
codemie-code
Unified AI Coding Assistant CLI for managing multiple AI agents like Claude Code, Google Gemini, OpenCode, and custom AI agents. Supports OpenAI, Azure OpenAI, AWS Bedrock, LiteLLM, Ollama, and Enterprise SSO. Features built-in LangGraph agent with file operations, command execution, and planning tools. Cross-platform support for Windows, Linux, and macOS. Ideal for developers seeking a powerful alternative to GitHub Copilot or Cursor.
sdk-typescript
Strands Agents - TypeScript SDK is a lightweight and flexible SDK that takes a model-driven approach to building and running AI agents in TypeScript/JavaScript. It brings key features from the Python Strands framework to Node.js environments, enabling type-safe agent development for various applications. The SDK supports model agnostic development with first-class support for Amazon Bedrock and OpenAI, along with extensible architecture for custom providers. It also offers built-in MCP support, real-time response streaming, extensible hooks, and conversation management features. With tools for interaction with external systems and seamless integration with MCP servers, the SDK provides a comprehensive solution for developing AI agents.
tingly-box
Tingly Box is a tool that helps in deciding which model to call, compressing context, and routing requests efficiently. It offers secure, reliable, and customizable functional extensions. With features like unified API, smart routing, context compression, auto API translation, blazing fast performance, flexible authentication, visual control panel, and client-side usage stats, Tingly Box provides a comprehensive solution for managing AI models and tokens. It supports integration with various IDEs, CLI tools, SDKs, and AI applications, making it versatile and easy to use. The tool also allows seamless integration with OAuth providers like Claude Code, enabling users to utilize existing quotas in OpenAI-compatible tools. Tingly Box aims to simplify AI model management and usage by providing a single endpoint for multiple providers with minimal configuration, promoting seamless integration with SDKs and CLI tools.
mistral.rs
Mistral.rs is a fast LLM inference platform written in Rust. We support inference on a variety of devices, quantization, and easy-to-use application with an Open-AI API compatible HTTP server and Python bindings.
LEANN
LEANN is an innovative vector database that democratizes personal AI, transforming your laptop into a powerful RAG system that can index and search through millions of documents using 97% less storage than traditional solutions without accuracy loss. It achieves this through graph-based selective recomputation and high-degree preserving pruning, computing embeddings on-demand instead of storing them all. LEANN allows semantic search of file system, emails, browser history, chat history, codebase, or external knowledge bases on your laptop with zero cloud costs and complete privacy. It is a drop-in semantic search MCP service fully compatible with Claude Code, enabling intelligent retrieval without changing your workflow.
code_puppy
Code Puppy is an AI-powered code generation agent designed to understand programming tasks, generate high-quality code, and explain its reasoning. It supports multi-language code generation, interactive CLI, and detailed code explanations. The tool requires Python 3.9+ and API keys for various models like GPT, Google's Gemini, Cerebras, and Claude. It also integrates with MCP servers for advanced features like code search and documentation lookups. Users can create custom JSON agents for specialized tasks and access a variety of tools for file management, code execution, and reasoning sharing.
nanocoder
Nanocoder is a versatile code editor designed for beginners and experienced programmers alike. It provides a user-friendly interface with features such as syntax highlighting, code completion, and error checking. With Nanocoder, you can easily write and debug code in various programming languages, making it an ideal tool for learning, practicing, and developing software projects. Whether you are a student, hobbyist, or professional developer, Nanocoder offers a seamless coding experience to boost your productivity and creativity.
Free-GPT4-WEB-API
FreeGPT4-WEB-API is a Python server that gives you a self-hosted, unlimited, and free GPT-4 web API via Bing's AI. It uses the Flask and GPT4Free libraries; GPT4Free provides an interface to Bing's GPT-4. The server can be configured by editing the `FreeGPT4_Server.py` file, where you can change the port, host, and other settings. The only cookie needed for the Bing model is `_U`.
nosia
Nosia is a self-hosted AI RAG + MCP platform that allows users to run AI models on their own data with complete privacy and control. It integrates the Model Context Protocol (MCP) to connect AI models with external tools, services, and data sources. The platform is designed to be easy to install and use, providing OpenAI-compatible APIs that work seamlessly with existing AI applications. Users can augment AI responses with their documents, perform real-time streaming, support multi-format data, enable semantic search, and achieve easy deployment with Docker Compose. Nosia also offers multi-tenancy for secure data separation.
For similar tasks
A-mem
A-MEM is a novel agentic memory system designed for Large Language Model (LLM) agents to dynamically organize memories in an agentic way. It introduces advanced memory organization capabilities, intelligent indexing, and linking of memories, comprehensive note generation, interconnected knowledge networks, continuous memory evolution, and agent-driven decision making for adaptive memory management. The system facilitates agent construction and enables dynamic memory operations and flexible agent-memory interactions.
claude-memory
Claude Memory is a Chrome extension that enhances interactions with Claude by storing and retrieving important information from conversations, making interactions personalized and context-aware. It allows users to easily manage and organize stored information, with seamless integration with the Claude AI interface.
chrome-extension
Mem0 Chrome Extension lets you own your memory and preferences across any Gen AI apps like ChatGPT, Claude, Perplexity, etc and get personalized, relevant responses. It allows users to store memories from conversations, retrieve relevant memories during chats, manage and organize stored information, and seamlessly integrate with the Claude AI interface. The extension requires an API key and user ID for connecting to the Mem0 API, and it stores this information locally in the browser. Users can troubleshoot common issues, and contributions to improve the extension are welcome under the MIT License.
EverMemOS
EverMemOS is an AI memory system that enables AI to not only remember past events but also understand the meaning behind memories and use them to guide decisions. It achieves 93% reasoning accuracy on the LoCoMo benchmark by providing long-term memory capabilities for conversational AI agents through structured extraction, intelligent retrieval, and progressive profile building. The tool is production-ready with support for Milvus vector DB, Elasticsearch, MongoDB, and Redis, and offers easy integration via a simple REST API. Users can store and retrieve memories using Python code and benefit from features like multi-modal memory storage, smart retrieval mechanisms, and advanced techniques for memory management.
shodh-memory
Shodh-Memory is a cognitive memory system designed for AI agents to persist memory across sessions, learn from experience, and run entirely offline. It features Hebbian learning, activation decay, and semantic consolidation, packed into a single ~17MB binary. Users can deploy it on cloud, edge devices, or air-gapped systems to enhance the memory capabilities of AI agents.
automem
AutoMem is a production-grade long-term memory system for AI assistants, achieving 90.53% accuracy on the LoCoMo benchmark. It combines FalkorDB (Graph) and Qdrant (Vectors) storage systems to store, recall, connect, learn, and perform with memories. AutoMem enables AI assistants to remember, connect, and evolve their understanding over time, similar to human long-term memory. It implements techniques from peer-reviewed memory research and offers features like multi-hop bridge discovery, knowledge graphs that evolve, 9-component hybrid scoring, memory consolidation cycles, background intelligence, 11 relationship types, and more. AutoMem is benchmark-proven, research-validated, and production-ready, with features like sub-100ms recall, concurrent writes, automatic retries, health monitoring, dual storage redundancy, and automated backups.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI, and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Some use cases for BricksLLM: set LLM usage limits for users on different pricing tiers; track LLM usage on a per-user and per-organization basis; block or redact requests containing PII; improve LLM reliability with failovers, retries, and caching; and distribute API keys with rate limits and cost limits for internal development/production use cases or for students.
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.