
Memento
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Stars: 1009

Memento is a framework for LLM agents that learn from stored experience rather than from weight updates. It combines a case-based-reasoning planner with an MCP-based executor, and reports results on the GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
README:
A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
Planner–Executor Architecture • Case-Based Reasoning • MCP Tooling • Memory-Augmented Learning
Figure: Memento vs. baselines on GAIA validation and test sets.
Figure: Ablation study of Memento across benchmarks.
Figure: Continual learning curves across memory designs.
Figure: Memento's accuracy improvement on OOD datasets.
- [2025.09.03] We’ve set up a WeChat group to make it easier to collaborate and exchange ideas on this project. Welcome to join the Group to share your thoughts, ask questions, or contribute your ideas! 🔥 🔥 🔥 Join our WeChat Group Now!
- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our Discord!
- [2025.08.27] Thanks for your interest in our work! We’ll release our CBR code next week and our Parametric Memory code next month. We’ll keep updating on our further development.
- [2025.08.27] We added a new Crawler MCP in server/ai_crawler.py for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid using the search Docker and speed up development.
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner–executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
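The two-stage loop described above can be sketched in a few lines. This is only an illustration of the control flow (retrieve cases, plan, execute each subtask, write back); the function name `run_task` and its four callables are hypothetical and are not taken from the Memento codebase, where planning and execution are done by LLM calls and MCP tools.

```python
from typing import Callable


def run_task(
    task: str,
    retrieve: Callable[[str], list[str]],
    plan: Callable[[str, list[str]], list[str]],
    execute: Callable[[str], str],
    write_back: Callable[[str, list[str], list[str]], None],
) -> list[str]:
    """One pass of a planner-executor loop (hypothetical interface)."""
    cases = retrieve(task)                 # planner reads relevant past cases
    subtasks = plan(task, cases)           # planner decomposes the task
    results = [execute(s) for s in subtasks]  # executor runs each subtask
    write_back(task, subtasks, results)    # outcome is stored for later reuse
    return results


# Stub callables stand in for the LLM planner, the MCP executor, and the memory.
log: list[tuple[str, int]] = []
results = run_task(
    "compare A and B",
    retrieve=lambda task: ["past case: look up each item separately"],
    plan=lambda task, cases: ["look up A", "look up B"],
    execute=lambda subtask: f"done: {subtask}",
    write_back=lambda task, subtasks, res: log.append((task, len(res))),
)
```

Passing the stages in as callables keeps the loop itself independent of any particular model or tool backend.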
Learn from experiences, not gradients. Memento logs successful & failed trajectories into a Case Bank and retrieves by value to steer planning and execution—enabling low-cost, transferable, and online continual learning.
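As a rough illustration of the Case Bank idea, the sketch below stores (state, action, reward) tuples and reads back the K most similar cases for a new query. It is a minimal stand-in under stated assumptions: the names `Case`, `CaseBank`, and `jaccard` are hypothetical, and token-overlap similarity is used here only because it needs no dependencies; it is not the retrieval method the project itself uses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Case:
    state: str     # task description at the final step
    action: str    # plan or action that was taken
    reward: float  # 1.0 for success, 0.0 for failure (assumed encoding)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1]; a stand-in for a real retriever."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class CaseBank:
    cases: list[Case] = field(default_factory=list)

    def write(self, state: str, action: str, reward: float) -> None:
        """Append one finished trajectory, successful or failed."""
        self.cases.append(Case(state, action, reward))

    def read(self, query: str, k: int = 4) -> list[Case]:
        """Return the k stored cases whose state is most similar to the query."""
        ranked = sorted(
            self.cases, key=lambda c: jaccard(query, c.state), reverse=True
        )
        return ranked[:k]


bank = CaseBank()
bank.write("what is the capital of France", "search then answer", 1.0)
bank.write("sum the numbers in a spreadsheet", "use excel tool", 1.0)
bank.write("capital gains tax rules", "answer from memory", 0.0)
top = bank.read("capital of France", k=2)
```

Both successful and failed cases are kept, so a planner can use the reward field to imitate the former and avoid the latter.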
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
- Python 3.11+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
- FFmpeg (system-level binary required for video processing)
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Sync dependencies and create virtual environment automatically
uv sync
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
FFmpeg is required for video processing functionality. The ffmpeg-python package in our dependencies requires a system-level FFmpeg binary.
Windows:
# Option 1: Using Conda (Recommended for isolated environment)
conda install -c conda-forge ffmpeg
# Option 2: Download from official website
# Visit https://ffmpeg.org/download.html and add to PATH
macOS:
# Using Homebrew
brew install ffmpeg
Linux:
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install ffmpeg
# Install and setup crawl4ai
crawl4ai-setup
crawl4ai-doctor
# Install playwright browsers
playwright install
After creating the .env file, you need to configure the following API keys and service endpoints:
# OPENAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # or your custom endpoint
#===========================================
# Tools & Services API
#===========================================
# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here
# Jina API
JINA_API_KEY=your_jina_api_key_here
# ASSEMBLYAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
Note: Replace your_*_api_key_here with your actual API keys. Some services are optional depending on which tools you plan to use.
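Before starting the agent, it can help to check that the keys above are set and are no longer placeholders. The following is a small sketch, not part of Memento: the function `check_env` is hypothetical, and the split into required and optional keys is an assumption based on the prerequisites list (only the OpenAI key is named there as required).

```python
import os
from collections.abc import Mapping

# Assumed split: the prerequisites name only the OpenAI key as required.
REQUIRED_KEYS = ("OPENAI_API_KEY",)
OPTIONAL_KEYS = (
    "OPENAI_BASE_URL",
    "CHUNKR_API_KEY",
    "JINA_API_KEY",
    "ASSEMBLYAI_API_KEY",
)


def _is_unset(env: Mapping[str, str], key: str) -> bool:
    """True if the key is absent, empty, or still a your_*_here placeholder."""
    value = env.get(key, "").strip()
    return not value or (value.startswith("your_") and value.endswith("_here"))


def check_env(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Report which required and optional keys still need a real value."""
    return {
        "missing_required": [k for k in REQUIRED_KEYS if _is_unset(env, k)],
        "missing_optional": [k for k in OPTIONAL_KEYS if _is_unset(env, k)],
    }


report = check_env({"OPENAI_API_KEY": "your_openai_api_key_here"})
```

The function reads from any mapping, so it works on `os.environ` after the .env file has been loaded by whatever loader the project uses.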
For web search, set up SearxNG. Follow https://github.com/searxng/searxng-docker/ to set up the Docker deployment, then use the settings provided in this repository's searxng-docker/ directory.
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d
python client/agent.py
- Planner Model: Defaults to gpt-4.1 for task decomposition
- Executor Model: Defaults to o3 for task execution
- Custom Models: Support for any OpenAI-compatible API
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7–9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
Memento/
├── client/ # Main agent implementation
│ ├── agent.py # Hierarchical client with planner–executor
│ └── no_parametric_cbr.py # Non-parametric case-based reasoning
├── server/ # MCP tool servers
│ ├── code_agent.py # Code execution & workspace management
│ ├── search_tool.py # Web search via SearxNG
│ ├── serp_search.py # SERP-based search tool
│ ├── documents_tool.py # Multi-format document processing
│ ├── image_tool.py # Image analysis & captioning
│ ├── video_tool.py # Video processing & narration
│ ├── excel_tool.py # Spreadsheet processing
│ ├── math_tool.py # Mathematical computations
│ ├── craw_page.py # Web page crawling
│ └── ai_crawler.py # Query-aware compression crawler
├── interpreters/ # Code execution backends
│ ├── docker_interpreter.py
│ ├── e2b_interpreter.py
│ ├── internal_python_interpreter.py
│ └── subprocess_interpreter.py
├── memory/ # Memory components / data
├── data/ # Sample data / cases
├── searxng-docker/ # SearxNG Docker setup
├── Figure/ # Figures for README/paper
├── README.md
├── requirements.txt
└── LICENSE
- Create a new FastMCP server in the server/ directory
- Implement your tool functions with proper error handling
- Register the tool with the MCP protocol
- Update the client's server list in agent.py
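The steps above might look roughly like the sketch below, which uses the `FastMCP` class from the official MCP Python SDK (`mcp` package). The tool `safe_divide`, the server name `example-math`, and the helper `build_server` are made up for illustration; how Memento's own servers are structured, and how agent.py lists them, should be checked in the repository.

```python
def safe_divide(a: float, b: float) -> str:
    """Divide a by b and return the result as text, or an error message."""
    try:
        return str(a / b)
    except ZeroDivisionError:
        # Return a readable error instead of letting the tool call crash.
        return "error: division by zero"


def build_server():
    """Create a FastMCP server exposing safe_divide (needs the `mcp` package)."""
    # Imported lazily so the tool function can be tested without the SDK.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("example-math")   # server name is an arbitrary label
    server.tool()(safe_divide)         # same effect as decorating with @server.tool()
    return server
```

Calling `build_server().run(transport="stdio")` at the bottom of the file would serve the tool over stdio, which is the usual way for an MCP client to launch a local server.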
Extend the interpreters/ module to add new execution backends:
from interpreters.base import BaseInterpreter

class CustomInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Your custom execution logic
        pass
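For a concrete picture of what such a backend could do, here is a self-contained sketch that runs code in a child Python process with a timeout. The `BaseInterpreter` class below is a local stand-in that assumes the interface is a single async `execute(code) -> str` method, as in the template above; the real base class may require more. `SubprocessSketchInterpreter` is a hypothetical name, and this sketch provides no sandboxing.

```python
import asyncio
import sys
from abc import ABC, abstractmethod


class BaseInterpreter(ABC):
    """Local stand-in for interpreters.base.BaseInterpreter (assumed interface)."""

    @abstractmethod
    async def execute(self, code: str) -> str: ...


class SubprocessSketchInterpreter(BaseInterpreter):
    """Runs code with `python -c` in a child process. Not a sandbox."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s

    async def execute(self, code: str) -> str:
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "-c", code,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            out, err = await asyncio.wait_for(proc.communicate(), self.timeout_s)
        except asyncio.TimeoutError:
            # Stop the child and reap it so no zombie process is left behind.
            proc.kill()
            await proc.wait()
            return f"error: timed out after {self.timeout_s}s"
        if proc.returncode != 0:
            return "error: " + err.decode(errors="replace").strip()
        return out.decode(errors="replace").strip()


output = asyncio.run(SubprocessSketchInterpreter().execute("print(1 + 1)"))
```

Returning errors as strings rather than raising keeps the result easy to hand back to an LLM executor, which can then decide whether to retry.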
- [ ] Add Case Bank Reasoning: Implement memory-based case retrieval and reasoning system
- [ ] Add User Personal Memory Mechanism: Implement user-preference search
- [ ] Refine Tools & Add More Tools: Enhance existing tools and expand the tool ecosystem
- [ ] Test More New Benchmarks: Evaluate performance on additional benchmark datasets
- Long-horizon tasks: GAIA Level-3 remains challenging due to compounding errors
- Frontier knowledge: HLE performance limited by tooling alone
- Open-source coverage: Limited executor validation in fully open pipelines
- Some parts of the code in the toolkits and interpreters are adapted from Camel-AI.
If Memento helps your work, please cite:
@article{zhou2025agentfly,
title={AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs},
author={Zhou, Huichi and Chen, Yihang and Guo, Siyuan and Yan, Xue and Lee, Kin Hei and Wang, Zihan and Lee, Ka Yiu and Zhang, Guchun and Shao, Kun and Yang, Linyi and others},
journal={arXiv preprint arXiv:2508.16153},
year={2025}
}
@article{huang2025deep,
title={Deep Research Agents: A Systematic Examination And Roadmap},
author={Huang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and others},
journal={arXiv preprint arXiv:2506.18096},
year={2025}
}
For a broader overview, please check out our survey on GitHub.
We welcome contributions! Please see our contributing guidelines for:
- Bug reports and feature requests
- Code contributions and pull requests
- Documentation improvements
- Tool and interpreter extensions