
Memento
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Stars: 1009

Memento is a memory-based continual-learning framework that lets LLM agents improve from experience without fine-tuning the underlying models. A case-based-reasoning planner decomposes tasks and retrieves relevant past cases, while an MCP-driven executor orchestrates tools such as web search, document processing, and sandboxed code execution. Outcomes are written back to a Case Bank, enabling low-cost online learning, with strong results on benchmarks including GAIA, DeepResearcher, SimpleQA, and HLE.
README:
A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
Planner–Executor Architecture • Case-Based Reasoning • MCP Tooling • Memory-Augmented Learning
Figure: Memento vs. baselines on the GAIA validation and test sets.
Figure: Ablation study of Memento across benchmarks.
Figure: Continual-learning curves across memory designs.
Figure: Memento's accuracy improvement on OOD datasets.
- [2025.09.03] We’ve set up a WeChat group to make it easier to collaborate and exchange ideas on this project. Welcome to join the Group to share your thoughts, ask questions, or contribute your ideas! 🔥 🔥 🔥 Join our WeChat Group Now!
- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our Discord!
- [2025.08.27] Thanks for your interest in our work! We’ll release our CBR code next week and our Parametric Memory code next month. We’ll keep updating on our further development.
- [2025.08.27] We added a new Crawler MCP in server/ai_crawler.py for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid using the search Docker and speed up development.
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner–executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
Learn from experiences, not gradients. Memento logs successful & failed trajectories into a Case Bank and retrieves by value to steer planning and execution—enabling low-cost, transferable, and online continual learning.
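A minimal sketch of this Read/Write pattern is shown below. The names (`CaseBank`, `Case`, `write`, `read`) and the cosine-similarity retrieval are illustrative assumptions, not the repository's actual API; the real memory components live under `memory/`.

```python
# Illustrative sketch of the Case Bank Read/Write pattern described above.
# Class and method names are hypothetical, not the repository's actual API.
from dataclasses import dataclass


@dataclass
class Case:
    state: str      # task or subtask description (s_T)
    action: str     # final plan or answer produced (a_T)
    reward: float   # outcome signal, e.g. 1.0 for success, 0.0 for failure (r_T)


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


class CaseBank:
    def __init__(self, embed):
        self.embed = embed  # any text -> vector embedding function
        self.cases: list[tuple[list[float], Case]] = []

    def write(self, case: Case) -> None:
        """Store a finished trajectory's final-step tuple for later reuse."""
        self.cases.append((self.embed(case.state), case))

    def read(self, query: str, k: int = 4) -> list[Case]:
        """Retrieve the K most similar past cases to steer planning."""
        qv = self.embed(query)
        scored = sorted(self.cases, key=lambda item: _cosine(item[0], qv), reverse=True)
        return [case for _, case in scored[:k]]
```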
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services (a sketch of how these components fit together follows the capability list below)
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
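Putting the pieces together, the planner–executor loop can be summarized roughly as follows. This sketch reuses the hypothetical `CaseBank`/`Case` from the earlier snippet, and the `planner` and `executor` objects and their method names are placeholders rather than the actual interfaces in `client/agent.py`.

```python
# High-level sketch of the two-stage loop (hypothetical names; see client/agent.py
# for the real implementation).
async def solve(query: str, case_bank, planner, executor) -> str:
    # 1. The CBR-driven planner retrieves similar past cases and decomposes the query.
    retrieved = case_bank.read(query, k=4)
    subtasks = await planner.plan_subtasks(query, retrieved)

    # 2. The executor runs each subtask as an MCP client, calling tools as needed.
    results = []
    for subtask in subtasks:
        results.append(await executor.run_subtask(subtask))

    # 3. The final answer and its outcome are written back for future reuse.
    answer = await planner.finalize(query, results)
    case_bank.write(Case(state=query, action=answer, reward=1.0))  # reward from evaluation in practice
    return answer
```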
- Python 3.11+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
- FFmpeg (system-level binary required for video processing)
```bash
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync dependencies and create virtual environment automatically
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
FFmpeg is required for video processing functionality. The ffmpeg-python package in our dependencies requires a system-level FFmpeg binary.
Windows:
```bash
# Option 1: Using Conda (recommended for an isolated environment)
conda install -c conda-forge ffmpeg

# Option 2: Download from the official website
# Visit https://ffmpeg.org/download.html and add FFmpeg to PATH
```
macOS:
```bash
# Using Homebrew
brew install ffmpeg
```
Linux:
```bash
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install ffmpeg
```

Then set up the crawler and browser dependencies:

```bash
# Install and set up crawl4ai
crawl4ai-setup
crawl4ai-doctor

# Install Playwright browsers
playwright install
```
After creating the .env file, configure the following API keys and service endpoints:
```bash
# OpenAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # or your custom endpoint

#===========================================
# Tools & Services API
#===========================================

# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here

# Jina API
JINA_API_KEY=your_jina_api_key_here

# AssemblyAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
```
Note: Replace your_*_api_key_here with your actual API keys. Some services are optional depending on which tools you plan to use.
For web search capabilities, set up SearxNG: follow https://github.com/searxng/searxng-docker/ to set up the Docker service, then use the configuration bundled in searxng-docker/.
```bash
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d
```

Then start the agent:

```bash
python client/agent.py
```
- Planner Model: Defaults to gpt-4.1 for task decomposition
- Executor Model: Defaults to o3 for task execution
- Custom Models: Support for any OpenAI-compatible API (see the connectivity check below)
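Before launching the agent against a custom endpoint, it can help to sanity-check the OPENAI_API_KEY / OPENAI_BASE_URL pair from your .env with the standard openai client. This snippet is only a suggested check, and the model name is a placeholder for whatever your endpoint serves.

```python
# Quick connectivity check for an OpenAI-compatible endpoint
# (uses the same OPENAI_API_KEY / OPENAI_BASE_URL values as the .env file).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
)
resp = client.chat.completions.create(
    model="gpt-4.1",  # replace with whatever model your endpoint serves
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```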
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7–9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
```
Memento/
├── client/                          # Main agent implementation
│   ├── agent.py                     # Hierarchical client with planner–executor
│   └── no_parametric_cbr.py         # Non-parametric case-based reasoning
├── server/                          # MCP tool servers
│   ├── code_agent.py                # Code execution & workspace management
│   ├── search_tool.py               # Web search via SearxNG
│   ├── serp_search.py               # SERP-based search tool
│   ├── documents_tool.py            # Multi-format document processing
│   ├── image_tool.py                # Image analysis & captioning
│   ├── video_tool.py                # Video processing & narration
│   ├── excel_tool.py                # Spreadsheet processing
│   ├── math_tool.py                 # Mathematical computations
│   ├── craw_page.py                 # Web page crawling
│   └── ai_crawler.py                # Query-aware compression crawler
├── interpreters/                    # Code execution backends
│   ├── docker_interpreter.py
│   ├── e2b_interpreter.py
│   ├── internal_python_interpreter.py
│   └── subprocess_interpreter.py
├── memory/                          # Memory components / data
├── data/                            # Sample data / cases
├── searxng-docker/                  # SearxNG Docker setup
├── Figure/                          # Figures for README/paper
├── README.md
├── requirements.txt
└── LICENSE
```
- Create a new FastMCP server in the server/ directory (a hypothetical example follows this list)
- Implement your tool functions with proper error handling
- Register the tool with the MCP protocol
- Update the client's server list in agent.py
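As a rough sketch of the first step, a minimal FastMCP server might look like the following. The word_count tool and file name are made up for illustration; the existing servers under server/ are the authoritative reference for error handling and registration.

```python
# server/word_count_tool.py -- hypothetical example of a new FastMCP tool server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("word-count")


@mcp.tool()
def word_count(text: str) -> int:
    """Return the number of whitespace-separated words in the given text."""
    if not isinstance(text, str):
        raise ValueError("text must be a string")
    return len(text.split())


if __name__ == "__main__":
    # Serve over stdio so the agent can launch this server as a subprocess.
    mcp.run(transport="stdio")
```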
Extend the interpreters/ module to add new execution backends:
```python
from interpreters.base import BaseInterpreter


class CustomInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Your custom execution logic
        pass
```
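As an illustration, a concrete backend built on asyncio subprocesses might look like the sketch below. It assumes only the async execute() interface shown above; the bundled subprocess_interpreter.py remains the real reference implementation.

```python
# Illustrative subprocess backend, assuming the execute() interface shown above.
import asyncio
import sys

from interpreters.base import BaseInterpreter


class MySubprocessInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Run the code in a fresh Python process and capture combined output.
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "-c", code,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        return out.decode()
```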
- [ ] Add Case Bank Reasoning: Implement memory-based case retrieval and reasoning system
- [ ] Add User Personal Memory Mechanism: Implement user-preference search
- [ ] Refine Tools & Add More Tools: Enhance existing tools and expand the tool ecosystem
- [ ] Test More New Benchmarks: Evaluate performance on additional benchmark datasets
- Long-horizon tasks: GAIA Level-3 remains challenging due to compounding errors
- Frontier knowledge: HLE performance limited by tooling alone
- Open-source coverage: Limited executor validation in fully open pipelines
- Some parts of the code in the toolkits and interpreters are adapted from Camel-AI.
If Memento helps your work, please cite:
```bibtex
@article{zhou2025agentfly,
  title={AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs},
  author={Zhou, Huichi and Chen, Yihang and Guo, Siyuan and Yan, Xue and Lee, Kin Hei and Wang, Zihan and Lee, Ka Yiu and Zhang, Guchun and Shao, Kun and Yang, Linyi and others},
  journal={arXiv preprint arXiv:2508.16153},
  year={2025}
}
```

```bibtex
@article{huang2025deep,
  title={Deep Research Agents: A Systematic Examination And Roadmap},
  author={Huang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and others},
  journal={arXiv preprint arXiv:2506.18096},
  year={2025}
}
```
For a broader overview, please check out our survey on GitHub.
We welcome contributions! Please see our contributing guidelines for:
- Bug reports and feature requests
- Code contributions and pull requests
- Documentation improvements
- Tool and interpreter extensions