
Memento
Official Code of Memento: Fine-tuning LLM Agents without Fine-tuning LLMs
Stars: 1009

Memento is a memory-based continual-learning framework that lets LLM agents improve from experience without fine-tuning the underlying models. A case-based-reasoning planner decomposes tasks and retrieves relevant past cases, while an MCP-driven executor orchestrates tools such as web search, document processing, and sandboxed code execution. Outcomes are written back to a Case Bank, enabling low-cost online learning, with strong results on benchmarks including GAIA, DeepResearcher, SimpleQA, and HLE.
README:
A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
Planner–Executor Architecture • Case-Based Reasoning • MCP Tooling • Memory-Augmented Learning
Figure: Memento vs. baselines on the GAIA validation and test sets.
Figure: Ablation study of Memento across benchmarks.
Figure: Continual-learning curves across memory designs.
Figure: Memento's accuracy improvement on OOD datasets.
- [2025.09.03] We’ve set up a WeChat group to make it easier to collaborate and exchange ideas on this project. Welcome to join the Group to share your thoughts, ask questions, or contribute your ideas! 🔥 🔥 🔥 Join our WeChat Group Now!
- [2025.08.30] We’re excited to announce that our non-parametric Case-Based Reasoning inference code is now officially open-sourced! 🎉
- [2025.08.28] We’ve created a Discord server to make discussions and collaboration around this project easier. Feel free to join and share your thoughts, ask questions, or contribute ideas! 🔥 🔥 🔥 Join our Discord!
- [2025.08.27] Thanks for your interest in our work! We’ll release our CBR code next week and our Parametric Memory code next month. We’ll keep updating on our further development.
- [2025.08.27] We added a new Crawler MCP in server/ai_crawler.py for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid using the search Docker and speed up development.
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner–executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
Learn from experiences, not gradients. Memento logs successful & failed trajectories into a Case Bank and retrieves by value to steer planning and execution—enabling low-cost, transferable, and online continual learning.
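A minimal sketch of this Read/Write pattern is shown below. The names (`CaseBank`, `Case`, `write`, `read`) and the cosine-similarity retrieval are illustrative assumptions, not the repository's actual API; the real memory components live under `memory/`.

```python
# Illustrative sketch of the Case Bank Read/Write pattern described above.
# Class and method names are hypothetical, not the repository's actual API.
from dataclasses import dataclass


@dataclass
class Case:
    state: str      # task or subtask description (s_T)
    action: str     # final plan or answer produced (a_T)
    reward: float   # outcome signal, e.g. 1.0 for success, 0.0 for failure (r_T)


def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


class CaseBank:
    def __init__(self, embed):
        self.embed = embed  # any text -> vector embedding function
        self.cases: list[tuple[list[float], Case]] = []

    def write(self, case: Case) -> None:
        """Store a finished trajectory's final-step tuple for later reuse."""
        self.cases.append((self.embed(case.state), case))

    def read(self, query: str, k: int = 4) -> list[Case]:
        """Retrieve the K most similar past cases to steer planning."""
        qv = self.embed(query)
        scored = sorted(self.cases, key=lambda item: _cosine(item[0], qv), reverse=True)
        return [case for _, case in scored[:k]]
```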
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services (a sketch of how these components fit together follows the capability list below)
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
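Putting the pieces together, the planner–executor loop can be summarized roughly as follows. This sketch reuses the hypothetical `CaseBank`/`Case` from the earlier snippet, and the `planner` and `executor` objects and their method names are placeholders rather than the actual interfaces in `client/agent.py`.

```python
# High-level sketch of the two-stage loop (hypothetical names; see client/agent.py
# for the real implementation).
async def solve(query: str, case_bank, planner, executor) -> str:
    # 1. The CBR-driven planner retrieves similar past cases and decomposes the query.
    retrieved = case_bank.read(query, k=4)
    subtasks = await planner.plan_subtasks(query, retrieved)

    # 2. The executor runs each subtask as an MCP client, calling tools as needed.
    results = []
    for subtask in subtasks:
        results.append(await executor.run_subtask(subtask))

    # 3. The final answer and its outcome are written back for future reuse.
    answer = await planner.finalize(query, results)
    case_bank.write(Case(state=query, action=answer, reward=1.0))  # reward from evaluation in practice
    return answer
```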
- Python 3.11+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
- FFmpeg (system-level binary required for video processing)
```bash
# Clone repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento

# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Sync dependencies and create virtual environment automatically
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
FFmpeg is required for video processing functionality. The ffmpeg-python package in our dependencies requires a system-level FFmpeg binary.
Windows:
```bash
# Option 1: Using Conda (recommended for an isolated environment)
conda install -c conda-forge ffmpeg

# Option 2: Download from the official website
# Visit https://ffmpeg.org/download.html and add FFmpeg to PATH
```
macOS:
```bash
# Using Homebrew
brew install ffmpeg
```
Linux:
```bash
# Debian/Ubuntu
sudo apt-get update && sudo apt-get install ffmpeg
```

Then set up the crawler and browser dependencies:

```bash
# Install and set up crawl4ai
crawl4ai-setup
crawl4ai-doctor

# Install Playwright browsers
playwright install
```
After creating the .env file, configure the following API keys and service endpoints:
```bash
# OpenAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1  # or your custom endpoint

#===========================================
# Tools & Services API
#===========================================

# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here

# Jina API
JINA_API_KEY=your_jina_api_key_here

# AssemblyAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here
```
Note: Replace your_*_api_key_here with your actual API keys. Some services are optional depending on which tools you plan to use.
For web search capabilities, set up SearxNG: follow https://github.com/searxng/searxng-docker/ to set up the Docker service, then use the configuration bundled in searxng-docker/.
```bash
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d
```

Then start the agent:

```bash
python client/agent.py
```
- Planner Model: Defaults to gpt-4.1 for task decomposition
- Executor Model: Defaults to o3 for task execution
- Custom Models: Support for any OpenAI-compatible API (see the connectivity check below)
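Before launching the agent against a custom endpoint, it can help to sanity-check the OPENAI_API_KEY / OPENAI_BASE_URL pair from your .env with the standard openai client. This snippet is only a suggested check, and the model name is a placeholder for whatever your endpoint serves.

```python
# Quick connectivity check for an OpenAI-compatible endpoint
# (uses the same OPENAI_API_KEY / OPENAI_BASE_URL values as the .env file).
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
)
resp = client.chat.completions.create(
    model="gpt-4.1",  # replace with whatever model your endpoint serves
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```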
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7–9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
```
Memento/
├── client/                          # Main agent implementation
│   ├── agent.py                     # Hierarchical client with planner–executor
│   └── no_parametric_cbr.py         # Non-parametric case-based reasoning
├── server/                          # MCP tool servers
│   ├── code_agent.py                # Code execution & workspace management
│   ├── search_tool.py               # Web search via SearxNG
│   ├── serp_search.py               # SERP-based search tool
│   ├── documents_tool.py            # Multi-format document processing
│   ├── image_tool.py                # Image analysis & captioning
│   ├── video_tool.py                # Video processing & narration
│   ├── excel_tool.py                # Spreadsheet processing
│   ├── math_tool.py                 # Mathematical computations
│   ├── craw_page.py                 # Web page crawling
│   └── ai_crawler.py                # Query-aware compression crawler
├── interpreters/                    # Code execution backends
│   ├── docker_interpreter.py
│   ├── e2b_interpreter.py
│   ├── internal_python_interpreter.py
│   └── subprocess_interpreter.py
├── memory/                          # Memory components / data
├── data/                            # Sample data / cases
├── searxng-docker/                  # SearxNG Docker setup
├── Figure/                          # Figures for README/paper
├── README.md
├── requirements.txt
└── LICENSE
```
- Create a new FastMCP server in the server/ directory (a hypothetical example follows this list)
- Implement your tool functions with proper error handling
- Register the tool with the MCP protocol
- Update the client's server list in agent.py
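As a rough sketch of the first step, a minimal FastMCP server might look like the following. The word_count tool and file name are made up for illustration; the existing servers under server/ are the authoritative reference for error handling and registration.

```python
# server/word_count_tool.py -- hypothetical example of a new FastMCP tool server.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("word-count")


@mcp.tool()
def word_count(text: str) -> int:
    """Return the number of whitespace-separated words in the given text."""
    if not isinstance(text, str):
        raise ValueError("text must be a string")
    return len(text.split())


if __name__ == "__main__":
    # Serve over stdio so the agent can launch this server as a subprocess.
    mcp.run(transport="stdio")
```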
Extend the interpreters/ module to add new execution backends:
```python
from interpreters.base import BaseInterpreter


class CustomInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Your custom execution logic
        pass
```
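As an illustration, a concrete backend built on asyncio subprocesses might look like the sketch below. It assumes only the async execute() interface shown above; the bundled subprocess_interpreter.py remains the real reference implementation.

```python
# Illustrative subprocess backend, assuming the execute() interface shown above.
import asyncio
import sys

from interpreters.base import BaseInterpreter


class MySubprocessInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Run the code in a fresh Python process and capture combined output.
        proc = await asyncio.create_subprocess_exec(
            sys.executable, "-c", code,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        out, _ = await proc.communicate()
        return out.decode()
```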
- [ ] Add Case Bank Reasoning: Implement memory-based case retrieval and reasoning system
- [ ] Add User Personal Memory Mechanism: Implement user-preference search
- [ ] Refine Tools & Add More Tools: Enhance existing tools and expand the tool ecosystem
- [ ] Test More New Benchmarks: Evaluate performance on additional benchmark datasets
- Long-horizon tasks: GAIA Level-3 remains challenging due to compounding errors
- Frontier knowledge: HLE performance limited by tooling alone
- Open-source coverage: Limited executor validation in fully open pipelines
- Some parts of the code in the toolkits and interpreters are adapted from Camel-AI.
If Memento helps your work, please cite:
```bibtex
@article{zhou2025agentfly,
  title={AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs},
  author={Zhou, Huichi and Chen, Yihang and Guo, Siyuan and Yan, Xue and Lee, Kin Hei and Wang, Zihan and Lee, Ka Yiu and Zhang, Guchun and Shao, Kun and Yang, Linyi and others},
  journal={arXiv preprint arXiv:2508.16153},
  year={2025}
}
```

```bibtex
@article{huang2025deep,
  title={Deep Research Agents: A Systematic Examination And Roadmap},
  author={Huang, Yuxuan and Chen, Yihang and Zhang, Haozheng and Li, Kang and Fang, Meng and Yang, Linyi and Li, Xiaoguang and Shang, Lifeng and Xu, Songcen and Hao, Jianye and others},
  journal={arXiv preprint arXiv:2506.18096},
  year={2025}
}
```
For a broader overview, please check out our survey on GitHub.
We welcome contributions! Please see our contributing guidelines for:
- Bug reports and feature requests
- Code contributions and pull requests
- Documentation improvements
- Tool and interpreter extensions