| Date |
Paper & Summary |
Tags |
Links |
| 2026-02-12 |
Learning to Forget Attention: Memory Consolidation for Adaptive Compute Reduction |
|
|
• The paper finds that 88% of attention operations in LLMs retrieve predictable information, a redundancy that standard training never eliminates.
• It proposes CRAM, an architecture that adaptively reduces compute by gradually distilling frequently accessed episodic retrievals into efficient parametric semantic memory via a consolidation-aware router.
• CRAM achieves a 37.8x reduction in attention compute while its learned consolidation dynamics quantitatively match the power-law transition curves observed in human cognitive psychology.
|
| 2026-02-12 |
Scene-Aware Memory Discrimination: Deciding Which Personal Knowledge Stays |
|
|
• It introduces the "memory discrimination" task, which acts as a filter during memory construction to identify and store only valuable personal knowledge from vast daily interactions.
• It proposes the Scene-Aware Memory Discrimination (SAMD) framework, combining a Gating Unit Module (GUM) for efficient noise filtering and a Cluster Prompting Module (CPM) for adaptive memory standards.
• Evaluations demonstrate that SAMD significantly enhances the efficiency and quality of memory construction, successfully recalling critical data while reducing computational costs for personalized AI agents.
|
| 2026-02-12 |
Recurrent Preference Memory for Efficient Long-Sequence Generative Recommendation |
|
|
• It proposes the Rec2PM framework with a tripartite memory mechanism that compresses long user histories into compact Preference Memory tokens, overcoming the computational bottlenecks of long-sequence modeling.
• It introduces a self-referential teacher-forcing strategy that leverages a global view of history to generate reference targets, enabling fully parallelized training for recurrent memory updates.
• It demonstrates superior storage and inference efficiency while acting as a denoising Information Bottleneck to filter interaction noise, achieving higher accuracy than full-sequence models.
|
| 2026-02-12 |
TS-Memory: Plug-and-Play Memory for Time Series Foundation Models |
|
|
• It proposes TS-Memory, a plug-and-play lightweight memory adapter that uses Parametric Memory Distillation to adapt frozen Time Series Foundation Models (TSFMs) to downstream domain shifts.
• It employs a two-stage training strategy: first constructing an offline kNN teacher to generate privileged supervision signals, then distilling retrieval-induced distributional corrections into a parametric module via confidence-gated supervision.
• It enables retrieval-free inference with constant-time complexity, consistently improving forecasting accuracy across benchmarks while maintaining the efficiency of the frozen backbone and avoiding catastrophic forgetting.
|
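The kNN-teacher idea above can be sketched in a few lines. This is a toy on raw scalars (the real method operates on TSFM representations, and the function name and bank format here are hypothetical): the teacher's privileged signal is the average residual the frozen backbone made on the k nearest training points.

```python
def knn_teacher_correction(query, bank, k=2):
    """Toy offline kNN teacher (illustrative only).

    bank: list of (feature, residual) pairs, where residual is the error the
    frozen backbone made on that training point.  The correction distilled
    into the parametric adapter is the mean residual of the k nearest
    neighbours of the query.
    """
    nearest = sorted(bank, key=lambda fr: abs(fr[0] - query))[:k]
    return sum(r for _, r in nearest) / k

# Suppose the backbone under-predicts by ~1.0 near feature value 2.0:
bank = [(1.9, 1.0), (2.1, 1.1), (9.0, -5.0)]
correction = knn_teacher_correction(2.0, bank)
```

At inference time the distilled parametric module replaces this lookup, which is what gives the constant-time, retrieval-free property.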
| 2026-02-11 |
When to Memorize and When to Stop: Gated Recurrent Memory for Long-Context Reasoning |
|
|
• It proposes GRU-Mem, a gated recurrent memory framework that processes long context chunk-by-chunk to overcome performance degradation and context window limits in LLMs.
• It introduces two text-controlled gates, an update gate and an exit gate, to selectively update memory and enable early termination, effectively preventing memory explosion and reducing redundant computation.
• Optimized via end-to-end reinforcement learning, GRU-Mem significantly outperforms existing methods on reasoning tasks and accelerates inference by up to 400% over vanilla recurrent memory agents.
|
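The two-gate mechanism can be sketched as follows. This is a toy with hand-set thresholds and a scalar relevance stand-in; in the paper the gates are text-controlled and trained with RL.

```python
def gated_chunk_reader(chunks, scores, update_thresh=0.5, exit_thresh=1.5):
    """Toy sketch of two-gate chunk-by-chunk reading (illustrative only).

    chunks: text chunks processed sequentially.
    scores: per-chunk relevance in [0, 1], a stand-in for the learned gates.
    The update gate writes only relevant chunks to memory (preventing memory
    explosion); the exit gate terminates early once enough evidence has
    accumulated (reducing redundant computation).
    """
    memory, evidence = [], 0.0
    for chunk, score in zip(chunks, scores):
        if score >= update_thresh:      # update gate: selective memory write
            memory.append(chunk)
            evidence += score
        if evidence >= exit_thresh:     # exit gate: stop reading further chunks
            break
    return memory

# Chunks "a" and "c" are relevant; reading stops before "d" is ever seen.
kept = gated_chunk_reader(["a", "b", "c", "d"], [0.9, 0.2, 0.8, 0.9])
```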
| 2026-02-11 |
Towards Compressive and Scalable Recurrent Memory |
|
|
• It proposes Elastic Memory, a novel recurrent memory architecture grounded in the HiPPO framework that encodes long-range history into fixed-size states via optimal online function approximation.
• It develops a parallelized block-level update and a flexible "polynomial sampling" mechanism for efficient retrieval, enabling the reconstruction of history summaries from compressed states without extra trainable parameters.
• Experiments demonstrate that it outperforms SOTA baselines on 32k+ context tasks with superior efficiency, while its decoupled design allows for injecting inductive biases at test-time without retraining.
|
| 2026-02-11 |
UMEM: Unified Memory Extraction and Management Framework for Generalizable Memory
|
|
• The paper proposes UMEM, a framework that jointly optimizes memory extraction and management to resolve the policy misalignment and poor generalization found in traditional static extraction methods.
• It introduces Semantic Neighborhood Modeling and a Marginal Utility Reward via GRPO, forcing the agent to distill generalizable principles across clusters of related tasks instead of memorizing instance-specific noise.
• Experiments show that UMEM significantly outperforms state-of-the-art baselines across multiple benchmarks and maintains a stable, monotonic performance growth curve during continuous self-evolution.
|
| 2026-02-10 |
TraceMem: Weaving Narrative Memory Schemata from User Conversational Traces |
|
|
• It proposes TraceMem, a cognitively-inspired framework that weaves disjointed conversational traces into structured narrative memory schemata through a three-stage pipeline of processing and consolidation.
• It mimics human memory consolidation by using topic segmentation and hierarchical clustering to transform episodic snippets into coherent, time-evolving narrative threads and structured user memory cards.
• It implements an agentic search mechanism to enable human-like source attribution, achieving state-of-the-art performance in multi-hop and temporal reasoning for long-term dialogues.
|
| 2026-02-09 |
AMEM4Rec: Leveraging Cross-User Similarity for Memory Evolution in Agentic LLM Recommenders |
|
|
• It proposes AMEM4Rec, an agentic framework that models collaborative filtering signals end-to-end by introducing an evolving memory module without relying on pre-trained CF models.
• It designs a cross-user memory evolution mechanism that aggregates abstract behavior patterns into a global pool, using dual validation to link and iteratively evolve shared memory entries across users.
• Extensive experiments demonstrate that AMEM4Rec significantly outperforms existing baselines, particularly showing superior performance and generalization in sparse interaction scenarios.
|
| 2026-02-09 |
Position: Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs |
|
|
• Introduces the concept of "Implicit Memory," demonstrating that LLMs can bypass their stateless nature by encoding state information in their own outputs to create a persistent hidden channel across independent sessions.
• Implements and validates "Time Bombs," a new class of temporal backdoors that use implicit memory to accumulate hidden states over multiple interactions, activating only after a specific sequence of conditions is met.
• Systematically analyzes risks such as covert communication and benchmark contamination, while outlining future research directions for detecting, evaluating, and controlling unintended persistence in LLMs.
|
| 2026-02-09 |
MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval |
|
|
• It proposes MemAdapter, a framework that unifies heterogeneous agent memory paradigms (explicit, parametric, and latent) using a paradigm-agnostic generative subgraph retrieval approach.
• It employs a two-stage training strategy: distilling a generative retriever from a unified memory space and then efficiently adapting it to new paradigms via lightweight alignment modules and contrastive learning.
• MemAdapter consistently outperforms existing baselines, completing paradigm alignment in just 13 minutes and enabling effective zero-shot fusion across different memory types.
|
| 2026-02-07 |
MemPot: Defending Against Memory Extraction Attack with Optimized Honeypots |
|
|
• This paper proposes MemPot, the first defense framework against memory extraction attacks that proactively injects optimized honeypots (trap documents) into the agent's memory systems.
• It utilizes a two-stage optimization strategy to maximize the statistical separability between attacker and user retrieval patterns, generating safe and inconspicuous trap texts through safety-constrained embedding inversion.
• Based on Wald’s Sequential Probability Ratio Test (SPRT), MemPot achieves near-perfect detection accuracy with zero online inference latency while preserving the agent's core utility and performance.
|
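The SPRT decision rule behind MemPot's detector can be sketched directly from Wald's test. This is a minimal illustration, not the paper's code: how the per-retrieval log-likelihood ratios are derived from honeypot-hit statistics is the paper's contribution; the probabilities below are made up.

```python
import math

def sprt_decide(llr_stream, alpha=0.01, beta=0.01):
    """Toy Wald SPRT over per-retrieval log-likelihood ratios.

    alpha/beta: tolerated false-positive / false-negative rates.
    Returns ("attacker", n), ("user", n), or ("undecided", n), where n is
    the number of retrievals observed before the decision.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1: attacker
    lower = math.log(beta / (1 - alpha))   # accept H0: benign user
    llr = 0.0
    for n, step in enumerate(llr_stream, 1):
        llr += step
        if llr >= upper:
            return ("attacker", n)
        if llr <= lower:
            return ("user", n)
    return ("undecided", len(llr_stream))

# Hypothetical per-retrieval evidence: touching a trap document is far more
# likely under the attacker hypothesis than under normal user behaviour.
hit = math.log(0.9 / 0.05)    # retrieval touched a honeypot
miss = math.log(0.1 / 0.95)   # retrieval avoided all honeypots
```

Because the test is sequential, a decision typically fires after only a handful of retrievals, which is consistent with the zero-online-latency claim: the statistic is a running sum, with no extra inference work per query.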
| 2026-02-05 |
Learning to Share: Selective Memory for Efficient Parallel Agentic Systems |
|
|
• Studied how to share memory selectively across parallel agents to reduce redundancy and coordination overhead.
• Proposed a selective sharing policy that decides, per agent, what to broadcast and what to keep private.
• Evaluated on multi-agent settings, showing improved efficiency while maintaining (or improving) task performance.
|
| 2026-02-02 |
Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback |
|
|
• Formulated memory as an evolving object updated online from dense, continuous feedback rather than sparse end-of-task signals.
• Proposed an update/evolution loop that revises stored memories based on feedback to improve future behavior.
• Demonstrated online improvement over time in agent tasks under continuous supervision.
|
| 2026-02-02 |
Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation |
|
|
• Revisited agent memory retrieval as a two-stage process that decouples candidate fetching from evidence aggregation.
• Proposed an aggregation mechanism to combine multi-source/multi-hop evidence for downstream reasoning.
• Showed gains over vanilla RAG-style retrieval pipelines in agent memory usage and answer quality.
|
| 2026-01-29 |
E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory |
|
|
• Proposed the E-mem framework, shifting from traditional memory preprocessing to Episodic Context Reconstruction to prevent information loss caused by de-contextualization.
• Adopted a heterogeneous Master-Assistant architecture where assistant agents maintain uncompressed context as memory nodes while the master agent handles global planning.
• Introduced a routing mechanism allowing assistants to reason within locally restored original contexts, achieving SOTA performance on LoCoMo and HotpotQA while reducing token costs by over 70%.
|
| 2026-01-29 |
ShardMemo: Masked MoE Routing for Sharded Agentic LLM Memory |
|
|
• Proposed ShardMemo, a tiered memory architecture: Tier A (working state), Tier B (sharded evidence), and Tier C (versioned skill library).
• Enforced a "scope-before-routing" strategy in Tier B and modeled shard selection as a Masked MoE routing problem under fixed budgets, using cost-aware gating.
• Improved F1 by +6.87 on LoCoMo and HotpotQA compared to cosine similarity routing, while reducing retrieval work and latency by 20.5%.
|
| 2026-01-28 |
MemCtrl: Using MLLMs as Active Memory Controllers on Embodied Agents |
|
|
• Proposed MemCtrl, a framework using MLLMs as active memory controllers to filter redundant observations online for embodied agents.
• Introduced a trainable memory head ($\mu$) acting as a gate to dynamically determine whether to retain, update, or discard observations during exploration.
• Trained via offline supervision and online RL, MemCtrl improved task completion rates by ~16% for small MLLMs on EmbodiedBench, with >20% gains on specific instruction subsets.
|
| 2026-01-28 |
AMA: Adaptive Memory via Multi-Agent Collaboration |
|
|
• Proposed AMA (Adaptive Memory via Multi-Agent Collaboration), leveraging Constructor, Retriever, Judge, and Refresher agents to manage multi-granularity memory.
• Utilized a hierarchical memory design (Raw Text, Fact Knowledge, Episode), where the Retriever dynamically routes queries and the Judge detects conflicts.
• The Refresher maintains long-term consistency via logic-driven updates. AMA significantly outperformed baselines on LoCoMo and LongMemEval while reducing token consumption by 80%.
|
| 2026-01-27 |
GLOVE: Global Verifier for LLM Memory-Environment Realignment |
|
|
• Proposed the Global Verifier (GLOVE) framework to address memory-environment misalignment caused by dynamic environmental drifts.
• Established "relative truth" via active probing to detect cognitive dissonance by comparing retrieved memories with fresh observations, realigning memory without ground truth.
• Significantly improved agent adaptability and success rates in web navigation, discrete planning, and continuous control tasks under explicit and implicit environment drifts.
|
| 2026-01-26 |
FadeMem: Biologically-Inspired Forgetting for Efficient Agent Memory |
|
|
• Proposed FadeMem, a memory architecture inspired by the Ebbinghaus forgetting curve, introducing active forgetting mechanisms to prevent information overload.
• Implemented a dual-layer hierarchy with adaptive exponential decay governed by semantic relevance, access frequency, and temporal patterns.
• Combined with LLM-guided conflict resolution, FadeMem achieved superior multi-hop reasoning on Multi-Session Chat and LoCoMo while reducing storage by 45%.
|
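The decay mechanism can be sketched as an Ebbinghaus-style retention score. The weighting below is hypothetical (the paper additionally conditions decay on temporal access patterns): memory strength grows with access frequency and semantic relevance, so useful memories fade more slowly, and anything below a retention threshold is actively forgotten.

```python
import math

def retention(age_hours, accesses, relevance, base_strength=1.0):
    """Toy Ebbinghaus retention R = exp(-t / S) with a made-up strength
    function; FadeMem's actual decay parameters differ."""
    strength = base_strength * (1 + accesses) * (0.5 + relevance)
    return math.exp(-age_hours / strength)

def prune(memories, threshold=0.05):
    """Active forgetting: drop memories whose retention fell below threshold.

    Each memory carries stats = (age_hours, accesses, relevance)."""
    return [m for m in memories if retention(*m["stats"]) >= threshold]

memories = [
    {"id": 1, "stats": (1.0, 3, 0.9)},    # fresh, frequently accessed
    {"id": 2, "stats": (100.0, 0, 0.1)},  # stale, never accessed
]
```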
| 2026-01-26 |
MemWeaver: Weaving Hybrid Memories for Traceable Long-Horizon Agentic Reasoning |
|
|
• MemWeaver proposes a tri-layer memory framework (Graph, Experience, Passage) that consolidates long-term interactions into temporally grounded structures, ensuring evidence traceability.
• It employs a dual-channel retrieval strategy to "weave" structured relational facts with original textual evidence, supporting complex multi-hop and temporal reasoning tasks.
• Experiments on the LoCoMo benchmark show it improves reasoning accuracy while actively reducing input context length by over 95% compared to long-context baselines.
|
| 2026-01-24 |
Clustering-driven Memory Compression for On-device Large Language Models |
|
|
• Proposed a clustering-based memory compression strategy designed for on-device personalization under limited context windows.
• Groups similar memories and merges them within clusters (instead of simple concatenation), reducing redundancy while preserving semantic coherence.
• Demonstrated significant reduction in token usage and improved personalized generation quality compared to naive concatenation baselines.
|
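The cluster-then-merge idea can be sketched with a greedy single-pass clustering over word overlap. This is an illustration only (the paper's similarity measure, clustering method, and merge operator may all differ): similar memories collapse into one representative instead of being concatenated.

```python
def jaccard(a, b):
    """Word-set Jaccard similarity between two memory strings."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def compress(memories, sim_thresh=0.5):
    """Toy cluster-and-merge compression (greedy, single pass).

    Each memory joins the first cluster whose seed it resembles; each
    cluster is then merged into one deduplicated representative, so
    redundant tokens are paid for once instead of per-memory."""
    clusters = []
    for mem in memories:
        for cluster in clusters:
            if jaccard(mem, cluster[0]) >= sim_thresh:
                cluster.append(mem)
                break
        else:
            clusters.append([mem])
    # merge: union of each cluster's words as a single representative
    return [" ".join(sorted(set(" ".join(c).split()))) for c in clusters]

merged = compress(["likes green tea", "likes black tea", "works at acme"])
```

The two tea-related memories collapse into one entry while the unrelated one stays separate, which is the token saving the bullet describes.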
| 2026-01-13 |
Chain-of-Memory: Lightweight Memory Construction with Dynamic Evolution for LLM Agents |
|
|
• Proposed CoM (Chain-of-Memory), advocating a shift from expensive structured construction to lightweight construction with dynamic utilization.
• Introduced Dynamic Memory Chain Evolution to organize retrieved fragments into coherent inference paths with adaptive truncation to prune noise.
• Achieved 7.5%–10.4% accuracy gains on LoCoMo and LongMemEval while reducing token consumption to ~2.7% compared to complex memory structures.
|
| 2026-01-15 |
TeleMem: Building Long-Term and Multimodal Memory for Agentic AI |
|
 |
• TeleMem introduces a unified long-term and multimodal memory framework that extracts narrative-grounded information to maintain coherent user profiles without schema-driven hallucinations.
• It employs a structured writing pipeline for batching, retrieval, and consolidation, significantly improving storage and token efficiency, and incorporates a multimodal memory module with ReAct-style reasoning for video understanding.
• Experimental results on the ZH-4O benchmark show TeleMem outperforming the state-of-the-art Mem0 baseline by 19% in accuracy while reducing token usage by 43% and speeding up operations by 2.1×.
|
| 2026-01-15 |
Grounding Agent Memory in Contextual Intent |
|
 |
• Proposes STITCH, an agentic memory system that indexes trajectory steps using "Contextual Intent"—comprising thematic scope, event type, and key entity types—to disambiguate recurring information in long-horizon tasks.
• Introduces a retrieval mechanism that filters and prioritizes memory snippets based on structural intent compatibility rather than just semantic similarity, effectively suppressing context-incompatible history.
• Presents CAME-Bench, a multi-domain benchmark designed to evaluate context-aware retrieval in realistic, goal-oriented trajectories, where STITCH achieves state-of-the-art performance.
|
| 2026-01-14 |
PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records |
|
|
• Introduces PersonalAlign, a new task requiring GUI agents to align with implicit user intents—specifically resolving vague instructions and anticipating routines—by leveraging long-term user records.
• Presents AndroidIntent, a benchmark constructed from 20k long-term records, featuring hierarchically annotated user preferences and routines to evaluate personalization capabilities.
• Proposes HIM-Agent (Hierarchical Intent Memory Agent), which utilizes a streaming aggregation module and hierarchical filters (Execution-based and State-based) to continuously update and organize user memory for improved reactive and proactive performance.
|
| 2026-01-13 |
AtomMem: Learnable Dynamic Agentic Memory with Atomic Memory Operation |
|
|
• Introduces AtomMem, a dynamic memory framework that reframes agentic memory management as a learnable sequential decision-making problem rather than a static, hand-crafted workflow.
• Deconstructs memory processes into atomic CRUD (Create, Read, Update, Delete) operations and employs reinforcement learning (GRPO) to learn a task-aligned policy for autonomously orchestrating these operations.
• Experimental results on long-context benchmarks (HotpotQA, 2WikiMultihopQA, Musique) demonstrate that AtomMem consistently outperforms static memory baselines by dynamically tailoring memory strategies to specific task demands.
|
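The atomic operation vocabulary can be sketched as a plain CRUD interpreter. The learnable part of AtomMem is the GRPO-trained policy that *chooses* the operation sequence; only the (trivial) execution side is shown here, with a hypothetical op format.

```python
def apply_ops(ops):
    """Toy executor for atomic CRUD memory operations (illustrative only).

    ops: sequence of tuples ("create"/"read"/"update"/"delete", key[, value]).
    Returns the final memory store and the log of values read."""
    store, read_log = {}, []
    for op, key, *val in ops:
        if op == "create":
            store[key] = val[0]
        elif op == "read":
            read_log.append(store.get(key))   # None if the key is absent
        elif op == "update":
            if key in store:
                store[key] = val[0]
        elif op == "delete":
            store.pop(key, None)
    return store, read_log

ops = [
    ("create", "goal", "find flight"),
    ("update", "goal", "find hotel"),
    ("read", "goal"),
    ("delete", "goal"),
    ("read", "goal"),
]
```

Framing management this way turns "which op, on which key, when" into a sequential decision problem a policy can be trained on, rather than a fixed pipeline.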
| 2026-01-13 |
Fine-Mem: Fine-Grained Feedback Alignment for Long-Horizon Memory Management |
|
 |
• Fine-Mem is a unified reinforcement learning framework designed to optimize long-horizon memory management for LLM agents by aligning fine-grained feedback with memory operations.
• It addresses reward sparsity through Chunk-level Step Reward (CSR), which provides immediate supervision via constructed QA tasks, and solves credit assignment issues with Evidence-Anchored Reward Attribution (EARA) by linking global rewards to specific memory operations.
• Experimental results demonstrate that Fine-Mem consistently outperforms strong baselines on benchmarks like Memalpha and MemoryAgentBench, showing superior adaptability and generalization across different models.
|
| 2026-01-12 |
Active Context Compression: Autonomous Memory Management in LLM Agents |
|
|
• Proposed an autonomous strategy to compress/retain context so agents can operate under tight context budgets.
• Treated memory management as an active decision problem: what to keep, summarize, discard, or externalize.
• Demonstrated improved long-horizon performance vs passive truncation or naive summarization baselines.
|
| 2026-01-12 |
MemoBrain: Executive Memory as an Agentic Brain for Reasoning |
|
 |
• MemoBrain introduces an "executive memory" paradigm for tool-augmented agents, functioning as a co-pilot to construct dependency-aware memory and actively manage context under bounded budgets.
• The framework employs specific memory operations—Trajectory Folding and Selective Flush—to organize reasoning progress, retaining a high-salience structural backbone while discarding transient execution artifacts.
• Experiments on benchmarks like GAIA, WebWalker, and BrowseComp-Plus demonstrate that MemoBrain consistently outperforms strong baselines by enabling coherent, goal-directed reasoning over long horizons.
|
| 2026-01-12 |
Beyond Dialogue Time: Temporal Semantic Memory for Personalized LLM Agents |
|
 |
• TSM is a memory framework that models semantic time for point-wise memory and supports the construction and utilization of durative memory.
• It builds a semantic timeline to organize episodic interactions and consolidates them into time-aware durative memories (topics and personas) to capture long-term user states.
• During memory utilization, TSM incorporates the query’s temporal intent to retrieve temporally appropriate durative memories, significantly improving performance on benchmarks like LONGMEMEVAL and LOCOMO.
|
| 2026-01-10 |
Bi-Mem: Bidirectional Construction of Hierarchical Memory for Personalized LLMs via Inductive-Reflective Agents |
|
 |
• Bi-Mem is an agentic framework that constructs hierarchical memory (fact, scene, persona) bidirectionally using an inductive agent for bottom-up aggregation and a reflective agent for top-down calibration to mitigate noise and hallucination.
• It employs an associative retrieval mechanism that leverages spreading activation to connect memory units across granularities, enabling coherent recall of both contextual scenes and specific facts.
• Empirical evaluations on the LoCoMo benchmark demonstrate that Bi-Mem significantly outperforms leading memory baselines in long-term personalized conversational tasks.
|
| 2026-01-10 |
HiMem: Hierarchical Long-Term Memory for LLM Long-Horizon Agents |
|
 |
• HiMem is a hierarchical long-term memory framework designed for long-horizon dialogues, integrating fine-grained "Episode Memory" (via topic-aware segmentation) with abstract "Note Memory" (via knowledge extraction) to bridge concrete events and stable knowledge.
• It employs a conflict-aware "Memory Reconsolidation" mechanism that uses retrieval feedback to revise and supplement stored knowledge, enabling continual self-evolution and correction of memory over time.
• Evaluations on long-horizon benchmarks demonstrate that HiMem outperforms baselines in accuracy, consistency, and reasoning, validating the effectiveness of its hierarchical organization and dynamic updating strategies.
|
| 2026-01-10 |
Structured Episodic Event Memory |
|
 |
• SEEM introduces a dual-layer memory framework combining a Graph Memory Layer for static facts and an Episodic Memory Layer for narrative progression, both anchored by provenance pointers to raw interaction passages.
• The system employs a "Reverse Provenance Expansion" (RPE) mechanism to reconstruct coherent narrative contexts from fragmented evidence during retrieval, addressing the "scattered retrieval" problem in long-term interactions.
• Experiments on benchmarks like LoCoMo and LongMemEval show SEEM significantly outperforms competitive memory-augmented baselines (like HippoRAG 2) in narrative coherence and logical consistency.
|
| 2026-01-09 |
MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards |
|
 |
• MemBuilder is a reinforcement learning framework that trains LLMs to actively construct and manage a multi-dimensional memory system (Core, Episodic, Semantic, and Procedural) rather than relying on static prompting.
• It introduces "Attributed Dense Rewards Policy Optimization" (ADRPO) to solve reward sparsity and credit assignment issues by using synthetic session-level QA for immediate feedback and gradient weighting based on memory component contribution.
• Experimental results show that a lightweight 4B model trained with MemBuilder outperforms state-of-the-art closed-source models (including Claude 4.5 Sonnet) on long-term dialogue benchmarks like LoCoMo and LongMemEval.
|
| 2026-01-08 |
Beyond Static Summarization: Proactive Memory Extraction for LLM Agents |
|
|
• ProMem Framework: Addresses the limitations of "one-off" static summarization by proposing a proactive memory extraction framework inspired by Recurrent Processing Theory (RPT).
• Recurrent Feedback Loop: Introduces a self-questioning mechanism where the agent actively probes dialogue history to verify facts and recover missing details, ensuring memory completeness and accuracy.
• Performance: Outperforms state-of-the-art baselines (e.g., Mem0) on the HaluMem and LongMemEval benchmarks, demonstrating strong robustness under token compression and cost-effectiveness with small language models.
|
| 2026-01-08 |
Memory Matters More: Event-Centric Memory as a Logic Map for Agent Searching and Reasoning |
|
|
• Proposed CompassMem, an event-centric memory framework inspired by Event Segmentation Theory, organizing memory as an Event Graph with explicit logical relations (causal, temporal).
• Transforms memory from passive storage into a Logic Map, enabling agents to actively navigate structured dependencies via a Planner-Explorer-Responder mechanism.
• Features active multi-path memory search that dynamically expands or skips nodes based on subgoal satisfaction, avoiding redundant retrieval.
• Demonstrates superior performance on LoCoMo and NarrativeQA benchmarks, significantly improving multi-hop and temporal reasoning compared to baselines like HippoRAG and Mem0.
|
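The Logic-Map navigation can be sketched as a bounded expansion over typed event edges that stops expanding a path once its subgoal is satisfied. This is a toy (the paper's Planner-Explorer-Responder loop and graph schema are richer; the function and graph below are made up):

```python
def navigate(graph, start, satisfies, max_hops=3):
    """Toy multi-path search over an event graph with explicit relations.

    graph: event -> list of (relation, next_event) edges, e.g. causal or
    temporal links.  A path stops expanding as soon as its node satisfies
    the current subgoal, avoiding redundant retrieval beyond that point."""
    frontier, visited, hits = [start], {start}, []
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            if satisfies(node):
                hits.append(node)      # subgoal met: do not expand further
                continue
            for _relation, neighbor in graph.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return hits

graph = {"e1": [("causal", "e2")], "e2": [("temporal", "e3")]}
found = navigate(graph, "e1", lambda e: e == "e3")
```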
| 2026-01-08 |
Inside Out: Evolving User-Centric Core Memory Trees for Long-Term Personalized Dialogue Systems |
|
|
• PersonaTree Framework: Introduces a globally maintained **PersonaTree** grounded in the Biopsychosocial model as a dynamic user profile. By constraining the trunk with a schema and iteratively updating branches, it enables controllable memory growth and compression.
• MemListener Training: Trains a lightweight MemListener model via **Reinforcement Learning with process-based rewards** to generate structured, executable memory operations (ADD, UPDATE, DELETE), achieving performance comparable to large reasoning models.
• Adaptive Response Generation: Implements a dual-mode strategy that utilizes PersonaTree directly for low-latency responses or triggers an agentic recall mode guided by the tree for complex queries, outperforming baselines in consistency and noise suppression.
|
| 2026-01-07 |
Membox: Weaving Topic Continuity into Long-Range Memory for LLM Agents |
|
|
• Membox Architecture: Addresses the "fragmentation-compensation" flaw in existing systems by proposing a hierarchical architecture centered on **Topic Continuity** to preserve temporal and causal flow.
• Topic Loom & Trace Weaver: Introduces a *Topic Loom* to group continuous dialogue into cohesive "memory boxes" and a *Trace Weaver* to link these boxes into long-range event timelines across discontinuities.
• Performance: Achieves up to 68% F1 improvement on temporal reasoning tasks in the LoCoMo benchmark compared to baselines like Mem0, while using significantly fewer context tokens.
|
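The Topic Loom's grouping step can be sketched with a simple continuity heuristic: a new "memory box" opens when a turn's overlap with the current box drops below a threshold. This is illustrative only; Membox's actual segmentation is model-based, and the threshold here is arbitrary.

```python
def topic_loom(turns, overlap_thresh=0.3):
    """Toy topic segmentation into memory boxes (illustrative heuristic).

    A turn joins the current box if enough of its words already appear in
    that box; otherwise a topic shift is assumed and a new box starts,
    preserving temporal continuity within each box."""
    boxes = []
    for turn in turns:
        words = set(turn.split())
        if boxes:
            box_words = set(" ".join(boxes[-1]).split())
            if len(words & box_words) / len(words) >= overlap_thresh:
                boxes[-1].append(turn)
                continue
        boxes.append([turn])
    return boxes

turns = [
    "planning trip to japan",
    "the japan trip is in may",
    "my dog needs a vet",
]
```

A Trace-Weaver-style component would then link related boxes across discontinuities (e.g. a later return to the japan-trip topic) into one long-range timeline; that linking step is not shown.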
| 2026-01-06 |
HiMeS: Hippocampus-inspired Memory System for Personalized AI Assistants |
|
 |
• HiMeS is a memory framework for AI assistants that emulates the hippocampus–neocortex interaction by integrating short-term dialogue compression with long-term user profile storage.
• It utilizes a short-term memory extractor trained via reinforcement learning to proactively pre-retrieve knowledge, and a partitioned long-term memory network to re-rank results based on historical user interactions.
• Evaluations on real-world industrial datasets demonstrate that HiMeS significantly outperforms traditional RAG baselines in personalized question-answering tasks.
|
| 2026-01-06 |
SYNAPSE: Empowering LLM Agents with Episodic-Semantic Memory via Spreading Activation |
|
|
• SYNAPSE is a brain-inspired memory architecture that replaces static vector retrieval with a unified episodic–semantic graph, addressing the “context isolation” issue in traditional RAG systems, where semantically distant yet causally related memories cannot be effectively associated.
• It introduces cognitive dynamics such as spreading activation, lateral inhibition, and temporal decay to dynamically propagate relevance and filter noise within the graph, rather than relying solely on precomputed links or vector similarity.
• SYNAPSE achieves state-of-the-art performance on the LoCoMo benchmark, significantly improving multi-hop reasoning capabilities and robustness to adversarial queries through an uncertainty-aware gating mechanism.
|
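The cognitive dynamics in the second bullet can be sketched in a few lines. Parameters and the inhibition rule here are made up for illustration (the paper also models temporal decay and uncertainty-aware gating): activation flows from seed memories along graph edges with per-hop decay, and weakly activated nodes are suppressed.

```python
def spread(graph, seeds, decay=0.5, inhibit=0.1, steps=2):
    """Toy spreading activation over an episodic-semantic graph.

    graph: node -> list of neighbour nodes; seeds: initially active nodes.
    Each step propagates decayed activation to neighbours; the inhibition
    cutoff filters noise so only genuinely associated memories survive."""
    act = {n: 1.0 for n in seeds}
    for _ in range(steps):
        nxt = dict(act)
        for node, a in act.items():
            for neighbor in graph.get(node, []):
                nxt[neighbor] = nxt.get(neighbor, 0.0) + decay * a
        # lateral-inhibition-style cutoff: suppress weak activations
        act = {n: a for n, a in nxt.items() if a >= inhibit}
    return act

# "euro" is never similar to the query "paris", but becomes active through
# the chain paris -> france -> euro, which flat vector retrieval would miss.
graph = {"paris": ["france"], "france": ["euro"]}
activated = spread(graph, ["paris"])
```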
| 2026-01-06 |
CODEMEM: AST-Guided Adaptive Memory for Repository-Level Iterative Code Generation |
|
|
• Proposed CODEMEM, a memory management system tailored for repository-level iterative code generation.
• Introduces Code Context Memory: Uses AST-guided selection to dynamically update and merge repository context, keeping it relevant while discarding noise.
• Introduces Code Session Memory: Uses AST-based change analysis to detect conflicts and forgetting, organizing history into code-centric units (diffs) rather than just text.
• Achieves SOTA on CodeIF-Bench and CoderEval, improving instruction following by ~12% and reducing interaction rounds by 2–3.
|
| 2026-01-06 |
Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models |
|
|
• LatentGraphMem Framework: Proposes a memory framework combining implicit graph memory for stability and efficiency with explicit subgraph retrieval for interpretability, storing graph structures in latent space.
• Three-Stage Training Strategy: Involves training a graph builder (global representation), a subgraph retriever (budgeted edge selection), and joint fine-tuning (coordination optimization) for effective end-to-end QA.
• Performance: Consistently outperforms explicit-graph and latent-memory baselines on long-horizon benchmarks like HotpotQA, NarrativeQA, and WikiHop across multiple model scales, achieving up to 63.34% average accuracy.
|
| 2026-01-06 |
MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents |
|
|
• MAGMA Architecture: Proposes a multi-graph agentic memory architecture that explicitly models memory items across orthogonal semantic, temporal, causal, and entity graphs, overcoming the limitations of monolithic memory stores.
• Adaptive Topological Retrieval: Introduces an intent-aware Adaptive Traversal Policy that dynamically routes retrieval through relevant relational views, decoupling memory representation from retrieval logic for transparent reasoning.
• Performance: Consistently outperforms state-of-the-art agentic memory systems (e.g., Nemori, A-MEM) on long-horizon benchmarks like LoCoMo and LongMemEval, while reducing retrieval latency and token consumption.
|
| 2026-01-06 |
TiMem: Temporal-Hierarchical Memory Consolidation for Long-Horizon Conversational Agents |
|
|
• TiMem Framework: Introduces a temporal-hierarchical memory framework using a Temporal Memory Tree (TMT) to progressively consolidate raw dialog into abstract persona representations, emphasizing temporal continuity.
• Core Mechanisms: Features semantic-guided consolidation (fine-tuning free) and complexity-aware memory recall (Recall Planner + Gating) to balance precision and efficiency across query types.
• Performance: Achieves SOTA accuracy on LoCoMo (75.30%) and LongMemEval-S (76.88%) benchmarks, while significantly reducing recalled context length (-52.20%) on LoCoMo.
|
| 2026-01-06 |
MemRL: Self-Evolving Agents via Runtime Reinforcement Learning on Episodic Memory |
|
|
• MemRL Framework: Proposes a non-parametric reinforcement learning framework that enables frozen LLM agents to self-evolve by optimizing episodic memory, avoiding the costs and forgetting issues of fine-tuning.
• Intent-Experience-Utility Triplet: Introduces a Two-Phase Retrieval mechanism (semantic recall + value-aware selection) and a runtime utility update rule, using Q-values to distinguish high-utility strategies from noise.
• Performance: Significantly outperforms MemP and RAG on benchmarks like HLE, BigCodeBench, and ALFWorld, demonstrating that agents can continuously improve via runtime trial-and-error without weight updates.
|
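The two-phase retrieval plus runtime utility update can be sketched as follows. The scoring is a toy (the paper recalls by embedding similarity and maintains Q-values per intent-experience-utility triplet; the word-overlap recall and learning rate here are stand-ins):

```python
def select(memories, query_terms, q_values, top_k=1):
    """Toy two-phase retrieval: semantic recall, then value-aware selection.

    Phase 1 recalls memories sharing terms with the query (stand-in for
    embedding similarity); phase 2 ranks them by learned utility so
    high-Q strategies beat noisy ones."""
    recalled = [m for m in memories if query_terms & set(m.split())]
    return sorted(recalled, key=lambda m: q_values.get(m, 0.0), reverse=True)[:top_k]

def update_q(q_values, memory, reward, lr=0.5):
    """Runtime utility update: nudge the memory's Q-value toward the
    observed reward, with the frozen LLM's weights untouched."""
    q = q_values.get(memory, 0.0)
    q_values[memory] = q + lr * (reward - q)

# Trial-and-error at runtime: the helpful strategy accrues utility,
# the unhelpful one is pushed down, and future selection reflects it.
q = {}
strategies = ["use binary search", "use bubble sort"]
update_q(q, "use binary search", 1.0)
update_q(q, "use bubble sort", -1.0)
best = select(strategies, {"use"}, q)
```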
| 2026-01-05 |
SimpleMem: Efficient Lifelong Memory for LLM Agents |
|
|
• Introduces SimpleMem, an efficient memory framework tailored for lifelong LLM agents based on semantic lossless compression.
• The system operates via a three-stage pipeline: Semantic Structured Compression to filter low-entropy noise, Recursive Memory Consolidation to synthesize abstract representations, and Adaptive Query-Aware Retrieval to minimize token usage.
• Experiments on the LoCoMo benchmark demonstrate a 26.4% improvement in F1 score and up to 30× reduction in inference token consumption compared to full-context models, significantly outperforming baselines like Mem0.
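• The three-stage pipeline might be sketched as plain functions; the length-based filter, adjacent-merge consolidation, and word-overlap retrieval below are toy stand-ins for the paper's LLM-driven stages, not its actual implementation:

```python
def compress(turns, min_len=4):
    """Stage 1 (toy low-entropy filter): drop short, low-information turns."""
    return [t for t in turns if len(t.split()) >= min_len]

def consolidate(units, group=2):
    """Stage 2 (toy consolidation): merge adjacent units into coarser ones."""
    return [" | ".join(units[i:i + group]) for i in range(0, len(units), group)]

def retrieve(memory, query):
    """Stage 3 (toy query-aware retrieval): keep units sharing a query word."""
    q = set(query.lower().split())
    return [u for u in memory if q & set(u.lower().split())]
```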
|
| 2026-01-05 |
Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents |
|
|
• AgeMem Framework: Proposes a unified framework that integrates both Long-Term (LTM) and Short-Term (STM) memory management directly into the agent's policy via tool-based actions (e.g., Add, Update, Filter).
• Three-Stage Progressive RL: Introduces a step-wise GRPO algorithm and a three-stage training strategy (LTM construction, STM control, integrated reasoning) to address sparse rewards and enable end-to-end optimization.
• Performance: Outperforms strong baselines like LangMem and Mem0 across five long-horizon benchmarks (e.g., ALFWorld, HotpotQA), achieving higher task success rates, better memory quality, and more efficient context usage.
|
| 2025-12-31 |
Nested Learning: The Illusion of Deep Learning Architecture |
|
|
• Full arXiv version including all appendices — not the previously released trimmed version.
• Presents a Nested Learning paradigm that unifies a large portion of optimizer and TTT-layer modules.
• Architectural innovation: HOPE, composed of modified Titans attention and self-modifying FFNs. By controlling the chunk size at which FFN parameters self-modify, FFN layers operate at different frequencies and implicitly retain memories at different hierarchical levels at runtime.
• Empirical results are modest.
|
| 2025-12-25 |
Beyond Heuristics: A Decision-Theoretic Framework for Agent Memory Management |
|
|
• A TeleAI-affiliated paper that introduces a decision-theoretic memory framework (DAM), formulating the timing and content of memory reads/writes as an optimal decision problem, with relevance to RL-style formulations.
• Contains minimal or no experimental validation.
|
| 2025-12-21 |
MemEvolve: Meta-Evolution of Agent Memory Systems |
|
|
• An OPPO-affiliated paper proposes a two-layer framework that, in RL settings, separates learning to extract memories (level-1) from learning the memory-extraction method itself (level-2).
• Experiments using Flash-Searcher and GPT-5-Mini achieve SOTA on benchmarks including GAIA.
|
| 2025-12-20 |
MemR³: Memory Retrieval via Reflective Reasoning for LLM Agents |
|
|
• MemR³ closed-loop retrieval controller: designed for long-term conversational memory, it can dynamically choose among three actions—retrieve, reflect, and respond.
• Evidence–gap state tracker: the system maintains a global (evidence, gap) state that explicitly tracks "what is known" and "what is missing," making the process interpretable.
• Experiments show that on the LoCoMo benchmark, MemR³ significantly improves answer quality across different underlying memory systems (e.g., RAG, Zep).
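• The closed-loop control above can be sketched as a simple loop over the (evidence, gap) state; the dict-backed memory, the `required_facts` interface, and the step budget are illustrative assumptions standing in for the paper's LLM-driven reflect and retrieve actions:

```python
def controller(required_facts, memory, max_steps=10):
    """Loop over retrieve / reflect / respond until gaps are closed."""
    evidence, gap = {}, set(required_facts)   # global (evidence, gap) state
    for _ in range(max_steps):
        if not gap:                           # nothing missing -> respond
            break
        fact = min(gap)                       # reflect: pick a missing fact
        hit = memory.get(fact)                # retrieve from the backend
        if hit is not None:
            evidence[fact] = hit              # grow "what is known"
        gap.discard(fact)                     # shrink "what is missing"
    return "respond", evidence
```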
|
| 2025-12-18 |
Learning Hierarchical Procedural Memory for LLM Agents through Bayesian Selection and Contrastive Refinement |
|
|
• A Bayesian procedural memory (experience) framework: MACLA.
• Overall still a rule-based algorithm; operations include extraction, retrieval/storage, and refinement (Bayesian posterior calibration).
• On unseen tasks in ALFWorld, performance (90.3%) exceeded that on seen tasks (87.2%), achieving +3.1% positive generalization.
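• A minimal illustration of Bayesian posterior calibration for procedural memory, using a Beta-Bernoulli posterior over each routine's success rate; the field names and the selection rule are assumptions, not MACLA's actual formulation:

```python
def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean of Beta(prior_a + successes, prior_b + failures)."""
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)

def select(procedures):
    """Pick the routine with the highest calibrated success estimate."""
    return max(procedures, key=lambda p: posterior_mean(p["wins"], p["losses"]))
```

With a uniform prior, a routine with one win and no losses is trusted less than one with eight wins and two losses, which is the kind of calibration a refinement step needs.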
|
| 2025-12-14 |
HINDSIGHT IS 20/20: BUILDING AGENT MEMORY THAT RETAINS, RECALLS, AND REFLECTS |
|
|
• HINDSIGHT is a unified memory architecture that treats memory as a structured, first-class substrate for reasoning, organizing information into four logical networks: world facts, agent experiences, synthesized entity summaries, and evolving beliefs.
• The system introduces TEMPR (Temporal Entity Memory Priming Retrieval) for building temporal entity graphs and CARA (Coherent Adaptive Reasoning Agents) for preference-conditioned reasoning, enabling agents to epistemically distinguish evidence from inference.
• Experimental results on LongMemEval and LoCoMo benchmarks demonstrate that HINDSIGHT significantly outperforms existing memory systems and full-context frontier models in multi-session consistency and open-domain question answering.
|
| 2025-12-11 |
Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution |
|
|
• ReMe (paper version): an Alibaba-affiliated framework for enhancing LLM procedural memory (experience), including the ReMe algorithm and the reme.library dataset.
• Core idea: maintain an experience pool with operations—Acquisition, Reuse, and Refinement.
• Experiments on BFCL-V3 and AppWorld show dynamic experience pools > static pools > baseline, with scaling studies on both the base model and the judge model.
|
| 2025-12-10 |
LightSearcher: Efficient DeepSearch via Experiential Memory |
|
|
• LightSearcher is an efficient reinforcement learning (RL)-based search architecture grounded in experiential memory. During large language model (LLM)-driven reasoning, it autonomously optimizes agent tool invocation without relying on external data by transforming implicit reasoning trajectories into explicit experiential guidance through contrastive experiential memory.
• Evaluated on four multi-hop question answering benchmarks—Natural Questions (NQ), HotpotQA, MuSiQue, and 2WikiMultihopQA—LightSearcher achieves accuracy comparable to the state-of-the-art DeepSearch baseline while significantly reducing both tool invocation latency and model response time.
• The method reduces tool invocations by 39.6%, shortens reasoning time by 48.6%, and decreases token consumption by 21.2%, substantially improving tool-use efficiency without compromising task performance.
|
| 2025-12-03 |
MemVerse: Multimodal Memory for Lifelong Learning Agents |
|
|
• A lifelong learning memory framework for multimodal agents.
• Retrieval-based long-term memory + parameterized fast memory + periodic distillation.
• Multimodal handling: unified conversion into textual descriptions.
• Experiments show improvements over baselines on ScienceQA (text) and MSR-VTT (video); LoCoMo (text) results appear only in the appendix.
|
| 2025-11-12 |
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning |
|
|
• Introduces ComoRAG, a retrieval-augmented generation framework inspired by the human Prefrontal Cortex, designed to achieve stateful reasoning in long narrative contexts.
• The framework employs a dynamic memory workspace and a metacognitive regulation loop (including Self-Probe, Mem-Fuse, and Mem-Update) to iteratively fuse fragmented evidence into coherent context.
• Experimental results demonstrate that ComoRAG consistently outperforms strong baselines on challenging benchmarks like NarrativeQA and ∞BENCH, particularly excelling in complex narrative queries requiring global understanding.
|
| 2025-11-04 |
MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning
|
 |
• MemSearcher is a large language model (LLM) agent trained with end-to-end reinforcement learning (RL) to improve the efficiency of knowledge-acquisition tasks.
• MemSearcher optimizes memory management with multi-context Group Relative Policy Optimization (Multi-Context GRPO), a new framework that lets the model self-evolve across multiple conversation contexts.
• Compared with traditional ReAct search agents, MemSearcher delivers significant performance improvements while keeping token consumption low, especially on smaller models.
|
| 2025-10-15 |
D-SMART: Enhancing LLM Dialogue Consistency via Dynamic Structured Memory And Reasoning Tree |
|
|
• Proposes D-SMART, a model-agnostic framework designed to maintain logical and factual consistency in multi-turn dialogues by coupling a Dynamic Structured Memory (DSM) with a Reasoning Tree (RT).
• DSM incrementally builds an OWL-compliant knowledge graph from conversation history to prevent context decay, while RT guides the LLM through explicit, traceable multi-step reasoning over this graph.
• Comprehensive experiments on MT-Bench-101 demonstrate that D-SMART significantly outperforms state-of-the-art baselines, improving consistency scores by over 48% and exhibiting strong stability in extended dialogues.
|
| 2025-10-14 |
Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks
|
 |
• Memory-as-Action (MemAct) addresses working-memory management for large language models (LLMs) in long-horizon tasks.
• MemAct turns memory management into a learnable intrinsic capability, letting agents dynamically manage memory while performing tasks, and introduces the Dynamic Context Policy Optimization (DCPO) algorithm to handle the trajectory-breakage problem caused by memory editing.
• MemAct performs strongly on multi-objective question-answering tasks, demonstrating higher accuracy and robustness than traditional models.
|
| 2025-10-12 |
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
|
 |
• MemGen is a dynamic generative memory framework designed to enhance the reasoning and decision-making capabilities of agents built on large language models (LLMs).
• MemGen mimics human cognitive patterns by interweaving memory with the reasoning process.
• The framework consists of two parts, a memory trigger and a memory weaver, which dynamically decide when to invoke latent memories and integrate them into reasoning.
|
| 2025-10-10 |
How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior |
|
 |
• The paper investigates memory management in large language model (LLM) agents and its impact on long-term performance.
• It identifies issues such as error propagation and misaligned experience replay, highlighting the importance of high-quality memory.
• By comparing multiple memory insertion and deletion strategies, the study finds that selective insertion performs better for long-term learning, while historical deletion is particularly effective at reducing low-quality memory records.
|
| 2025-10-09 |
Enabling Personalized Long-term Interactions in LLM-based Agents through Persistent Memory and User Profiles |
|
 |
• Introduces a framework for adaptive, user-centered AI agents that combines persistent memory, dynamic coordination, and evolving user profiles to enable personalized long-term interactions.
• The approach integrates established agentic AI patterns—such as Multi-Agent Collaboration and Multi-Source Retrieval—with mechanisms like self-validation and implicit user profiling to tailor responses to individual needs.
• Evaluations on three public datasets and a pilot user study demonstrate improvements in retrieval accuracy, response correctness, and perceived personalization compared to standard RAG baselines.
|
| 2025-10-08 |
ToolMem: Enhancing Multimodal Agents with Learnable Tool Capability Memory |
|
 |
• TOOLMEM, a memory-augmented agent that learns from past tool use. It stores summarized, retrievable “what this tool is good/bad at” knowledge and injects the relevant memories into context to better predict tool quality and choose the right tool for new tasks.
• TOOLMEM maintains structured capability entries per tool. From each experience, it retrieves similar memories and updates them via a RAG-style merge/refinement, keeping a compact, evolving capability memory. At inference time, it retrieves the most relevant capability memories to guide scoring and tool selection.
• They evaluate on text generation tools and text-to-image tools, comparing against no-memory and few-shot baselines. TOOLMEM improves quality prediction and makes better tool choices overall.
|
| 2025-10-07 |
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension |
|
|
• The paper introduces CAM, a Constructivist Agentic Memory system inspired by Jean Piaget’s theory, designed to enhance Large Language Models (LLMs) in long-form document comprehension.
• CAM features structured schemata, flexible assimilation, and dynamic accommodation, utilizing an incremental overlapping clustering algorithm for efficient memory development and an adaptive Prune-and-Grow strategy for retrieval.
• Experimental results across diverse benchmarks show that CAM achieves dual advantages in both performance and efficiency compared to existing structured and unstructured memory approaches.
|
| 2025-09-30 |
MEM-α: LEARNING MEMORY CONSTRUCTION VIA REINFORCEMENT LEARNING |
|
|
• Proposes Mem-α, a reinforcement learning framework that trains agents to effectively manage complex memory systems (comprising core, episodic, and semantic components) through interaction and feedback.
• Unlike approaches relying on pre-defined instructions, Mem-α treats memory construction as a sequential decision-making problem, optimizing directly for downstream question-answering accuracy.
• Experimental results show that Mem-α significantly outperforms existing baselines and demonstrates remarkable generalization, effectively handling contexts exceeding 400k tokens despite being trained on 30k token sequences.
|
| 2025-09-29 |
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory |
|
|
• ReasoningBank, a test-time learning framework that distills an agent’s own successful and failed trajectories into reusable reasoning memories. For new tasks, the agent retrieves relevant memories to guide decision-making and then writes new experience back into the bank, forming a self-improving loop without requiring ground-truth feedback.
• Each memory is stored as a compact structured item and retrieved via embedding similarity (top-k) to augment the agent’s prompt. After task execution, an LLM-as-a-judge provides proxy success/failure signals: successful trajectories yield transferable strategies, while failed ones yield pitfalls and corrective rules. In addition, MaTTS expands test-time computation through parallel trajectory sampling and serial self-reflection, both of which generate stronger memory signals.
• Experiments are conducted on WebArena, Mind2Web, and SWE-Bench-Verified, comparing against No Memory and prior memory-based baselines. Performance is evaluated using success rate, efficiency (steps), and task-specific metrics. Results show consistent improvements across different backbone models.
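• The write-back loop above can be reduced to a toy sketch: a judge signal turns each finished trajectory into a "strategy" (success) or "pitfall" (failure) item, and future tasks retrieve the top-k strategies. Recency stands in for the paper's embedding-similarity retrieval; all names are illustrative:

```python
def write_back(bank, trajectory, judged_success):
    """Distill a trajectory into a reusable memory item and store it."""
    kind = "strategy" if judged_success else "pitfall"
    bank.append({"kind": kind, "text": trajectory})
    return bank

def retrieve_strategies(bank, k=2):
    """Most recent k strategies (stand-in for embedding top-k retrieval)."""
    return [m["text"] for m in bank if m["kind"] == "strategy"][-k:]
```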
|
| 2025-09-29 |
Pretraining with hierarchical memories: separating long-tail and common knowledge |
|
|
• Proposes a "pretraining-with-memories" architecture that decouples reasoning capabilities (anchor model) from long-tail world knowledge (hierarchical memory bank).
• The system dynamically retrieves and attaches context-dependent parameter blocks from a massive memory bank to a small anchor model during inference, enabling efficient scaling.
• Experiments demonstrate that a 160M model augmented with memories matches the performance of a standard model with over twice the parameters, specifically excelling at long-tail knowledge tasks.
|
| 2025-09-27 |
Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents |
|
|
• Addressed long-context reasoning where relevant evidence is dispersed across very long inputs.
• Proposed a “revisitable” memory design that allows the agent to look back and selectively retrieve from the broader history.
• Evaluated on long-context QA settings to show improved evidence recovery and reasoning accuracy.
|
| 2025-09-26 |
Conflict-Aware Soft Prompting for Retrieval-Augmented Generation |
|
|
• CARE addresses the context-memory conflict problem that arises in retrieval-augmented generation (RAG).
• CARE improves large language model (LLM) performance by introducing a context assessor, especially when handling conflicts between external and internal knowledge.
• The method significantly enhances model accuracy and reliability across multiple tasks through conflict-aware fine-tuning, soft prompts, and adversarial soft prompts.
|
| 2025-09-26 |
PRIME Planning and Retrieval-Integrated Memory for Enhanced Reasoning |
|
|
• PRIME is a multi-agent inference framework that answers simple questions intuitively through a fast-response agent.
• PRIME performs complex reasoning through multiple specialized agents, such as memory, planning, search, and reading agents.
• PRIME still needs to improve its belief-correction mechanism and optimize the interaction among agents.
|
| 2025-09-25 |
SGMEM: Sentence Graph Memory for Long-Term Conversational Agents |
|
|
• SGMem is a hierarchical memory management framework designed to address memory fragmentation in long-term conversational agents by organizing dialogue into sentence-level graphs.
• It explicitly models associations across turns, rounds, and sessions, and uses a multi-hop retrieval mechanism to integrate raw dialogue history with generated memory such as summaries, facts, and insights.
• Extensive experiments on LongMemEval and LoCoMo benchmarks demonstrate that SGMem consistently improves retrieval coherence and outperforms strong baselines in question answering accuracy.
|
| 2025-09-22 |
PRINCIPLES: Synthetic Strategy Memory for Proactive Dialogue Agents |
|
|
• PRINCIPLES builds a retrievable memory of dialogue strategy principles from offline self-play. At inference time, the model retrieves and applies these principles to guide strategy selection and response generation, without any additional training.
• In the offline stage, the agent conducts multi-turn self-play with a user simulator and uses rewards to identify success or failure. Successful cases directly yield principles, while failed cases trigger strategy revision and rollback until success; principles are then extracted by contrasting failure-to-success trajectories in a structured form. In the online stage, relevant principles are retrieved using contextual embeddings, reinterpreted to fit the current dialogue, and then used to guide planning and response generation.
• Experiments on emotional support and persuasion tasks show that PRINCIPLES improves success rates and strategy prediction performance while increasing strategy diversity. Ablation studies confirm the importance of retrieval and reinterpretation, and human evaluations indicate overall preference for the proposed method.
|
| 2025-09-16 |
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research |
|
|
• Introduces WebWeaver, a dual-agent framework comprising a Planner and a Writer designed to tackle open-ended deep research (OEDR) by emulating human research processes.
• The Planner uses a dynamic cycle to interleave evidence acquisition with outline optimization, building a memory bank of evidence; the Writer performs hierarchical, citation-grounded retrieval to compose the report section by section.
• WebWeaver achieves state-of-the-art performance on benchmarks like DeepResearch Bench by effectively managing long contexts and mitigating hallucinations through targeted memory retrieval.
|
| 2025-09-15 |
MOOM: Maintenance, Organization and Optimization of Memory in Ultra-Long Role-Playing Dialogues |
|
|
• MOOM is a dual-branch memory extraction framework designed for ultra-long role-playing dialogues, modeling "plot development" and "character portrayal" as core storytelling elements.
• It incorporates a novel forgetting mechanism based on "competition-inhibition" theory to effectively control memory capacity and prevent uncontrolled expansion.
• The authors introduce ZH-4O, a large-scale Chinese role-playing dataset with dialogues averaging 600 turns and manual memory annotations, demonstrating MOOM's superior performance over state-of-the-art methods.
|
| 2025-09-13 |
Pre-Storage Reasoning for Episodic Memory: Shifting Inference Burden to Memory for Personalized Dialogue |
|
|
• PREMem (Pre-storage Reasoning for Episodic Memory) is a novel approach that shifts complex reasoning processes from response generation to the memory construction phase.
• It extracts fine-grained memory fragments (categorized into factual, experiential, and subjective information) and establishes explicit cross-session relationships based on cognitive schema theory, capturing evolution patterns like extensions and transformations.
• Experiments on LongMemEval and LoCoMo benchmarks show significant performance improvements, enabling smaller models to achieve results comparable to larger baselines while reducing inference computational demands.
|
| 2025-09-11 |
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
|
 |
• Introduces the “OpenUnlearning” framework, designed to advance research on unlearning in large language models (LLMs).
• OpenUnlearning integrates a wide range of unlearning algorithms and evaluation methods, streamlining the research workflow for studying forgetting.
• Through targeted and task-specific evaluations, OpenUnlearning ensures the credibility and robustness of unlearning assessment standards.
|
| 2025-08-27 |
Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning |
|
|
• Memory-R1 is an RL-driven framework that empowers LLMs to actively manage and utilize external memory via two specialized agents: a Memory Manager and an Answer Agent.
• The Memory Manager learns structured operations (ADD, UPDATE, DELETE) to maintain memory, while the Answer Agent filters retrieved memories for accurate reasoning.
• With only 152 training samples, it outperforms strong baselines on LoCoMo, MSC, and LongMemEval, demonstrating high data efficiency and generalization.
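• The structured memory operations can be sketched as follows; the trivial rule standing in for the learned RL policy (add new keys, update existing ones, delete on `None`) is an illustrative assumption:

```python
def apply_op(store, op, key, value=None):
    """Execute one structured memory operation on a dict-backed store."""
    if op == "ADD" and key not in store:
        store[key] = value
    elif op == "UPDATE" and key in store:
        store[key] = value
    elif op == "DELETE":
        store.pop(key, None)
    return store

def manage(store, key, value):
    """Toy stand-in for the learned Memory Manager policy."""
    if value is None:                         # explicit retraction
        return apply_op(store, "DELETE", key)
    op = "UPDATE" if key in store else "ADD"
    return apply_op(store, op, key, value)
```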
|
| 2025-08-26 |
MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation
|
 |
• MemoryVLA is a robotic manipulation framework that aims to improve robot performance on complex tasks by integrating vision, language, and perceptual-cognitive mechanisms.
• The framework adopts an architecture modeled on the human dual-memory system, strengthening the robot's ability to handle long-horizon tasks.
• MemoryVLA introduces a Perceptual-Cognitive Memory Bank (PCMB) that effectively integrates historical information into current decisions, raising success rates in complex scenarios.
|
| 2025-08-22 |
Memento: Fine-tuning LLM Agents without Fine-tuning LLMs |
|
|
• Proposed a paradigm to improve agent behavior via “agent-side” learning while keeping the base LLM frozen.
• Focused on adapting the agent’s components (e.g., memory/reasoning/routing) rather than model weights.
• Reported performance gains across agent tasks without conventional LLM fine-tuning.
|
| 2025-08-21 |
Multiple Memory Systems for Enhancing the Long-term Memory of Agent |
|
|
• Proposes a Multiple Memory System (MMS) inspired by cognitive psychology to address the issue of low-quality memory content in existing agent memory modules.
• The system processes short-term memory into diverse fragments—keywords, cognitive perspectives, episodic memory, and semantic memory—to construct specialized retrieval and contextual memory units.
• Experimental results on the LoCoMo dataset demonstrate that MMS significantly outperforms methods like MemoryBank and A-MEM, particularly in multi-hop reasoning and open-domain tasks.
|
| 2025-08-18 |
Semantic Anchoring in Agentic Memory: Leveraging Linguistic Structures for Persistent Conversational Context |
|
 |
• Semantic Anchoring is a hybrid agentic memory architecture designed to enhance the long-term context retention of LLMs by enriching vector-based storage with explicit linguistic cues such as syntactic dependencies, discourse relations, and coreference links.
• The proposed framework employs a multi-stage pipeline involving dependency parsing, coreference resolution, and discourse tagging to construct a hybrid index, allowing retrieval systems to access memories based on both semantic similarity and structural linguistic roles.
• Experimental results on adapted long-term dialogue datasets (MultiWOZ-Long and DialogRE-L) demonstrate that Semantic Anchoring outperforms strong RAG baselines, improving factual recall and discourse coherence by up to 18% while maintaining higher user satisfaction.
|
| 2025-08-13 |
Memp: Exploring Agent Procedural Memory |
|
 |
• Memp treats procedural memory as an external, learnable store of past successful experiences so an LLM agent can reuse effective “how-to” routines on new tasks, improving success and reducing wasted steps.
• Memp follows a Build–Retrieve–Update loop: it builds memory items from trajectories/scripts, retrieves the most relevant items via semantic keys and vector similarity, and updates memory online by adding, filtering, and correcting items so the memory becomes more reliable over time.
• On TravelPlanner and ALFWorld, Memp outperforms a ReAct baseline with higher success/score and fewer steps; vector-based retrieval beats random selection; online updates yield further gains, and learned memories can transfer from stronger to weaker models with diminishing returns as retrieval size grows.
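• The Build–Retrieve–Update loop might look like the sketch below, with bag-of-words cosine standing in for the paper's embedding retrieval; the class and its 0.5 overwrite threshold are illustrative assumptions:

```python
from collections import Counter
import math

def _vec(text):
    return Counter(text.lower().split())

def _cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

class ProceduralMemory:
    def __init__(self):
        self.items = []                       # (key_vector, script) pairs

    def build(self, task, script):            # Build: store a how-to routine
        self.items.append((_vec(task), script))

    def retrieve(self, task, k=1):            # Retrieve: top-k by similarity
        q = _vec(task)
        ranked = sorted(self.items, key=lambda it: _cosine(q, it[0]),
                        reverse=True)
        return [script for _, script in ranked[:k]]

    def update(self, task, script):           # Update: correct the best match
        q = _vec(task)
        if self.items:
            best = max(range(len(self.items)),
                       key=lambda i: _cosine(q, self.items[i][0]))
            if _cosine(q, self.items[best][0]) > 0.5:
                self.items[best] = (_vec(task), script)
                return
        self.build(task, script)              # no close match: add instead
```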
|
| 2025-08-12 |
Context as Memory: Scene-Consistent Interactive Long Video Generation with Memory Retrieval
|
 |
• "Context-as-memory" significantly enhances the scene consistency and memory capacity of long video generation by treating historical context as memory.
• The paper studies key designs such as context-learning mechanisms, camera control, and memory-retrieval strategies, and highlights the trade-off between computational efficiency and generation quality.
• Built on a diffusion-model architecture for long video generation, it also reviews current progress, open challenges, and future directions.
|
| 2025-08-12 |
Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory |
|
|
• Introduces Intrinsic Memory Agents, a multi-agent framework designed to address context limitations and role inconsistency using structured, agent-specific memories.
• The method employs role-aligned memory templates and intrinsic updates derived directly from agent outputs, preserving heterogeneous perspectives and domain expertise without external summarization.
• Evaluations on the PDDL benchmark demonstrate a 38.6% performance improvement with high token efficiency, while case studies show enhanced quality in complex planning tasks.
|
| 2025-08-06 |
RCR-Router: Efficient Role-Aware Context Routing for Multi-Agent LLM Systems with Structured Memory |
|
|
• RCR-Router is a role-aware context routing framework designed for multi-agent LLM systems to address the limitations of static and full-context routing, such as excessive token consumption and redundant memory exposure.
• The framework dynamically selects semantically relevant memory subsets for each agent based on their specific role and the current task stage, enforcing a strict token budget and utilizing an iterative feedback mechanism to refine context.
• Experiments on multi-hop QA benchmarks (HotPotQA, MuSiQue, 2WikiMultihop) demonstrate that RCR-Router reduces token usage by 25–47% while maintaining or improving answer quality compared to baseline strategies.
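• Role-aware, token-budgeted routing of this kind might be sketched as a greedy pack: only items tagged for the agent's role are eligible, and they are packed by relevance under a strict budget. The field names and greedy policy are illustrative assumptions:

```python
def route(memory, role, budget):
    """memory: list of dicts with 'role_tags', 'tokens', and 'score'."""
    eligible = [m for m in memory if role in m["role_tags"]]
    eligible.sort(key=lambda m: m["score"], reverse=True)  # most relevant first
    picked, used = [], 0
    for m in eligible:
        if used + m["tokens"] <= budget:      # enforce the strict token budget
            picked.append(m)
            used += m["tokens"]
    return picked
```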
|
| 2025-08-03 |
MLP Memory: A Retriever-Pretrained Memory for Large Language Models |
|
|
• Introduces MLP Memory, a lightweight parametric module that learns to internalize retrieval patterns without requiring explicit document access during inference, effectively bridging the gap between RAG and parametric fine-tuning.
• By pretraining an MLP to imitate a kNN retriever’s behavior on the entire pretraining dataset, the model compresses large datastores into a differentiable memory component that integrates with Transformer decoders via probability interpolation.
• Experimental results show that MLP Memory achieves superior scaling behavior, improves QA performance by 12.3% relative to baselines, reduces hallucinations by up to 10 points, and offers 2.5× faster inference than RAG.
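• The probability-interpolation step is the same mixing rule used in kNN-LM-style systems: the final next-token distribution blends the decoder's distribution with the memory module's. A minimal sketch, with toy distributions and an assumed mixing weight:

```python
def interpolate(p_lm, p_mem, lam=0.25):
    """Mix decoder and memory distributions: (1-lam)*p_lm + lam*p_mem."""
    vocab = set(p_lm) | set(p_mem)
    return {t: (1 - lam) * p_lm.get(t, 0.0) + lam * p_mem.get(t, 0.0)
            for t in vocab}
```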
|
| 2025-07-29 |
MemTool: Optimizing Short-Term Memory Management for Dynamic Tool Calling in LLM Agent Multi-Turn Conversations
|
|
• The paper proposes MemTool, a short-term memory framework for managing dynamic tool sets across multi-turn conversations. It offers three architectures: Autonomous Agent, Workflow, and Hybrid, balancing autonomy and control.
• In Autonomous Mode, the agent autonomously adds/removes tools using Search_Tools and Remove_Tools. Workflow Mode follows a fixed pipeline: pruning tools, then searching and adding new ones. Hybrid Mode separates tool removal and adding, offering a balance of stability and flexibility.
• Using ScaleMCP’s 5,000 MCP servers and a 100-turn dialogue, the authors evaluate 13 LLMs with a 128-tool limit. Autonomous Mode achieves 90-94% tool removal efficiency, while Workflow and Hybrid perform consistently well, with Autonomous and Hybrid excelling in task completion.
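• The Autonomous-mode bookkeeping amounts to keeping the in-context tool set under a fixed limit; the sketch below uses simple LRU eviction, which is an assumption standing in for the agent's own Search_Tools/Remove_Tools decisions:

```python
class ToolContext:
    """Toy tool-set manager with a hard cap on in-context tools."""

    def __init__(self, limit=128):
        self.limit = limit
        self.tools = []                       # most recently used at the end

    def add(self, name):                      # Search_Tools result -> context
        if name in self.tools:
            self.tools.remove(name)           # refresh recency on re-add
        self.tools.append(name)
        while len(self.tools) > self.limit:   # evict oldest on overflow
            self.tools.pop(0)

    def remove(self, name):                   # explicit Remove_Tools call
        if name in self.tools:
            self.tools.remove(name)
```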
|
| 2025-07-27 |
SynapticRAG: Enhancing Temporal Memory Retrieval in Large Language Models through Synaptic Mechanisms
|
|
• SynapticRAG is a novel memory retrieval framework for large language models (LLMs), designed to enhance memory retrieval in cross-session conversations.
• By combining temporal association triggers with biologically inspired synaptic propagation mechanisms, SynapticRAG significantly improves the identification of relevant conversational history.
• Experimental results show that the framework achieves improvements of up to 14.66% across multiple performance metrics and demonstrates clear advantages in dynamic memory management.
|
| 2025-07-17 |
MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
|
 |
• MEM1 is an end-to-end reinforcement learning framework designed to make large language models (LLMs) efficient over long-horizon, multi-turn interactions.
• By maintaining a compact shared internal state, MEM1 avoids the unbounded context growth that afflicts traditional models.
• Experiments show that MEM1 significantly improves performance across multiple tasks while reducing memory usage, demonstrating broad applicability and optimization potential in dynamic environments.
|
| 2025-07-03 |
MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
|
|
• MemAgent is a long-text processing method that uses reinforcement learning (RL) to dynamically update memory, addressing the performance degradation and high computational complexity that large language models (LLMs) face on long texts.
• By treating memory as a latent variable and introducing streaming with a multi-session strategy, the model maintains linear time complexity while handling inputs of arbitrary length.
• Experiments show MemAgent performs strongly on ultra-long-text tasks with high accuracy, with clear advantages on complex multi-hop reasoning.
|
| 2025-06-19 |
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models |
|
|
• The paper proposes HippoRAG 2, a “long-term memory–inspired” structured RAG system. It builds a knowledge graph from text and retrieves evidence via graph-based propagation (PPR) to support multi-hop association, while improving basic factual recall that earlier structured RAGs often hurt.
• Offline, an LLM performs OpenIE to extract triples and form a KG, and adds passages as nodes linked to phrase nodes to fuse concept-level structure with context-rich passages. Online, it first retrieves top-k triples with embeddings, then uses an LLM for triple filtering to remove irrelevant triples; the remaining nodes seed a PPR run to rank the most relevant passages for the generator.
• It evaluates factual QA, multi-hop reasoning, and narrative understanding, reporting Recall@5 for retrieval and F1 for QA. Compared with BM25, dense retrievers, and multiple structured-RAG baselines, HippoRAG 2 generally improves retrieval and end-to-end QA, and ablations plus “growing-corpus” settings support the contribution of its components.
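• The PPR step above can be illustrated on a toy graph: relevance mass restarts at the seed (filtered-triple) nodes and propagates along edges, so passages linked to more seeds rank higher. The graph, seeds, and iteration count are illustrative, not from the paper:

```python
def ppr(adj, seeds, alpha=0.15, iters=50):
    """Power-iteration personalized PageRank with restart mass on seeds."""
    nodes = list(adj)
    pers = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(pers)
    for _ in range(iters):
        nxt = {n: alpha * pers[n] for n in nodes}   # teleport back to seeds
        for n in nodes:
            out = adj[n]
            if out:                                  # spread rank along edges
                share = (1 - alpha) * r[n] / len(out)
                for m in out:
                    nxt[m] += share
        r = nxt
    return r
```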
|
| 2025-06-09 |
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems |
|
|
• Introduces G-Memory, a hierarchical memory system designed to address the lack of self-evolution capabilities in Large Language Model (LLM)-based Multi-Agent Systems (MAS).
• Implements a three-tier graph architecture—Insight Graph, Query Graph, and Interaction Graph—to manage lengthy interaction histories by abstracting generalizable insights and condensing specific collaborative trajectories.
• Experimental results across embodied action and knowledge QA benchmarks demonstrate that G-Memory significantly enhances agent team performance, improving success rates by up to 20.89% without modifying the original frameworks.
|
| 2025-05-30 |
M+: Extending MemoryLLM with Scalable Long-Term Memory |
|
|
• M+ is a memory-augmented model designed to improve long-term information retention in large language models (LLMs).
• Built upon MemoryLLM, M+ integrates long-term memory mechanisms with a jointly trained retriever, substantially enhancing the model’s ability to handle knowledge spanning over 20,000 tokens while maintaining comparable GPU memory overhead.
• M+ achieves strong performance across multiple benchmarks, outperforming MemoryLLM and other competitive baselines, and demonstrates efficient information compression and end-to-end training, exhibiting mechanisms that closely resemble human memory.
|
| 2025-05-26 |
MemGuide: Intent-Driven Memory Selection for Goal-Oriented Multi-Session LLM Agents |
|
|
• MemGuide is a two-stage framework designed to enhance multi-session task-oriented dialogue (TOD) by incorporating task intent and slot-level guidance into memory selection.
• It employs Intent-Aligned Retrieval to match current context with stored intent descriptions and Missing-Slot Guided Filtering to prioritize memory units that fill information gaps using a Chain-of-Thought reasoner.
• The authors also introduce MS-TOD, a multi-session TOD benchmark. Evaluations show MemGuide significantly improves task success rates and reduces dialogue turns compared to strong baselines.
|
| 2025-05-23 |
Towards General Continuous Memory for Vision-Language Models |
|
 |
• CoMEM addresses the token overload and performance degradation issues in traditional Retrieval-Augmented Generation (RAG) for Vision-Language Models (VLMs) by introducing a general continuous memory mechanism.
• The method innovatively utilizes the VLM itself as a memory encoder combined with a lightweight Q-Former, efficiently compressing diverse multimodal and multilingual knowledge into a compact set of continuous embeddings.
• CoMEM is data- and parameter-efficient (requiring only 1.2% trainable parameters) and plug-and-play, significantly enhancing performance on complex multimodal reasoning tasks while keeping the inference model frozen.
|
| 2025-05-21 |
Pre-training Limited Memory Language Models with Internal and External Knowledge |
|
|
• Introduces Limited Memory Language Models (LMLM), a new class of models that externalizes factual knowledge to an external database during pre-training rather than encoding it in parameters.
• The approach uses a modified pre-training objective that masks retrieved factual values from the loss, encouraging the model to perform targeted lookups for facts instead of memorizing them.
• Experiments demonstrate that LMLMs match the factual precision of significantly larger models while enabling instant, verifiable knowledge updates and effective machine unlearning through simple database operations.
|
| 2025-05-11 |
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents |
|
|
• Proposes Reflective Memory Management (RMM), a novel framework for long-term dialogue agents that addresses the limitations of rigid memory granularity and fixed retrieval mechanisms.
• Integrates Prospective Reflection to dynamically organize dialogue history into topic-based memories, and Retrospective Reflection to iteratively refine retrieval using online reinforcement learning guided by LLM attribution signals.
• Experimental results on MSC and LongMemEval benchmarks demonstrate that RMM significantly outperforms strong baselines, achieving over 10% improvement in accuracy and enhancing response personalization.
|
| 2025-04-22 |
MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation |
|
|
• MemoRAG enhances the ability of large language models (LLMs) to handle long contexts by improving both retrieval and generation through a global memory-augmented retrieval mechanism.
• The framework pairs a lightweight global memory module with an expressive generation model: the memory module digests the entire context and produces draft clues that guide retrieval and support answer generation.
• The approach applies to a variety of tasks, including long-document question answering and summarization, demonstrating its potential for complex long-text scenarios.
|
| 2025-04-20 |
SAGE: Self-evolving Agents with Reflective and Memory-augmented Abilities |
|
|
• SAGE addresses the long-term memory and multitasking challenges of large language models (LLMs) in dynamic environments through three collaborative agents. It integrates a reflection mechanism and memory optimization based on the Ebbinghaus forgetting curve, helping the model filter and store important information while reducing cognitive load.
• SAGE continuously optimizes the Assistant's decisions through an iterative feedback mechanism and reflection functionality. Its MemorySyntax component simulates human memory decay, dynamically managing both short-term and long-term memory to retain critical information while shedding unnecessary memory burden.
• Experiments show that SAGE significantly improves model performance on AgentBench and long-text tasks (e.g., HotpotQA), with gains of up to 2.26× on multi-hop question answering and code generation, and it effectively resolves 73.6% of ambiguous references in dialogue tasks, demonstrating its potential for real-world applications.
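An Ebbinghaus-style decay rule of the kind MemorySyntax draws on can be sketched as retention R = exp(-t/S), where the stability S grows with each recall, so frequently used memories survive forgetting sweeps. The class, threshold, and timestamps below are illustrative assumptions, not SAGE's implementation:

```python
import math

class DecayingMemory:
    """Toy memory store: retention R = exp(-elapsed / stability);
    each recall increases stability, so often-used items survive sweeps."""

    def __init__(self, threshold=0.2):
        self.items = {}                      # key -> (stability, last_access)
        self.threshold = threshold

    def store(self, key, now):
        self.items[key] = (1.0, now)

    def recall(self, key, now):
        stability, _ = self.items[key]
        self.items[key] = (stability + 1.0, now)   # recall strengthens the trace

    def retention(self, key, now):
        stability, last = self.items[key]
        return math.exp(-(now - last) / stability)

    def sweep(self, now):
        """Forget items whose retention dropped below the threshold."""
        self.items = {k: v for k, v in self.items.items()
                      if self.retention(k, now) >= self.threshold}

mem = DecayingMemory()
mem.store("user_name", now=0)
mem.store("weather_smalltalk", now=0)
mem.recall("user_name", now=1)               # reinforced once
mem.sweep(now=4)                             # smalltalk decays away; name survives
```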
|
| 2025-04-14 |
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning |
|
|
• Introduces ComoRAG, a retrieval-augmented generation framework inspired by the human Prefrontal Cortex, designed to achieve stateful reasoning in long narrative contexts.
• The framework employs a dynamic memory workspace and a metacognitive regulation loop (including Self-Probe, Mem-Fuse, and Mem-Update) to iteratively fuse fragmented evidence into coherent context.
• Experimental results demonstrate that ComoRAG consistently outperforms strong baselines on challenging benchmarks like NarrativeQA and ∞BENCH, particularly excelling in complex narrative queries requiring global understanding.
|
| 2025-04-10 |
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory |
|
|
• The paper proposes Dynamic Cheatsheet (DC)—a continuously updated “sticky-note” external memory for black-box LLMs at inference time. It distills verified solving patterns and reuses them across problems, enabling test-time learning without training.
• DC consists of a Generator (Gen) and a Memory Curator (Cur): Gen produces an answer using the current memory, and Cur then refines/filters/compresses the information. A retrieval-based variant selects the most relevant past examples and solutions by similarity to assist generation, while preventing memory bloat.
• DC is evaluated across multiple tasks and models (e.g., AIME, GPQA-Diamond, Game of 24, MMLU-Pro; GPT-4o, Claude 3.5 Sonnet, etc.) using metrics such as Soft Match and Functionally Correct. Results show substantial gains; for example, the jump on Game of 24 is largely driven by reusable Python solver code being repeatedly “written and reused” in memory.
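The Generator/Curator loop can be sketched with stubbed components: a toy verifier admits a reusable strategy into memory after the first solved problem, and later problems reuse it, showing test-time learning without any training. The strategy string, verifier, and arithmetic task are all illustrative; the real system uses LLM calls for both roles:

```python
def generator(problem, memory):
    """Answer with a cached strategy when one applies, else from scratch."""
    reused = "strategy:sum-the-numbers" in memory
    return sum(problem), reused

def curator(problem, answer, memory, max_items=5):
    """Admit a strategy only after the answer verifies; cap memory size."""
    if answer == sum(problem) and "strategy:sum-the-numbers" not in memory:
        memory.append("strategy:sum-the-numbers")
    return memory[-max_items:]               # compression: keep newest items

memory, reuse_log = [], []
for problem in ([1, 2], [3, 4, 5], [10]):
    answer, reused = generator(problem, memory)
    reuse_log.append(reused)
    memory = curator(problem, answer, memory)
```

The cap in the curator is the anti-bloat step: without it, the cheatsheet would grow with every problem instead of staying a compact set of verified patterns.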
|
| 2025-03-27 |
MemInsight: Autonomous Memory Augmentation for LLM Agents |
|
|
• Proposed an autonomous memory augmentation pipeline to improve how agents store and later retrieve historical interactions.
• Emphasized filtering/structuring memory to keep salient information and reduce irrelevant recall.
• Validated on multiple agent scenarios (e.g., QA / recommendation / summarization) to show improved contextual responses.
|
| 2025-03-07 |
Memory-augmented Query Reconstruction for LLM-based Knowledge Graph Reasoning |
|
|
• MemQ is proposed to decouple reasoning (natural-language steps) from query generation/execution (SPARQL): the LLM produces a clear reasoning plan, while the actual query is obtained via memory retrieval + rule-based reconstruction, reducing errors and hallucinations caused by entangling tool calls with reasoning.
• During training, gold SPARQL queries are decomposed into query fragments, and a natural-language explanation is generated for each fragment to build a query memory bank of (explanation → fragment) pairs. At inference time, the LLM generates step-by-step plans; reconstruction uses semantic retrieval (Sentence-BERT) to fetch an adaptive Top-N set of fragments, then assembles them with rules and fills entity slots to produce the final executable query.
• Experiments on WebQSP and CWQ use Hits@1 and F1, where MemQ achieves the best overall performance. Additional analyses with structural consistency / edge hit rate show the reconstructed queries are closer to the gold graphs, and ablation studies confirm that the main gains come from the memory bank + decoupling design.
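A minimal sketch of the retrieve-then-assemble step, with bag-of-words cosine standing in for Sentence-BERT; the memory bank's explanation/fragment pairs and the assembly rule are illustrative, not MemQ's actual bank:

```python
from collections import Counter
import math

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

# (explanation -> SPARQL fragment) pairs, as built from decomposed gold queries.
memory_bank = {
    "find the entity's place of birth": "?e :placeOfBirth ?city .",
    "find the entity's spouse": "?e :spouse ?s .",
    "restrict results to people": "?e a :Person .",
}

def reconstruct(plan_steps):
    """Fetch the best fragment per reasoning step, then assemble the query."""
    fragments = [memory_bank[max(memory_bank, key=lambda e: cosine(step, e))]
                 for step in plan_steps]
    return "SELECT * WHERE { " + " ".join(fragments) + " }"

query = reconstruct(["find place of birth of the person"])
```

The point of the decoupling is visible here: the LLM only has to produce natural-language steps, while the executable SPARQL comes from retrieved, already-valid fragments.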
|
| 2025-02-25 |
Towards effective evaluation and comparisons for LLM unlearning methods |
|
|
• The paper examines machine unlearning in large language models (LLMs) and the importance of its evaluation, with a particular focus on removing undesirable or unnecessary data memories.
• It introduces Unlearning with Calibration (UWC) to calibrate model performance and strengthen the evaluation of different unlearning methods.
• The study emphasizes the importance of selecting appropriate evaluation metrics and recommends Extraction Strength (ES) as a primary evaluation tool to ensure accuracy and robustness in assessment.
|
| 2025-02-09 |
LM2: Large Memory Models |
|
|
• LM2 aims to overcome the limitations of traditional Transformers in multi-step reasoning, relational argumentation, and long-context processing.
• LM2 integrates an auxiliary memory module that uses cross-attention and gating mechanisms to enhance information storage and updating.
• Across multiple benchmarks, LM2 demonstrates significantly superior performance, particularly in long-context reasoning tasks, effectively enhancing its ability to process and remember complex information.
|
| 2025-02-03 |
TReMu: Towards Neuro-Symbolic Temporal Reasoning for LLM-Agents with Memory in Multi-Session Dialogues |
|
|
• Introduced an evaluation task/benchmark targeting temporal reasoning over noisy, multi-session dialogues with memory.
• Proposed a neuro-symbolic framework (TReMu) to improve temporal reasoning using memory-aware representations.
• Constructed multi-choice QA style evaluations (augmented from existing dialogue sources) and reported improved performance.
|
| 2025-01-23 |
On Memory Construction and Retrieval for Personalized Conversational Agents |
|
|
• Introduces SECOM, a memory management method that constructs memory banks at the segment level to address limitations of turn-level and session-level approaches in long-term conversations.
• SECOM partitions conversations into topically coherent segments and employs prompt compression (LLMLingua-2) as a denoising mechanism to enhance retrieval accuracy.
• Experimental results demonstrate that SECOM significantly outperforms existing baselines on long-term conversation benchmarks like LOCOMO and Long-MT-Bench+.
|
| 2025-01-19 |
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models |
|
|
• Proposes Alternate Preference Optimization (AltPO), a method designed to effectively address the challenges of machine unlearning in large language models (LLMs).
• AltPO enhances unlearning by combining negative feedback from the forget set with positive feedback from the same domain to generate multiple alternative responses, thereby improving forgetting capability while preserving overall model performance.
• Experimental results demonstrate that AltPO outperforms existing methods in terms of both unlearning quality and model utility.
|
| 2024-12-31 |
Titans: Learning to Memorize at Test Time |
|
|
• Titans aims to enhance a model's memory capacity when dealing with long sequences and complex contexts.
• The Titans architecture combines short-term and long-term memory modules, overcoming the limitations of traditional recurrent models and attention mechanisms and supporting larger context windows.
• Experimental results show that Titans exhibits superior performance and flexibility, especially in handling long-range dependencies and diverse tasks.
|
| 2024-12-17 |
On the Structural Memory of LLM Agents |
|
|
• The paper investigates how the structure and retrieval methods of memory modules in large language models (LLMs) affect model performance, with a focus on different memory architectures and their roles in information extraction and generation.
• The study finds that hybrid memory structures outperform others in complex tasks, demonstrating greater robustness in noisy environments.
• Through hyperparameter sensitivity analysis, the research identifies memory retrieval strategies that are best suited to different task settings.
|
| 2024-12-01 |
Self-Updatable Large Language Models by Integrating Context into Model Parameters |
|
|
• Proposes SELF-PARAM, a method to integrate contexts directly into LLM parameters without requiring extra storage modules, ensuring both high efficacy and long-term retention.
• Employs a training objective that minimizes the KL divergence between an original model (with context access) and a target model (without context), utilizing diverse generated QA pairs.
• Experiments demonstrate that SELF-PARAM significantly outperforms existing continual learning and RAG methods in question-answering and conversational recommendation tasks, achieving near-optimal performance with zero storage complexity.
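The training objective can be illustrated with a toy KL computation between a context-conditioned "teacher" distribution and a context-free "student"; the hand-made token distributions below are assumptions for illustration, not model outputs:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the tokens that p assigns mass to."""
    return sum(pv * math.log(pv / q.get(tok, eps))
               for tok, pv in p.items() if pv > 0)

# "Teacher": the original model conditioned on the context "Alice lives in Paris".
teacher = {"Paris": 0.9, "London": 0.05, "Rome": 0.05}
# "Student" without context access, before and after absorbing the context.
student_before = {"Paris": 0.34, "London": 0.33, "Rome": 0.33}
student_after = {"Paris": 0.88, "London": 0.06, "Rome": 0.06}

loss_before = kl_divergence(teacher, student_before)
loss_after = kl_divergence(teacher, student_after)
```

Minimizing this divergence over many generated QA pairs is what lets the context's knowledge migrate into the student's parameters, after which no stored context is needed at inference time.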
|
| 2024-10-10 |
Assessing episodic memory in LLMs with sequence order recall tasks |
|
|
• This study introduces the Sequence Order Recall Task (SORT), designed to evaluate the episodic memory capabilities of large language models (LLMs).
• The task highlights the importance of episodic memory—linking memories with relevant context such as time and location—particularly in everyday cognitive tasks.
• Preliminary results indicate that LLMs perform well when the relevant text is provided in context, but their performance degrades significantly when they must rely solely on knowledge absorbed during training.
|
| 2024-08-19 |
ELDER: Enhancing Lifelong Model Editing with Mixture-of-LoRA |
|
|
• ELDER proposes a novel lifelong model editing method using a Mixture-of-LoRA structure to establish continuous associations between data and adapters, enhancing robustness against rephrased inputs.
• The framework integrates a router network with a guided loss function to align LoRA allocations with edit knowledge and utilizes a deferral mechanism to preserve the model's general capabilities.
• Extensive experiments on GPT-2 XL and LLaMA2-7B demonstrate that ELDER outperforms existing baselines in reliability, generalization, and scalability while maintaining performance on downstream tasks.
|
| 2024-08-16 |
MemLong: Memory-Augmented Retrieval for Long Text Modeling |
|
|
• Proposed MemLong, a memory-augmented retrieval approach for long-context language modeling by fetching historical chunks externally.
• Combined a retrieval/memory module with a partially trainable decoder-only LM, plus controllable retrieval attention over retrieved chunks.
• Evaluated on long-context language modeling benchmarks to show improved generation quality and longer effective context.
|
| 2024-08-11 |
Towards Safer Large Language Models through Machine Unlearning |
|
|
• This paper introduces the Selective Knowledge Unlearning (SKU) framework, aimed at improving the safety of large language models (LLMs).
• The SKU framework consists of two main stages: harmful knowledge acquisition, followed by knowledge negation, which focuses on removing undesirable knowledge without degrading model utility under benign prompts.
• SKU successfully reduces harmful outputs while preserving response quality, and demonstrates a strong balance between unlearning effectiveness and model utility across multiple LLM architectures, such as OPT and LLaMA2.
|
| 2024-08-06 |
RULER: What’s the Real Context Size of Your Long-Context Language Models? |
|
|
• RULER is designed for the comprehensive evaluation of long-context language models (LMs) across a wide range of tasks.
• It extends the traditional Needle-in-a-Haystack (NIAH) test by incorporating tasks such as multi-hop tracking and aggregation, enabling a more thorough assessment of models’ understanding under long-context settings.
• Evaluations with RULER reveal that many models advertising long context windows degrade sharply as input length and task complexity grow, so their effective context sizes often fall well short of the claimed ones.
|
| 2024-07-22 |
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts |
|
|
• ReadAgent is a reading comprehension system designed to improve the performance of large language models (LLMs) when processing long-form text.
• Through three steps—episode pagination, gist memorization, and interactive lookup—ReadAgent extends the effective context length by up to 20×.
• ReadAgent outperforms traditional approaches on long-document reading comprehension benchmarks such as QuALITY, NarrativeQA, and QMSum.
|
| 2024-06-30 |
Towards Efficient and Effective Unlearning of Large Language Models for Recommendation |
|
 |
• Introduces E2URec, a recommendation data unlearning method specifically designed for LLM-based recommender systems (LLMRec).
• E2URec significantly improves unlearning efficiency while preserving recommendation performance by updating only Low-Rank Adaptation (LoRA) parameters.
• Experimental results show that E2URec outperforms existing baseline methods on real-world datasets.
|
| 2024-05-30 |
Knowledge Graph Tuning: Real-time Large Language Model Personalization based on Human Feedback |
|
|
• Proposes Knowledge Graph Tuning (KGT), a novel approach that personalizes large language models (LLMs) by optimizing external knowledge graphs based on user feedback, without modifying model parameters.
• KGT extracts personalized factual knowledge triples from user interactions and employs a heuristic optimization algorithm, avoiding the high computational costs and low interpretability of back-propagation methods.
• Experiments with models like Llama2 and Llama3 demonstrate that KGT significantly enhances personalization performance while reducing latency by up to 84% and GPU memory costs by up to 77%.
|
| 2024-05-26 |
MemoryLLM: Towards Self-Updatable Large Language Models |
|
|
• MEMORYLLM is a self-updating large language model designed to effectively integrate new knowledge while maintaining long-term information retention.
• By embedding a fixed-size memory pool in the latent space of the transformer, MEMORYLLM achieves a seamless combination of model self-updating and knowledge preservation.
• Key design features include memory tokens that store compressed knowledge, an intelligent self-updating mechanism, and comprehensive evaluations of knowledge integration, retention capability, and robustness.
|
| 2024-05-23 |
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models |
|
|
• HippoRAG is a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory, designed to enable deeper and more efficient knowledge integration for LLMs.
• By orchestrating LLMs, knowledge graphs, and Personalized PageRank (PPR) to mimic the neocortex and hippocampus, it enables effective single-step multi-hop retrieval.
• The method outperforms state-of-the-art retrieval-augmented generation (RAG) methods on multi-hop QA tasks by up to 20% and is significantly faster and cheaper than iterative retrieval approaches.
|
| 2024-05-23 |
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models |
|
|
• Identifies an "impossible triangle" in lifelong model editing—reliability, generalization, and locality cannot be simultaneously achieved—attributing this to the gap between long-term and working memory mechanisms.
• Proposes WISE, a dual parametric memory framework that utilizes a side memory for edits and a router to bridge it with the pretrained main memory, employing knowledge sharding and merging to handle continuous updates.
• Extensive experiments show that WISE outperforms existing methods in question answering, hallucination correction, and out-of-distribution generalization settings across multiple LLM architectures.
|
| 2024-04-26 |
Enhancing Large Language Model with Self-Controlled Memory Framework |
|
|
• Proposes the Self-Controlled Memory (SCM) framework to unleash infinite-length input capacity for Large Language Models (LLMs) without requiring modification or fine-tuning.
• The framework comprises an LLM-based agent, a memory stream for storing historical information, and a memory controller that dynamically manages "Activation Memory" (long-term) and "Flash Memory" (short-term).
• The authors also contribute a dataset covering long-term dialogues, book summarization, and meeting summarization, demonstrating that SCM achieves superior retrieval recall and response generation compared to baselines.
|
| 2024-04-24 |
From Local to Global: A GraphRAG Approach to Query-Focused Summarization |
|
|
• Introduces GraphRAG, a graph-based retrieval-augmented generation approach designed to address the limitations of conventional vector RAG in answering global questions about an entire text corpus.
• The method constructs an entity knowledge graph from source documents, partitions it into hierarchical communities using the Leiden algorithm, and pre-generates summaries to facilitate global sensemaking.
• By utilizing a map-reduce mechanism over community summaries, GraphRAG significantly outperforms baseline RAG systems in comprehensiveness and diversity for large-scale datasets.
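The map-reduce step can be sketched with stubs: a keyword scorer plays the role of the per-community LLM "map" call, and the reduce step merges the top partial answers into a global one. Community texts and the scoring rule are illustrative:

```python
def map_step(query, summary):
    """Stub for the per-community LLM call: score by query-word overlap."""
    words = summary.lower().split()
    hits = sum(w in words for w in query.lower().split())
    return hits, summary

def reduce_step(partials, top_k=2):
    """Merge the highest-scoring partial answers into a global answer."""
    ranked = sorted((p for p in partials if p[0] > 0), reverse=True)
    return " ".join(text for _, text in ranked[:top_k])

community_summaries = [
    "Community A: researchers collaborating on graph algorithms.",
    "Community B: a cluster of papers about protein folding.",
    "Community C: graph databases and query engines.",
]
query = "what work exists on graph systems"
answer = reduce_step([map_step(query, s) for s in community_summaries])
```

Because the map step runs independently per community summary, it parallelizes over the corpus, which is what makes global questions tractable without stuffing the whole corpus into one context.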
|
| 2024-04-15 |
Memory Sharing for Large Language Model based Agents |
|
|
• Introduces the Memory Sharing (MS) framework, which enables multiple LLM-based agents to share Prompt-Answer (PA) pairs as memories in a dynamic, real-time pool.
• The framework employs a dual-purpose mechanism where newly generated high-quality memories are used to enhance In-Context Learning for agents and simultaneously train the retriever to improve future retrieval relevance.
• Experimental results across domains like Literary Creation and Logic Problem-solving demonstrate that the MS framework effectively evolves individual intelligence into collective intelligence, significantly improving performance on open-ended questions without explicit fine-tuning.
|
| 2024-04-13 |
LLM In-Context Recall is Prompt Dependent |
|
 |
• Investigates the information recall capabilities of large language models (LLMs), with particular emphasis on their dependence on prompt content and formatting.
• Using the Needle-in-a-Haystack (NIAH) evaluation, the study finds that recall performance is strongly influenced by training data bias, as well as the content and structure of prompts.
• The results show that architectural improvements, training strategy adjustments, and fine-tuning can all effectively enhance recall performance.
|
| 2024-04-07 |
Online Adaptation of Language Models with a Memory of Amortized Contexts |
|
|
• Introduces Memory of Amortized Contexts (MAC), an efficient online adaptation framework for large language models (LLMs) designed to address catastrophic forgetting and high computational costs in keeping models up-to-date.
• MAC utilizes a meta-learned amortization network to compress new documents into compact parameter-efficient finetuning (PEFT) modulations stored in a memory bank, using an aggregation network to retrieve and combine relevant knowledge for specific queries.
• Experimental results on StreamingQA and SQuAD-Seq demonstrate that MAC significantly outperforms existing online finetuning methods in both adaptation performance and knowledge retention, while offering superior time and memory efficiency.
|
| 2024-03-24 |
MemoryBank: Enhancing Large Language Models with Long-Term Memory |
|
|
• MemoryBank is a long-term memory mechanism designed for large language models (LLMs) to address memory limitations in continuous interactions.
• By enabling models to effectively recall, update, and adapt user memories, MemoryBank enhances contextual understanding and user experience.
• Experimental results and analyses demonstrate MemoryBank’s effectiveness in improving emotional support and personalized interactions.
|
| 2024-02-16 |
Large Language Model Unlearning |
|
|
• Explores methods for implementing “forgetting” or “unlearning” in large language models (LLMs) to eliminate undesired or misaligned behaviors.
• By applying a gradient ascent (GA) strategy and introducing a random-output loss, the study demonstrates that unlearning can effectively prevent models from generating harmful responses.
• Experimental results show that the GA and GA + Mismatch approaches perform particularly well in reducing content leakage rates.
|
| 2024-02-06 |
Compressed context memory for online language model interaction |
|
|
• Proposes a compressed contextual memory approach to improve the memory efficiency and computational performance of online language models when handling extended contexts.
• By leveraging conditional LoRA integration and parallel computation, the method significantly reduces memory requirements and enables support for effectively unlimited context lengths, surpassing traditional sliding-window strategies.
• Experimental results demonstrate that, across applications such as multi-task learning and dialogue generation, the approach reduces memory usage by up to 5× while effectively preserving generation quality and accuracy.
|
| 2023-12-10 |
Unlearn What You Want to Forget: Efficient Unlearning for LLMs |
|
|
• Introduces the Efficient Unlearning (EUL) framework, designed to address the challenges of handling user privacy data in large language models (LLMs).
• As LLMs are widely deployed, models may inadvertently memorize sensitive information during pretraining, raising significant privacy concerns.
• EUL enables the effective removal of specific sensitive data from LLMs without full retraining, while preserving overall predictive performance.
|
| 2023-11-30 |
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models |
|
|
• JARVIS-1 is an open-world multi-task agent for Minecraft that generates plans and executes tasks using a multimodal language model (MLM). It can perceive visual information and human instructions, and, by combining multimodal memory, it leverages past experiences to improve future task performance.
• JARVIS-1 integrates MLM and multimodal memory to generate action plans using visual observations and instructions, which are executed by goal-conditioned controllers. It features a self-improvement mechanism, where it autonomously generates tasks through self-instruction, explores the environment, and accumulates experiences to enhance decision-making abilities.
• JARVIS-1 excels at over 200 Minecraft tasks, especially long-horizon ones (such as obtaining a diamond pickaxe), achieving up to 5× the success rate of prior state-of-the-art methods. As the game progresses, its performance improves through continuous learning and experience accumulation.
|
| 2023-11-15 |
Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory |
|
 |
• Introduces a novel memory mechanism, Think-in-Memory (TiM), designed to enhance the performance of large language models (LLMs) in long-term human–AI interactions.
• TiM incorporates an efficient retrieval mechanism based on locality-sensitive hashing, enabling effective memory storage and management over extended interactions.
• Experimental results show that TiM significantly improves response accuracy and coherence in multi-turn dialogues.
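The LSH lookup can be sketched with random-hyperplane signatures: nearby vectors tend to share a signature, so recall only scans one bucket instead of all stored thoughts. Here the hyperplanes are fixed by hand for reproducibility (a real index samples them randomly), and the 2-D "embeddings" are toy values, not TiM's actual representation:

```python
# Hyperplanes fixed by hand for reproducibility; a real LSH index samples them.
PLANES = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]

def signature(vec):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in PLANES)

# Toy 2-D "embeddings" of stored thoughts, bucketed by signature.
memories = {"likes hiking": (0.9, 0.1), "allergic to nuts": (-0.8, 0.2)}
buckets = {}
for text, vec in memories.items():
    buckets.setdefault(signature(vec), []).append(text)

query_vec = (0.85, 0.15)                     # near the "likes hiking" embedding
candidates = buckets.get(signature(query_vec), [])
```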
|
| 2023-10-16 |
Character-LLM: A Trainable Agent for Role-Playing |
|
|
• Introduces Character-LLM, a trainable agent framework that teaches LLMs to act as specific characters (e.g., Beethoven) by learning from reconstructed experiences rather than relying solely on prompts.
• Proposes an Experience Upload process involving profile collection, scene extraction, and experience completion to generate high-quality, character-specific training data.
• Implements Protective Experiences to mitigate hallucinations, enabling agents to effectively "forget" or refuse knowledge inconsistent with their character's era or identity.
|
| 2023-09-22 |
Augmenting Language Models with Long-Term Memory |
|
|
• Introduces a new framework, LONGMEM, designed to enhance the ability of large language models (LLMs) to process long-form text.
• LONGMEM employs a decoupled network architecture that combines a frozen LLM memory encoder with an adaptive residual side network, enabling efficient caching and updating of long-term contextual information.
• By incorporating specialized memory-augmentation layers, a token-based memory retrieval module, and a joint attention mechanism, LONGMEM improves memory retrieval and context utilization, and demonstrates effectiveness across a variety of tasks.
|
| 2023-08-16 |
MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation |
|
|
• Proposes MemoChat, an instruction tuning pipeline designed to enable Large Language Models (LLMs) to employ self-composed memos for maintaining consistency in long-range open-domain conversations.
• The approach utilizes a "memorization-retrieval-response" cycle, teaching LLMs to restructure dialogue history into memos and retrieve relevant evidence for answering current queries.
• Experiments show that MemoChat outperforms strong baselines on a newly curated, expert-annotated consistency benchmark (MT-Bench+), verifying the efficacy of the memo-equipped inner thinking process.
|
| 2023-05-23 |
RET-LLM: Towards a General Read-Write Memory for Large Language Models |
|
|
• RET-LLM is a framework that equips large language models (LLMs) with a dedicated read-write memory unit, enabling them to explicitly extract, store, and recall knowledge from text.
• Inspired by Davidsonian semantics, the system extracts knowledge in the form of triplets (concept, relationship, concept) and uses a controller to manage interactions between the LLM and the memory module using a text-based API.
• The memory unit is designed to be scalable, updatable, and interpretable, effectively handling temporal-based question answering tasks where static models often fail.
|
| 2023-05-22 |
RECURRENTGPT: Interactive Generation of (Arbitrarily) Long Text |
|
|
• Introduces RECURRENTGPT, a language-based simulacrum of the LSTM recurrence mechanism built upon LLMs to generate arbitrarily long texts without forgetting.
• Utilizes a dual-memory system: a short-term memory updated in the prompt and a long-term memory stored on hard drives retrieved via semantic search.
• Enables interpretable and interactive text generation ("AI as Contents"), allowing human users to observe and edit natural language memories and plans during the generation process.
|
| 2023-05-08 |
Prompted LLMs as Chatbot Modules for Long Open-domain Conversation |
|
|
• Proposes MPC (Modular Prompted Chatbot), a novel approach using pre-trained LLMs as individual modules (clarifier, memory processor, utterance generator, summarizer) to create high-quality conversational agents without fine-tuning.
• Utilizes techniques like few-shot prompting, chain-of-thought (CoT), and external memory (using DPR) to achieve long-term consistency and flexibility in open-domain dialogue.
• Human evaluation results demonstrate that MPC is on par with or superior to fine-tuned models like Blenderbot3 in terms of sensibleness, consistency, and engagingness, particularly in maintaining long-term persona consistency.
|