
RAG is not Agent Memory

February 13, 2025

RAG (“retrieval augmented generation”) is currently the go-to solution whenever we need to connect LLMs to external sources of information. RAG is even used as a form of memory for conversational agents, by retrieving old messages that are semantically similar to the most recent message.  

Although RAG provides a way to connect LLMs and agents to more data than can fit into context, it is not without limitations and risks. RAG often places irrelevant data into the context window, resulting in context pollution and degraded performance (especially for newer reasoning models). RAG is an important tool in the agent stack, but it is far from a complete solution.

Why connect LLMs to external data?

An LLM’s knowledge about the world is frozen. It doesn’t know the current weather in San Francisco or anything about your users unless you tell it. Even if you’re confident your LLM has the information users need in its weights, you may still want to connect it to external data to minimize the chances of hallucination.

For example, if you want your AI agent to reliably answer factual queries like, “How many calories are generally in an apple?” it should reference an external source like WebMD or a nutritional facts database rather than relying on the underlying LLM. Knowing what sources are in a model’s context window also allows you to double-check its work.

Connecting data (or memories) to agents with RAG

When people think about linking their LLMs to external data, they default to retrieval augmented generation (RAG). RAG rose in popularity because of its simplicity: you split a document into chunks, embed those chunks, use a similarity search to find the top-K snippets most related to the query, and then deposit those snippets into the context window. This can also provide a rudimentary form of memory, by searching over old messages instead of documents. But a simple process doesn’t always yield great results.
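To make that concrete, here is roughly what the pipeline looks like in code. This is a minimal sketch: the embed function stands in for whatever embedding model you call, and the chunk size and prompt format are illustrative assumptions, not any particular library’s API.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder for a call to whatever embedding model you use."""
    raise NotImplementedError

def chunk(document: str, size: int = 300) -> list[str]:
    # Naive fixed-size chunking: shred the document into ~size-character snippets.
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve_top_k(query: str, snippets: list[str], k: int = 10) -> list[str]:
    # Embed the query and every snippet, then rank snippets by cosine similarity.
    q = embed(query)
    scored = []
    for s in snippets:
        v = embed(s)
        sim = float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((sim, s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]

def build_prompt(query: str, document: str) -> str:
    # Deposit the top-K snippets into the context window, then append the query.
    context = "\n---\n".join(retrieve_top_k(query, chunk(document)))
    return f"Relevant excerpts:\n{context}\n\nUser question: {query}"
```

Notice that everything the model will ever see is decided in that single retrieval pass.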

Limitations of RAG-based memory and context

1. RAG is single step

LLMs get “one shot” at retrieving the most relevant data and generating a response. While that can work, it often doesn’t. To illustrate this point, imagine, for a moment, that you are a teacher. You task your students with writing a book report on a novel they’ve never read before. The report needs to be two pages and three sections.

A RAG approach to helping students write this book report would look like this:

Step 1: Shred the book into pieces of paper, each a few sentences long.

Step 2: Collect the top 10 most relevant shreds according to your (very basic) report guidelines: two pages and three sections.

Step 3: Ask students to write a report based on the top 10 shreds.

This is not a good way to write a book report. The students have no context for those top 10 shreds, so it’s hard to form a summary of the book, let alone a central thesis. In desperation, they may even resort to making things up.

When working with LLM-driven agents, a scenario like this could yield similarly bad or even worse results. Typically, RAG sorts the shreds from Step 2 by cosine similarity, a methodology rooted in correlation and notoriously bad at finding relevant snippets. Putting ten excerpts that are only possibly applicable into the context window will likely lead to irrelevant results.

2. RAG is purely reactive

If a user says, “Today is my birthday,” a RAG agent will search the vector database for messages similar to “birthday.” The problem is that the agent is relying on RAG to mimic memory by retrieving previous messages that happen to be semantically similar to the latest one.

But RAG isn't going to retrieve personalization information that isn’t semantically similar (i.e., according to an embedding model) to the searched word.

So, even if a user mentioned their favorite color or movie in the past, the model sees “movie,” “color,” and “birthday” as completely unrelated words. It won’t combine them into a response like, “Are you going to make it Star Wars-themed like your last party?” or “You should get blue cake since that's your favorite color!” as a best friend — who has a functioning memory — would.

In fact, you’ll never get that level of personalization from embedding search (how RAG is usually implemented), because that personal information will never be retrieved.
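To see why, here is a sketch of that recall step, reusing the hypothetical embed and retrieve_top_k helpers from the earlier sketch. The stored messages are made-up examples.

```python
# Hypothetical message history stored in a vector database.
history = [
    "My favorite color is blue.",
    "I loved the last Star Wars movie you recommended.",
    "Can you reschedule my dentist appointment?",
]

def recall(query: str, k: int = 2) -> list[str]:
    # Same cosine-similarity ranking as in the sketch above: only messages
    # that are semantically close to the query text come back.
    return retrieve_top_k(query, history, k)

# "Today is my birthday" is not semantically similar to the color or movie
# messages, so an embedding search will typically not surface them here,
# and the agent never gets the chance to personalize its reply.
memories = recall("Today is my birthday")
```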

If not RAG, then what?

Let’s play this out using the book report example. Say the book is so long that you can’t dump it into an LLM context window. If you can’t use traditional RAG, what tools could you make to help students write their reports?

Ideally, you’d build something that helps them navigate to any page so they can read the book page-by-page, taking notes over time. And you’d create a text search tool, something similar to Google. That way, if the students identified key themes in the book, they could search for supporting examples.

Together, these tools would help students digest the content in the book, think through the arguments of their report, and find parts of the book that prove their points, ultimately developing a much more thorough and accurate report.
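As agent tools, those two helpers might look something like this. This is a sketch under the assumption that the book has already been split into pages; the function names are hypothetical, not Letta’s built-in tools.

```python
BOOK_PAGES: list[str] = []  # the novel split into pages, assumed loaded elsewhere

def open_page(page_number: int) -> str:
    """Tool 1: let the agent navigate to any page and read it in full."""
    if not 1 <= page_number <= len(BOOK_PAGES):
        return f"Page {page_number} is out of range (1-{len(BOOK_PAGES)})."
    return BOOK_PAGES[page_number - 1]

def search_text(phrase: str) -> list[int]:
    """Tool 2: plain keyword search, returning the pages where a phrase appears."""
    phrase = phrase.lower()
    return [i + 1 for i, page in enumerate(BOOK_PAGES) if phrase in page.lower()]
```

An agent given these tools can skim the book page by page, notice a theme, and then call search_text to pull up supporting passages for its report.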

Agentic RAG

The approach described above, multi-step reasoning with tools, is “agentic RAG,” and it’s the underlying foundation of Letta. Letta’s design takes all the work that’s gone into developing the search and document retrieval tools we use today and hands those tools to LLMs, which prepare their own context windows by summarizing and organizing their “memory.”

With agentic RAG, LLMs can paginate through multiple pages of results, potentially even traversing an entire dataset, while also maintaining state. If you used agentic RAG to build a book report, the process would look like this:

Step 1: Read the first five pages.

Step 2: Write a summary of those five pages.

Step 3: Read the next five pages.

Step 4: Update the summary based on new information.

Agentic RAG’s iterative methodology updates results each time it retrieves and reviews more information, generating a more holistic and accurate response than if it retrieved information and generated responses only once (as in traditional RAG).
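Expressed as code, that loop might look like the following. It is a sketch that assumes a generic llm completion call and the open_page tool from the earlier sketch; the five-page window and prompt wording are illustrative.

```python
def llm(prompt: str) -> str:
    """Placeholder for a completion call to whatever model you use."""
    raise NotImplementedError

def summarize_book(total_pages: int, window: int = 5) -> str:
    summary = ""  # running state the agent carries between steps
    for start in range(1, total_pages + 1, window):
        # Read the next few pages (Steps 1 and 3).
        end = min(start + window, total_pages + 1)
        pages = "\n".join(open_page(p) for p in range(start, end))
        # Update the running summary with the new information (Steps 2 and 4).
        summary = llm(
            f"Current summary:\n{summary}\n\n"
            f"New pages:\n{pages}\n\n"
            "Rewrite the summary so it incorporates the new pages."
        )
    return summary
```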

Agentic RAG also solves the reactivity problem.

With agentic RAG, an AI agent isn’t doing a top-K match and dump. It’s already distilled important data it’s received in the past (a customer’s favorite movie or favorite color) and organized it in such a way that the model can proactively relate it to the user’s prompt and curate a response: “You should have a Star Wars-themed party!”, or “You should get blue decorations!”
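One way to picture this is an in-context memory block the agent keeps rewriting as it learns about the user, rather than a vector index it has to query. The block format and update tool below are assumptions for illustration, not a specific Letta API.

```python
# Distilled facts the agent keeps in its context window at all times.
user_memory = {
    "favorite_color": "blue",
    "favorite_movie": "Star Wars",
    "last_party_theme": "Star Wars",
}

def build_system_prompt() -> str:
    # The memory block is always in context, so the model can relate
    # "Today is my birthday" to these preferences without any retrieval step.
    facts = "\n".join(f"- {key}: {value}" for key, value in user_memory.items())
    return f"You are a personal assistant. What you know about the user:\n{facts}"

def update_memory(key: str, value: str) -> None:
    # Exposed to the agent as a tool: called whenever it learns something worth keeping.
    user_memory[key] = value
```

Because those distilled facts are always in the prompt, no retrieval step has to guess that “birthday” relates to “favorite color.”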

Conclusion

For companies that want to build robust AI agents, traditional RAG is insufficient. The key to developing AI agents that are precise, interpretable, proactive, and deeply aligned with an organization’s unique goals and data environments is to give your LLMs memory.

If you’re curious about how to do that, check out the Letta quickstart guide, or enroll in our Deep Learning AI course on agent memory. Or, if you’re ready to build your own sophisticated agents, request early access to the Letta Cloud platform.