Memory Blocks: The Key to Agentic Context Management

Research
May 14, 2025

Large language models (LLMs) face a fundamental constraint: they can only "see" what's in their immediate context window. This limitation creates a critical challenge when building applications that require long-term memory and coherence across interactions. The effectiveness of an LLM agent largely depends on how well its context window is managed: what information is kept, what is discarded, and how it's organized.

Memory blocks offer an elegant abstraction for context window management. By structuring the context into discrete, functional units, we can give LLM agents more consistent, usable memory. This concept, which originated in the MemGPT research paper and Letta agents framework, provides a systematic approach to organizing what information an LLM has access to during inference.

Memory Blocks: A Powerful Abstraction for Agent Memory

The idea of an agent that could manage its own memory (including its own context window) originated in the MemGPT paper. MemGPT demonstrated self-editing memory in a simple chat use case with two in-context memory blocks:

  1. “Human” memory block: Storing information about the user, their preferences, facts about them, and relevant context.
  2. “Persona” memory block: Containing the agent's own self-concept, personality traits, and behavioral guidelines.

These blocks were editable by the agent and restricted to a character limit (to cap how much of the context window each memory could occupy). In effect, they reserved dedicated portions of the context window for important information about specific topics.

For example, the agent could edit the “human” block as it learned more about the user, and by editing the “persona” block it could persist certain personality traits or “identity” features. If the agent stated a preference for vanilla ice cream, it could write that to the persona block to stay consistent when later asked about its favorite flavor.

Why Context Window Management Matters

Before diving deeper into memory blocks, it's important to understand why effective context window management is crucial for LLM applications:

  • Better performance: LLMs are sensitive to what's in their context window. They can't reason about information they don't "see" at inference time. By carefully managing which information is present in the context (e.g. by having it curated by another agent), we can dramatically improve the performance of agents.
  • Personalization & adaptability: For agents that serve specific users or functions, maintaining the right information in context is essential for providing tailored experiences. Without proper context management, agents either forget critical user information or must redundantly ask for it in each step of the interaction.
  • Controllability: By structuring the context window into distinct blocks with specific purposes, developers gain greater control over agent behavior. This structured approach allows for more predictable and consistent agent responses, and defining mechanisms for long-term context management in agents. 

Memory Blocks are Units of Agent Memory

Rather than treating the context window as a monolithic entity, memory blocks break it down into manageable, purposeful units that are also persisted. A memory block consists of:

  • A label that identifies its purpose (e.g., "human", "persona", "knowledge")
  • A value that is the string representation of the block data
  • A size limit (e.g. in characters or tokens) that dictates how much of the context window it can occupy
  • Optional descriptions that guide how the block should be used

The block value can be edited by the agent via memory tools (or custom tools), unless the block is read-only, in which case only the developer can modify it.
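
As a concrete illustration, here is a minimal sketch of creating a persisted block with these fields using the `letta_client` Python SDK (treat the exact method and parameter names as assumptions; they may differ slightly across SDK versions):

```python
from letta_client import Letta

# Connect to a local Letta server (or pass token=... for a hosted instance)
client = Letta(base_url="http://localhost:8283")

# Create a persisted memory block with a label, value, size limit, and description
human_block = client.blocks.create(
    label="human",  # identifies the block's purpose
    value="Name: Sarah. Prefers concise answers.",  # the string placed in context
    limit=2000,  # character budget this block may occupy in the context window
    description="Facts about the user, maintained by the agent",
)
print(human_block.id)  # blocks are persisted and addressable by ID
```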

Editing Memory Blocks via the Letta API

Unlike ephemeral memory in many LLM frameworks, Letta's memory blocks are individually persisted in the DB, with a unique `block_id` to access them via the API and Agent Development Environment (ADE). This provides a way for developers to directly modify parts of their agent’s context window.
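
For instance, a block can be fetched and overwritten directly by its ID. A minimal sketch of the pattern (method names follow the `letta_client` SDK, but check them against the current API reference):

```python
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Fetch a block by its ID, then overwrite its value.
# "block-123" is a placeholder; use a real block_id from your server.
block = client.blocks.retrieve(block_id="block-123")
client.blocks.modify(
    block_id=block.id,
    value=block.value + "\nTimezone: CET.",
)
```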

When Letta makes a request to the LLM, the context window is “compiled” from existing DB state, including current block values. The prompt template (i.e. formatting in the context window) can be customized with Jinja templating. 

Multi-Agent Shared Memory

One of Letta's most powerful features is the ability for multiple agents to share memory blocks: the same persisted block can be attached to several agents, so an edit made by one agent is immediately visible to the others.

This enables sophisticated patterns like:

  • Shared knowledge bases: Multiple agents accessing the same reference information
  • Sleep-time compute: Background agents updating the memory of primary agents
  • Collaborative memory: Teams of agents maintaining a shared understanding
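
A rough sketch of the shared-block pattern, again using the `letta_client` SDK (the `block_ids` parameter and the model identifiers shown are illustrative):

```python
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# One persisted block that both agents will read and write
shared = client.blocks.create(
    label="organization",
    value="Company: Acme. Current sprint: Q2 launch.",
    limit=3000,
)

# Both agents attach the same block, so an edit by one is visible to the other
primary = client.agents.create(
    name="primary-agent",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    block_ids=[shared.id],
)
background = client.agents.create(
    name="sleeptime-agent",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    block_ids=[shared.id],
)
```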

Tool-Based Memory Editing

Memory blocks can be modified through a variety of tools to customize memory management. Below is an example of a custom tool to “rethink” (replace) an entire memory block’s value by specifying the new block value and the target block label:
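
A sketch of such a tool is shown here; the injected `agent_state` parameter follows Letta's custom-tool pattern, and the `update_block_value` helper is an assumption of this sketch:

```python
def rethink_memory(agent_state: "AgentState", new_memory: str, target_block_label: str) -> None:
    """Replace the entire contents of a memory block with a rewritten version.

    Args:
        new_memory (str): The complete new value for the block (not a partial edit).
        target_block_label (str): The label of the block to overwrite (e.g. "human").
    """
    # agent_state is injected by the Letta tool executor at runtime;
    # this helper (assumed here) updates an in-context block by label.
    agent_state.memory.update_block_value(label=target_block_label, value=new_memory)
```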

Because memory block values are just strings, you can store more complex data structures like lists or dictionaries in blocks, as long as they can be serialized to a string format (for example, as JSON).
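
For instance, a hypothetical task list could be serialized to JSON before being written to a block's value, and parsed back out when needed:

```python
import json

# Structured data destined for a memory block
tasks = {"open": ["draft report", "update docs"], "done": ["set up repo"]}

# Serialize to a string so it fits the block's value field
block_value = json.dumps(tasks, indent=2)

# ...later, recover the structure from the stored string
restored = json.loads(block_value)
```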

Practical Applications

The flexibility of Letta's memory block system enables a wide range of powerful applications:

Personalized Assistants

By maintaining detailed user information in a human memory block, assistants can remember preferences, past interactions, and important details about each user:
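
A minimal sketch of this, using the `letta_client` SDK (the `memory_blocks` parameter and the model identifiers shown are illustrative):

```python
from letta_client import Letta

client = Letta(base_url="http://localhost:8283")

# Seed the assistant's "human" block with what is known about the user;
# the agent can rewrite this block as it learns more over time.
agent = client.agents.create(
    name="personal-assistant",
    model="openai/gpt-4o-mini",
    embedding="openai/text-embedding-3-small",
    memory_blocks=[
        {
            "label": "human",
            "value": "Name: Sarah. Role: data engineer. Prefers short, direct answers.",
        },
        {
            "label": "persona",
            "value": "I am a helpful assistant who tracks Sarah's preferences and context.",
        },
    ],
)
```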

Sleep-Time Agents

As we demonstrated in our Sleep-time Compute research, agents can process information during idle periods and update shared memory blocks. For example, a sleep-time agent might reflect on an existing codebase (similar to Cursor’s background agents) or on previous conversation history to form new memories, referred to as “learned context” in the paper. In Letta, learned context is written to a memory block and can be shared across multiple agents.

Long-running Agents (e.g. Deep Research) 

For complex, long-running agents like deep research agents, the context must carry the state of the task across many LLM invocations without derailing. 11x’s deep research agent benefited from writing its “research state” into a memory block to track research progress.

Conclusion

Building powerful, reliable agents that can learn and improve over time will require better techniques for long-term context management. Memory blocks provide a unit of abstraction for storing and managing sections of the context window. By breaking the context window into purposeful, manageable units, we enable agents that are more capable, personalized, and controllable.

Ready to experience the power of memory blocks for yourself? Get started with Letta today.
