Sleep-time Compute

TL;DR Sleep-time compute is a new way to scale AI capabilities: letting models "think" during downtime. Instead of sitting idle between tasks, AI agents can now use their "sleep" time to process information and form new connections by rewriting their memory state. Read our research paper here, and learn how to make sleep-time agents in Letta on our docs.
From test-time scaling to sleep-time scaling
Over the past year we’ve witnessed the rise of “reasoning models”: models that "think" before they answer. Recent models such as OpenAI's o1, DeepSeek's R1, and Anthropic's Claude 3.7 don’t respond instantly; instead, they’ve been trained to output long reasoning traces before returning a final response. In certain application domains such as math and coding, allowing such a model to think for longer (anywhere from a few seconds to a few minutes) has been shown to significantly improve the “intelligence” of the model.
This approach, known as "test-time scaling", has proven incredibly effective at pushing the frontier of LLM-driven AI systems - the more we scale the reasoning at test-time, the better the results get. But what if we're only scratching the surface of what these systems could achieve? What if we've been severely underutilizing the potential of these powerful AI systems by only activating their reasoning capabilities when a user is actively waiting?
We believe there's a fundamental paradigm shift waiting to be unlocked: AI systems that don't just think reactively when prompted, but proactively deepen their understanding during what we call "sleep time" - the vast periods when they're not directly engaged with users.
This is the fundamental insight behind sleep-time compute, a new paradigm we're introducing in our latest research paper. Sleep-time compute represents an exciting new scaling direction for stateful AI systems: by activating deep thinking during their vast idle periods, we can expand what these systems understand and how they reason - pushing AI capabilities beyond what's possible when computation is confined only to active interactions.
Making machines think while they sleep

The key idea behind sleep-time compute is that our agents should keep running even while they “sleep”, using their downtime to reorganize and reason through the information they already have available. Many agents today exist in environments with persisted, re-used context: for example, a coding agent can study a codebase in advance of being asked to program in it, or a conversational agent can reflect on previous interactions with the user to consolidate information before the next conversation begins.
Reasoning during sleep time transforms “raw context” into “learned context”, which can then be used later during test time. Compared to agents which only have access to the raw context, agents that have the opportunity to compute a learned context don't need to perform as much additional reasoning later on, since they have already thought about the material in advance.
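The raw-context-to-learned-context idea can be sketched in a few lines. This is a toy illustration of the concept (our own, not code from the paper or the Letta SDK): the `summarize` function stands in for the LLM reasoning that would run during sleep time, and the resulting learned context is what the agent reuses at test time instead of re-reading every raw transcript.

```python
# Toy sketch of sleep-time compute: condense "raw context" into a
# "learned context" during downtime. summarize() is a stand-in for
# LLM reasoning; here it just extracts and deduplicates tagged facts.

def summarize(raw_context: list[str]) -> str:
    """Sleep-time pass: turn raw context into short, reusable notes."""
    seen, notes = set(), []
    for doc in raw_context:
        for line in doc.splitlines():
            line = line.strip()
            if line.startswith("fact:") and line not in seen:
                seen.add(line)
                notes.append(line)
    return "\n".join(notes)

raw = [
    "chat log 2024-05-01\nfact: user prefers Python\nsmall talk...",
    "chat log 2024-05-02\nfact: user prefers Python\nfact: user lives in SF",
]

# Computed once while the agent is idle...
learned_context = summarize(raw)

# ...then reused at test time: the agent prompts with these short
# notes rather than reasoning over every raw transcript again.
print(learned_context)
# fact: user prefers Python
# fact: user lives in SF
```

In a real agent the condensation step is itself LLM reasoning, which is exactly why it is worth spending idle compute on it.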
The crucial role of memory and state
Since the beginning of our work on MemGPT, our vision has been agents that can actually learn and evolve over time. Key to this is memory and state: despite their impressive capabilities, today's language models are fundamentally stateless, with no persistent memory of their experiences or learnings.
This limitation creates a critical bottleneck in how we can scale compute for AI systems. Traditional stateless models can only benefit from increased computation during active use (test time), because they reset between interactions. For stateless agents, any insights or connections made during downtime would simply vanish when the next interaction begins.
Sleep-time compute fundamentally requires stateful AI agents: systems that maintain memory (or context) over time. Persistent context forms the bridge that allows insights gained during sleep time to improve future capability.
Better cost and performance
Our research paper demonstrates that sleep-time compute yields a Pareto improvement in the tradeoff between cost and performance. When tested on challenging math benchmarks like AIME and GSM, sleep-time agents are able to shift computational load from high-latency user interactions to periods when the system would otherwise sit idle, all without sacrificing performance quality. Additionally, we expect sleep-time compute to have significant impact beyond math and coding domains, benefiting conversational agents (through reflection on memories in the user's absence) and assistants (by enabling asynchronous ingestion of large data sources).
MemGPT 2.0: Sleep-time agents in Letta
We are releasing a new agent type in Letta, sleep-time-enabled agents, based on the technique described in the sleep-time compute research paper.

When you create agents with this type, Letta actually creates two agents under the hood: a primary agent and a sleep-time agent. The primary agent is similar to Letta agents today: it has the ability to send messages to the user, can call custom user tools, and can search its external memory (recall memory for conversation history, and archival memory for specifically saved memories). However, the primary agent is not provided with tools to edit its core memory, which is the memory stored in-context composed of memory blocks. These tools are attached to the sleep-time agent, which has the ability to manage *both* the in-context memory of the primary agent as well as its own in-context memory.
This approach solves a number of problems with the original MemGPT agent design. In MemGPT, memory management, conversation, and other tasks are all bundled into a single agent, which means the agent may be slower (if it has to call memory operations during conversation) and potentially less reliable (since it calls both memory management tools and standard tools). Offloading memory to a sleep-time agent allows memory management to happen asynchronously. Memory formation in MemGPT is also incremental, so memories may become messy and disorganized over time. Sleep-time agents, on the other hand, can continuously improve their learned context to generate clean, concise, and detailed memories.
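The division of labor between the two agents can be sketched as follows. This is an illustrative toy, not the Letta internals: the two agents share the same core-memory blocks, but only the sleep-time agent holds the tools that rewrite them, while the primary agent simply renders them into its context.

```python
# Sketch of the two-agent design: shared core-memory blocks,
# read-only for the primary agent, writable by the sleep-time agent.

class MemoryBlock:
    def __init__(self, label: str, value: str = ""):
        self.label, self.value = label, value

class PrimaryAgent:
    """Talks to the user; reads core memory but cannot edit it."""
    def __init__(self, core_memory: dict):
        self.core_memory = core_memory

    def render_context(self) -> str:
        # Core memory is compiled into the in-context prompt.
        return "\n".join(f"<{b.label}>{b.value}</{b.label}>"
                         for b in self.core_memory.values())

class SleeptimeAgent:
    """Runs during downtime; owns the memory-editing tools."""
    def __init__(self, core_memory: dict):
        self.core_memory = core_memory

    def rewrite_block(self, label: str, new_value: str) -> None:
        self.core_memory[label].value = new_value

# Both agents hold references to the same blocks.
shared = {
    "human": MemoryBlock("human"),
    "persona": MemoryBlock("persona", "helpful assistant"),
}
primary = PrimaryAgent(shared)
sleeper = SleeptimeAgent(shared)

# During sleep time, the sleep-time agent refines the primary's memory...
sleeper.rewrite_block("human", "name: Ada; prefers concise answers")

# ...and the primary agent sees the update on its next turn.
print(primary.render_context())
```

Because the primary agent never calls memory tools itself, its conversational turns stay fast, and memory edits can happen on their own schedule in the background.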
Configuring sleep-time agents
Since Letta is model agnostic, the primary and sleep-time agents can be configured independently with different underlying models. We have found it useful to give the sleep-time agent a stronger model, since it is less latency-constrained - for example, the conversational primary agent can use a fast model like gpt-4o-mini, while the sleep-time agent can use a larger and slower model like gpt-4.1 or Sonnet 3.7.
Sleep-time agents can also be configured to run at different frequencies. The higher the frequency setting, the more tokens your agent will use - but also the more opportunities the agent has to revise its learned context.
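The frequency tradeoff can be sketched with a simple scheduler. This is a hypothetical illustration (the class and parameter names are ours, not Letta's - see the Letta docs for the real configuration options): a sleep-time pass is triggered after every N primary-agent messages, so a smaller N means a higher frequency, more passes, and therefore more token spend.

```python
# Hypothetical sketch of a sleep-time frequency setting: trigger a
# sleep-time pass every `every_n_messages` primary-agent messages.

class SleeptimeSchedule:
    def __init__(self, every_n_messages: int):
        self.every_n_messages = every_n_messages
        self.message_count = 0
        self.passes_run = 0  # each pass costs tokens but refines memory

    def on_message(self) -> None:
        self.message_count += 1
        if self.message_count % self.every_n_messages == 0:
            self.passes_run += 1  # run a sleep-time memory revision here

high_freq = SleeptimeSchedule(every_n_messages=2)
low_freq = SleeptimeSchedule(every_n_messages=10)

for _ in range(20):  # simulate 20 user messages
    high_freq.on_message()
    low_freq.on_message()

print(high_freq.passes_run, low_freq.passes_run)
# 10 2
```

The high-frequency schedule runs five times as many memory revisions over the same conversation - keeping the learned context fresher at a proportionally higher token cost.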

Additionally, the sleep-time compute paradigm can be used to improve document analysis - you can upload a data source to your agent, and a sleep-time agent can parse it in the background, reading over the document and modifying the primary agent's memory to include the important findings. Importantly, the sleep-time agent modifies the memory in an “anytime” fashion - the primary agent can read from this memory at any point, without having to wait for the sleep-time agent to finish its reasoning.
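The “anytime” property can be sketched with a small generator (again an illustration of the concept, not the Letta implementation): the background pass commits a complete, readable value to the shared memory block after each document chunk, so a read taken mid-ingestion returns valid partial notes rather than blocking.

```python
# Sketch of "anytime" memory updates: the background ingestion pass
# commits a complete notes value after each chunk, so the primary
# agent can read a consistent snapshot at any point.

memory = {"document_notes": ""}  # shared core-memory block

def sleep_time_ingest(chunks):
    """Fold document chunks into the notes block, one at a time,
    yielding after each commit (a point where reads are safe)."""
    for chunk in chunks:
        notes = memory["document_notes"]
        addition = f"key point from {chunk}"
        memory["document_notes"] = (notes + "; " if notes else "") + addition
        yield  # primary agent may read the block here

ingest = sleep_time_ingest(["page 1", "page 2", "page 3"])

next(ingest)                            # only one chunk processed so far
early_read = memory["document_notes"]   # valid, partial notes
for _ in ingest:                        # let ingestion run to completion
    pass
final_read = memory["document_notes"]

print(early_read)
# key point from page 1
print(final_read)
# key point from page 1; key point from page 2; key point from page 3
```

The early read is less complete than the final one, but it is never inconsistent - which is what lets the primary agent answer questions about a large document before background analysis has finished.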
We anticipate that in the future, most agents in Letta will be sleep-time-enabled for better memory, personalization, and improved latency.
Conclusion
We've released our sleep-time agents as part of Letta 0.7.0. Try it yourself through our sleep-time agent template in the ADE via Letta Desktop or Letta Cloud.
We're excited to see how the community builds on these ideas to create AI systems that truly never stop learning.
Interested in working on research at Letta? Apply to join our team.