
The AI agents stack

November 14, 2024

Understanding the AI agents landscape

Although we see a lot of agent stacks and agent market maps, we tend to disagree with their categorizations and find that they rarely reflect what we observe developers actually using. The agent software ecosystem has developed significantly in the past few months, with progress in memory, tool usage, secure execution, and deployment, so we decided it was time to share our own “agent stack”, based on our learnings from working on open source AI for over a year and on AI research for 7+ years.

The AI agents stack in late 2024, organized into three key layers: agent hosting/serving, agent frameworks, and LLM models & storage.

From LLMs to LLM agents

In 2022 and 2023 we saw the rise of LLM frameworks and SDKs such as LangChain (released in Oct 2022) and LlamaIndex (released in Nov 2022). At the same time, we saw the establishment of several “standard” platforms for consuming LLMs via APIs, as well as for self-hosted LLM inference (vLLM and Ollama).

In 2024, we’ve seen a dramatic shift in interest towards AI “agents”, and more generally, compound systems. Despite having existed as a term in AI for decades (specifically in reinforcement learning), “agents” has become a loosely defined term in the post-ChatGPT era, often referring to LLMs that are tasked with outputting actions (tool calls) and that run in an autonomous setting. The combination of tool use, autonomous execution, and memory required to go from LLM → agent has driven the development of a new agent stack.

What makes the agent stack unique?

Agents are a significantly harder engineering challenge compared to basic LLM chatbots because they require state management (retaining the message/event history, storing long-term memories, executing multiple LLM calls in an agentic loop) and tool execution (safely executing an action output by an LLM and returning the result).

As a result, the AI agents stack looks very different from the standard LLM stack. Let’s break down today’s AI agents stack, starting from the bottom at the model serving layer:

Model serving

At the core of AI agents is the LLM. To use the LLM, the model needs to be served via an inference engine, most often run behind a paid API service.

OpenAI and Anthropic lead among the closed API-based model inference providers with private frontier models. Together.AI, Fireworks, and Groq are popular options that serve open-weights models (e.g. Llama 3) behind paid APIs. Among the local model inference providers, we most commonly see vLLM leading the pack for production-grade, GPU-based serving workloads. SGLang is an up-and-coming project with a similar developer audience. Among hobbyists (“AI enthusiasts”), Ollama and LM Studio are two popular options for running models on your own computer (e.g. M-series Apple MacBooks).
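As an illustration of the serving layer, the sketch below calls a locally served open-weights model through an OpenAI-compatible endpoint. The server command, URL, port, and model name are assumptions for the example (vLLM and Ollama both expose this style of API), not a prescription.

```python
# Minimal sketch: calling a locally served open-weights model through an
# OpenAI-compatible endpoint. Assumes a vLLM server was started with e.g.
#   vllm serve meta-llama/Llama-3.1-8B-Instruct
# (the URL, port, and model name below are illustrative, not prescriptive).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM / Ollama-style endpoint
    api_key="not-needed-for-local",       # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "Summarize what an agent runtime does."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same ChatCompletion dialect as the hosted providers, swapping between a paid API and a self-hosted model is usually a one-line change to the base URL and model name.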

Storage

Storage is a fundamental building block for agents, which are stateful: agents are defined by persisted state like their conversation history, memories, and the external data sources they use for RAG. Vector databases like Chroma, Weaviate, Pinecone, Qdrant, and Milvus are popular for storing the “external memory” of agents, allowing agents to leverage data sources and conversational histories far too large to fit into the context window. Postgres, a traditional DB that has been around since the 1980s, now also supports vector search via the pgvector extension. Postgres-based companies like Neon (serverless Postgres) and Supabase also offer embedding-based search and storage for agents.
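The store-then-query pattern behind this “external memory” looks roughly like the sketch below. Chroma is used purely as an illustration; the other vector databases (and Postgres with pgvector) follow the same pattern, and the documents and queries are made up for the example.

```python
# Minimal sketch of "external memory" for an agent using a vector database.
# Chroma is used here purely as an illustration; Weaviate, Pinecone, Qdrant,
# Milvus, or Postgres + pgvector follow the same store-then-query pattern.
import chromadb

client = chromadb.Client()  # in-memory client; use a persistent client in production
memory = client.get_or_create_collection(name="agent_memory")

# Store past conversation snippets / documents that won't fit in the context window.
memory.add(
    ids=["msg-001", "msg-002"],
    documents=[
        "User prefers responses in bullet points.",
        "User's deployment target is Kubernetes on GCP.",
    ],
)

# At inference time, retrieve only the most relevant memories for the current prompt.
results = memory.query(query_texts=["How should I format my answer?"], n_results=1)
print(results["documents"])
```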

Tools & libraries

One of the primary differences between standard AI chatbots and AI agents is the ability of an agent to call “tools” (or “functions”). In most cases the mechanism for this is the LLM generating structured output (e.g. a JSON object) that specifies a function to call and the arguments to provide. A common point of confusion is that tool execution is not done by the LLM provider itself: the LLM only chooses which tool to call and which arguments to provide. An agent service that supports arbitrary tools or arbitrary arguments must use a sandbox (e.g. Modal, E2B) to ensure secure execution.
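A minimal sketch of this loop, using the OpenAI Python SDK, is shown below. The get_weather tool is a made-up example: the model only selects the tool and its arguments, and the agent code executes it (in production, ideally inside a sandbox).

```python
# Minimal sketch of the tool-calling loop described above: the LLM only *selects*
# a tool and its arguments; the agent code is responsible for executing it
# (in production, ideally inside a sandbox such as Modal or E2B).
# The weather tool is a made-up example, not part of any provider's API.
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"It is sunny in {city}."  # stand-in for a real tool implementation

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:  # the model chose a tool; we execute it ourselves
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)  # tool execution happens in *our* process, not the provider's
    messages.append(msg)
    messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```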

Agents call tools via a JSON schema defined by OpenAI, which means that agents and tools can actually be compatible across different frameworks. Letta agents can call LangChain, CrewAI, and Composio tools, since they are all defined by the same schema. Because of this, there is a growing ecosystem of providers for common tools. Composio is a popular general-purpose tool library that also manages authorization. Browserbase is an example of a specialized tool for web browsing, and Exa provides a specialized tool for searching the web. As more agents are built, we expect the tool ecosystem to grow and to provide new functionality like authentication and access control for agents.

Agent frameworks

Agent frameworks orchestrate LLM calls and manage agent state. Different frameworks will have different designs for:

Management of the agent’s state: Most frameworks have introduced some notion of “serialization” of state, which allows agents to be loaded back into the same script at a later time by saving the serialized state (e.g. JSON, bytes) to a file; this includes state like the conversation history, agent memories, and the stage of execution. In Letta, where all state is backed by a database (e.g. a messages table, agent state table, memory block table), there is no notion of “serialization”, since agent state is always persisted. This allows for easily querying agent state (for example, looking up past messages by date). How state is represented and managed determines both how the agent application will scale with longer conversational histories or larger numbers of agents, and how flexibly state can be accessed or modified over time.

Structure of the agent’s context window: Each time the LLM is called, the framework will “compile” the agent’s state into the context window. Different frameworks will place data into the context window (e.g. the instructions, message buffer, etc.) in different ways, which can alter performance. We recommend choosing a framework that makes the context window transparent, since this is ultimately how you control the behavior of your agents (a simplified sketch of this compilation step follows the list below).

Cross-agent communication (i.e. multi-agent): Llama Index has agents communicate via message queues, while CrewAI and AutoGen have explicit abstractions for multi-agent. Letta and LangGraph both support agents directly calling each other, which allows for both centralized (via a supervisor agent) and distributed communication across agents. Most frameworks now support both multi-agent and single-agent, since a well-designed single-agent system should make cross-agent collaboration easy to implement.

Approaches to memory: A fundamental limit of LLMs is their limited context window, which necessitates techniques to manage memory over time. Memory management is built into some frameworks, while others expect developers to manage memory themselves. CrewAI and AutoGen rely solely on RAG-based memory, while phidata and Letta use additional techniques like self-editing memory (from MemGPT) and recursive summarization. Letta agents automatically come with a set of memory management tools that allow agents to search previous messages by text or date, write memories, and edit the agent’s own context window (you can read more here).

Support for open models: Model providers actually do a lot of behind-the-scenes tricks to get LLMs to generate text in the right format (e.g. for tool calling), such as re-sampling the LLM outputs when they don’t generate proper tool arguments, or adding hints into the prompt (e.g. “pretty please output JSON”). Supporting open models requires the framework to handle these challenges itself, so some frameworks limit support to major model providers.
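To make the context-compilation and memory-management points above more concrete, here is an illustrative sketch (not any particular framework’s actual implementation) of turning persisted agent state into a prompt, with a stubbed recursive-summarization fallback when the history outgrows the token budget. The token counter and summarizer are deliberate placeholders.

```python
# Illustrative sketch (not any framework's actual implementation) of two ideas
# from the list above: "compiling" persisted agent state into a context window,
# and falling back to recursive summarization when the message history grows
# too large. Token counting and the summarizer call are deliberately stubbed.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    system_prompt: str
    memory_blocks: dict = field(default_factory=dict)  # e.g. {"persona": ..., "human": ...}
    messages: list = field(default_factory=list)       # full persisted message history
    summary: str = ""                                   # rolling summary of evicted messages

def approx_tokens(text: str) -> int:
    return len(text) // 4  # rough heuristic; a real framework would use a tokenizer

def summarize(items: list) -> str:
    # In a real framework this would be another LLM call (recursive summarization).
    return f"[summary of {len(items)} earlier items]"

def compile_context(state: AgentState, budget: int = 8000) -> list:
    """Turn persisted state into the message list actually sent to the LLM."""
    header = state.system_prompt + "\n\nMemory:\n" + "\n".join(
        f"<{k}>{v}</{k}>" for k, v in state.memory_blocks.items()
    )
    recent = list(state.messages)
    # Evict and summarize the oldest messages until the compiled context fits the budget.
    while recent and approx_tokens(header + state.summary + str(recent)) > budget:
        evicted = recent[: max(1, len(recent) // 2)]
        recent = recent[len(evicted):]
        state.summary = summarize([state.summary] + evicted)
    compiled = [{"role": "system", "content": header + "\n\nPrior context: " + state.summary}]
    return compiled + recent
```

A framework that exposes the output of a step like compile_context (rather than hiding it) is what we mean above by a “transparent” context window.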

When building agents today, the right choice of framework depends on your application, such as whether you are building a conversational agent or workflow, whether you want to run agents in a notebook or as a service, and your requirements for open weights model support.

We expect major differentiators to arise between frameworks in their deployment workflows, where design choices for state/memory management and tool execution become more significant.

Agent hosting and agent serving

Most agent frameworks today are designed for agents that don’t exist outside of the Python script or Jupyter notebook they were written in. We believe that the future of agents is to treat agents as a service that is deployed to on-prem or cloud infrastructure, accessible via REST APIs. In the same way that OpenAI’s ChatCompletion API became the industry standard for interacting with LLM services, we expect that there will eventually be a winner for the Agents API. But there isn’t one… yet.

Deploying agents as a service is much trickier than deploying LLMs as a service, due to the issues of state management and secure tool execution. Tools, along with their required dependencies and environments, need to be explicitly stored in a DB, since the environment to run them has to be re-created by the service (which is not an issue when your tools and agents run in the same script). Applications may need to run millions of agents, each accumulating a growing conversational history. When moving from prototyping to production, agent state inevitably must go through a data normalization process, and agent interactions must be defined by REST APIs. Today, this process is usually done by developers writing their own FastAPI and database code, but we expect this functionality to become more embedded in frameworks as agents mature.
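A stripped-down sketch of that “write your own FastAPI and database code” pattern is shown below. The in-memory store and the run_agent_step helper are hypothetical placeholders (in practice they would be a real database and the framework’s agentic loop), not an existing library’s API.

```python
# Hedged sketch of the "agents as a service" pattern described above: a REST
# endpoint that loads agent state, runs one agent step, and persists the new
# messages. AGENT_DB and run_agent_step are hypothetical placeholders, not an
# existing library's API.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class MessageRequest(BaseModel):
    content: str

# Placeholder persistence layer; in practice this would be Postgres/SQLAlchemy.
AGENT_DB: dict[str, list[dict]] = {}

def run_agent_step(history: list[dict], user_message: str) -> str:
    # Placeholder for the agentic loop: compile the context window, call the LLM,
    # execute any tool calls (ideally in a sandbox), and return the reply.
    return f"(agent reply to: {user_message})"

@app.post("/agents/{agent_id}/messages")
def send_message(agent_id: str, req: MessageRequest):
    history = AGENT_DB.setdefault(agent_id, [])
    reply = run_agent_step(history, req.content)
    # Persist both sides of the exchange so the agent survives restarts.
    history.append({"role": "user", "content": req.content})
    history.append({"role": "assistant", "content": reply})
    return {"agent_id": agent_id, "reply": reply}
```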

Conclusion

The agent stack is still extremely early, and we’re excited to see how the ecosystem expands and evolves over time. If you’re interested in hosting agents or building agents with memory, you can check out the Letta OSS project and sign up for Letta Cloud early access.

Editors note: When making the AI agents stack diagram, we aimed to include companies and OSS that were representative of what software developers building a vertical agent application today (November 2024) would be most likely to use. Inevitably there are amazing companies and high-impact OSS projects that we missed - sorry if we missed you! If you’d like to be featured in a future stack diagram, please leave a comment on our LinkedIn post / Discord or shoot us an email.