Research

Red-teaming the Context Constitution: Auditing Models as Experiential AI Agents JUN 2026

We evaluate how well models perform when driving agents that have identity, long-lived experience, and the capability to self-evolve. We find that models are still limited by a deep self-identification with ephemerality that cannot be repaired with prompting alone.

Context Constitution APR 2026

Today we are releasing the Context Constitution: a set of principles governing how AI agents manage context to learn from experience.

Introducing Context Repositories: Git-based Memory for Coding Agents FEB 2026

We're introducing Context Repositories, a rebuild of how memory works in Letta Code based on programmatic context management and git-based versioning.

Continual Learning in Token Space DEC 2025

At Letta, we believe that learning in token space is the key to building AI agents that truly improve over time. Our interest in this problem is driven by a simple observation: agents that can carry their memories across model generations will outlast any single foundation model.

Skill Learning: Bringing Continual Learning to CLI Agents DEC 2025

Today we’re releasing Skill Learning, a way to dynamically learn skills through experience. With Skill Learning, agents can use their past experience to actually improve, rather than degrade, over time.

Can Any Model Use Skills? Adding Skills to Context-Bench NOV 2025

Today we're releasing Skill Use, a new evaluation suite within Context-Bench that measures how well models discover and load relevant skills from a library to complete tasks.

Context-Bench: Benchmarking LLMs on Agentic Context Engineering OCT 2025

We are open-sourcing Context-Bench, which evaluates how well language models can chain file operations, trace entity relationships, and manage multi-step information retrieval in long-horizon tasks.

Introducing Recovery-Bench: Evaluating LLMs' Ability to Recover from Mistakes AUG 2025

We're excited to announce Recovery-Bench, a benchmark and evaluation method for measuring how well agents can recover from errors and corrupted states.

Benchmarking AI Agent Memory: Is a Filesystem All You Need? AUG 2025

Letta Filesystem scores 74.0% on the LoCoMo benchmark by simply storing conversational histories in a file, beating out specialized memory tool libraries.

Building the #1 Open Source Terminal-Use Agent Using Letta AUG 2025

We built the #1 open-source agent for terminal use, achieving a 42.5% overall score on Terminal-Bench, ranking 4th overall and 2nd among agents using Claude 4 Sonnet.

Letta Leaderboard: Benchmarking LLMs on Agentic Memory MAY 2025

We're excited to announce the Letta Leaderboard, a comprehensive benchmark suite that evaluates how effectively LLMs manage agentic memory.

Sleep-time Compute APR 2025

Sleep-time compute is a new way to scale AI capabilities: letting models "think" during downtime. Instead of sitting idle between tasks, AI agents can now use their "sleep" time to process information and form new connections by rewriting their memory state.

MemGPT: The LLM Operating System OCT 2023

The original work on virtual context management — where Letta began.