Your Agents Keep Forgetting. Anthropic Just Gave Them a Way to Dream.

Status: Published
Author:
Created: May 25, 2026
Tags: LLM, GenAI Tooling, Generative AI, Text

Hook

Every senior engineer knows the feeling: you fix a bug on Friday, and on Monday a colleague files the exact same bug because nobody wrote it down. Your AI agents have been doing the same thing to you, and Anthropic just shipped a feature aimed squarely at that problem.

Context

On May 6, 2026, at Anthropic's first developer conference — Code with Claude — the company shipped a feature called Dreaming. It's available in preview for Claude Managed Agents (supported on Opus 4.7 and Sonnet 4.6), and the name is more literal than you'd expect.
Here's the mechanic. After each agent session, structured logs are preserved: task outcomes, corrections made, tool calls that failed, time spent. Then, on a scheduled cadence, a background process runs across up to 100 past sessions. It looks for patterns: recurring failures, workflows multiple agents converged on independently, preferences that showed up across a team. Those patterns get synthesized into the agent's persistent memory, so the next session starts with those lessons already in place.
You can tune how autonomous this is. Dreaming can update memory automatically, or you can gate it behind a human review step before changes land — a design choice that will matter a lot for regulated industries.
The headline result: Harvey, the legal AI company, saw task completion rates increase roughly 6x after implementing Dreaming. Their agents had been failing in the same predictable ways across sessions: forgetting filetype quirks and rediscovering tool-specific workarounds that had been worked out once and then lost. With Dreaming, the fixes stuck.

The Insight

Here's the thing practitioners need to sit with: the problem Dreaming solves isn't primarily a memory problem. It's an institutional knowledge problem — and we've been solving the wrong version of it.
Most teams currently handle this by stuffing context into system prompts. You write out the edge cases, the tool quirks, the learned workarounds, by hand. You are the agent's institutional memory. When your team grows, or your agents multiply across workflows, that approach collapses. You can't manually curate tribal knowledge at agent scale.
Dreaming flips this. Instead of humans distilling knowledge into prompts, the agent distills knowledge from its own experience — asynchronously, between runs. This is closer to how a human junior engineer eventually becomes useful: not because their RAM got bigger, but because they accumulated pattern recognition across enough situations.
What's genuinely non-obvious here is the cross-session signal. A single agent session might fail in a way that looks like noise. A hundred sessions failing in the same way at the same step is a signal. Dreaming operates on that population-level view. That's something no prompt engineering approach can replicate, because prompts are written before the sessions happen.
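To make the population-level view concrete, here is a minimal Python sketch. It is not Dreaming's actual algorithm or Anthropic's log format: the record fields (`session_id`, `step`, `error`), the function name `recurring_failures`, and the `min_sessions` threshold are all assumptions for illustration. The idea is only that a failure seen once is noise, while the same failure at the same step across several distinct sessions is worth surfacing.

```python
from collections import defaultdict


def recurring_failures(failures, min_sessions=3):
    """Return {(step, error): session_count} for failures seen in
    at least `min_sessions` distinct sessions (hypothetical schema)."""
    sessions_by_pattern = defaultdict(set)
    for record in failures:
        pattern = (record["step"], record["error"])
        # A set ensures a session that retries the same step counts once.
        sessions_by_pattern[pattern].add(record["session_id"])
    return {
        pattern: len(sessions)
        for pattern, sessions in sessions_by_pattern.items()
        if len(sessions) >= min_sessions
    }


# Hypothetical failure records pulled from per-session logs.
failures = [
    {"session_id": "s1", "step": "parse_pdf", "error": "encoding"},
    {"session_id": "s2", "step": "parse_pdf", "error": "encoding"},
    {"session_id": "s3", "step": "parse_pdf", "error": "encoding"},
    {"session_id": "s2", "step": "upload", "error": "timeout"},
]

signal = recurring_failures(failures)
```

Here the single `upload` timeout is dropped, while the `parse_pdf` encoding failure, seen in three separate sessions, is kept. No individual session log would have told you that.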
Anthropic's head of Managed Agents explicitly framed this at the conference: "infrastructure, not intelligence, is now the bottleneck for production agents." Dreaming is infrastructure for agents learning at scale, not a smarter model. That framing should shift how you think about where to invest effort on agent systems.

What This Means Practically

A few things to act on or decide now:
If you're running Claude Managed Agents in production: Sign up for the Dreaming preview immediately. Even if your task completion rates look acceptable today, you are almost certainly losing performance to repeated, preventable failures that you can't see because your visibility is per-session, not aggregate. Dreaming gives you both the fix and the visibility.
If you're building your own agent infrastructure: Dreaming's architecture is worth reverse-engineering as a design pattern. The key primitives are: structured session logs with outcome metadata, an asynchronous consolidation pass, and a writable memory store the agent can read at session start. None of these are magic — but most teams aren't doing them systematically. Start logging agent outcomes the way you'd log application errors: structured, queryable, retained.
On the human-review gating: For any agent touching sensitive workflows (financial, legal, compliance), turn on the review step rather than letting Dreaming updates land automatically. It's not just about safety; it's about understanding what your agents are learning. If you skip this, you'll lose visibility into why an agent's behavior drifts, and debugging it later will be expensive.
On multi-agent teams: This is where Dreaming gets interesting and underexplored. If multiple agents working in parallel converge on the same workaround independently, that's a strong signal, strong enough that the workaround probably belongs in a shared tool or a system prompt update rather than just persistent memory. Dreaming can surface this; you have to close the loop.
On what this doesn't fix: Dreaming doesn't help you if your agents lack good outcome signals to begin with. Garbage in, garbage out. If your sessions don't have clear success/failure indicators, the consolidation process has nothing to learn from. Getting your task outcome logging right is a prerequisite, not an afterthought.
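For teams building their own infrastructure, the primitives named above (structured session logs with outcome metadata, an asynchronous consolidation pass, a writable memory store, and an optional human-review gate) can be sketched in a few dozen lines of Python. Treat this as a sketch under stated assumptions, not a description of how Dreaming is implemented: every name here (`SessionLog`, `MemoryStore`, `consolidate`, `min_sessions`) is hypothetical, and a real consolidation pass would likely use a model to summarize patterns rather than simple counting.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SessionLog:
    """One structured record per agent session (hypothetical schema)."""
    session_id: str
    agent_id: str
    task: str
    succeeded: Optional[bool]  # None means the session has no clear outcome signal
    failed_tool_calls: list = field(default_factory=list)


@dataclass
class MemoryStore:
    """Writable memory the agent reads at session start."""
    require_review: bool = True
    approved: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def propose(self, note: str) -> None:
        if note in self.approved or note in self.pending:
            return  # already known or already awaiting review
        # With review on, notes wait in `pending` until a human approves them.
        (self.pending if self.require_review else self.approved).append(note)

    def approve(self, note: str) -> None:
        self.pending.remove(note)
        self.approved.append(note)

    def context_for_next_session(self) -> str:
        return "\n".join(self.approved)


def consolidate(logs, memory, min_sessions=2):
    """Background pass: propose a note for each failure that recurs across
    sessions. Returns how many sessions were skipped for lacking an outcome."""
    usable = [log for log in logs if log.succeeded is not None]
    counts = Counter()
    for log in usable:
        for failure in set(log.failed_tool_calls):  # count once per session
            counts[failure] += 1
    for failure, n in sorted(counts.items()):
        if n >= min_sessions:
            memory.propose(f"Recurring failure in {n} sessions: {failure}")
    return len(logs) - len(usable)


# Hypothetical logs: the third session has no outcome, so it is ignored.
logs = [
    SessionLog("s1", "agent-a", "summarize contract", False, ["pdf_reader: encoding"]),
    SessionLog("s2", "agent-b", "summarize contract", False, ["pdf_reader: encoding"]),
    SessionLog("s3", "agent-a", "summarize contract", None, ["pdf_reader: encoding"]),
]
memory = MemoryStore(require_review=True)
skipped = consolidate(logs, memory)
proposed = list(memory.pending)    # what a reviewer would see
memory.approve(memory.pending[0])  # the human sign-off step
```

Two design choices are worth noting. Sessions without a success or failure flag are skipped and counted, so missing outcome signals show up as a number you can track rather than silently polluting memory. And with `require_review=True`, nothing reaches `context_for_next_session()` until someone has approved it, which gives you a record of what the agent learned and when.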

One Question to Leave With

If your agent is getting measurably better at a task through Dreaming — learning workarounds, developing preferences, accumulating institutional knowledge — at what point does that accumulated knowledge belong in your codebase rather than in the agent's memory store?
 