A few weeks ago I wrote First Chat, Then Code, Now Claw, a field guide to the first big wave of open-source agent frameworks.

That post was about making the category legible.

This one is an update.

Because the center of gravity moved.

The useful question is no longer “which agent framework wins?”

It’s: what layer of the stack is getting better, and who is pushing it forward?

Better than OpenClaw at what?

Different winners, different layers: Deep Agents, Hermes Agent, OpenViking, and OpenClaw are not solving the same problem.

The short version

OpenClaw is still one of the most complete personal-agent control planes in the space.

If you want an assistant that lives across channels, supports tools, scheduling, memory, nodes, and long-running operation, OpenClaw is still one of the clearest full-stack answers.

But a few newer projects are clearly improving narrower parts of the stack:

  • LangChain Deep Agents is better than OpenClaw at packaging the agent harness for long-running, multi-step work
  • Hermes Agent is better than OpenClaw at pushing the persistent self-improving personal-agent thesis to the center of the product
  • OpenViking is better than OpenClaw at treating context architecture as a first-class infrastructure problem

That does not make them better overall.

It makes them better at specific things that matter more now than they did 6 months ago.

What changed since the last post

The old framing was simple:

  • can the model call tools?
  • can it loop?
  • can it finish a task?

That’s table stakes now.

The newer pressure points are more operational:

  • can the agent stay coherent over long tasks?
  • can it manage context without drowning in transcript sludge?
  • can it isolate subtasks instead of smearing everything into one context window?
  • can it keep durable state outside the chat log?
  • can it be observed, governed, and run safely?

That’s why the newer projects keep converging on the same ingredients:

  • planning
  • subagents
  • externalized state
  • memory
  • context management
  • sandboxing
  • scheduling
  • observability

The category is getting less theatrical and more infrastructural.

That’s good.

A cleaner way to compare “better than OpenClaw”

This conversation gets dumb fast if every project is treated like a full OpenClaw replacement.

That’s not what’s happening.

A more useful comparison is this:

Better than OpenClaw at what?

  • Harnessing long, multi-step work: Deep Agents looks stronger
  • Foregrounding self-improvement and procedural skill growth: Hermes looks stronger
  • Structuring context and retrieval: OpenViking looks stronger
  • Broad assistant surface area, channels, nodes, and control-plane utility: OpenClaw still looks stronger

That’s the real update.

Not “there’s a new winner.”

More like: different layers of the stack are finally specializing.

1) LangChain Deep Agents: the harness layer is becoming a real product category

Since this was one of the projects you specifically wanted covered, I’m putting it first.

LangChain’s Deep Agents is one of the clearest examples of the harness layer becoming a product category in its own right.

LangChain’s own framing is useful here: Deep Agents is an agent harness.

That word matters.

A harness is not just a bag of primitives. It’s an opinionated package for agents that need to do long, messy work without collapsing under their own context.

What Deep Agents actually ships

From the docs and repo, the core package is pretty clear:

  • planning and task decomposition
  • filesystem-backed context management
  • subagent spawning for isolated work
  • memory across threads/conversations
  • optional shell / sandbox execution
  • a CLI built on top of the SDK

The key idea is not “tool calling, but more.”

It’s that context management and task isolation are treated as first-class architecture.

That’s the real reason this project matters.

LangChain Docs: Deep Agents overview LangChain Docs·langchain·docs.langchain.com GitHub: langchain-ai/deepagents GitHub·langchain-ai·github.com

Why people are reacting to it

A lot of the reaction to Deep Agents converged on the same points.

People weren’t just saying “new framework dropped.” They were specifically fixating on:

  • context isolation
  • planning
  • file-backed state
  • long-horizon task handling
  • “open-source Claude Code alternative” framing

That last comparison is a little lazy, but it’s not useless. It points at what people think the market wants now: not another chat wrapper, but an environment for serious agent work.

Where it looks better than OpenClaw

OpenClaw’s strength is breadth.

Deep Agents’ strength is focus.

Compared to OpenClaw, Deep Agents looks better at:

  • presenting a clear harness identity for long-running work
  • making context isolation feel central, not auxiliary
  • packaging the agent loop for builders who don’t want the full messaging/control-plane universe
  • fitting research / coding / workflow agents more tightly

OpenClaw still wins on assistant surface area.

Deep Agents may win on clarity of the long-task harness abstraction.

That distinction matters.

My take

Deep Agents matters because it standardizes a pattern that serious builders were already rediscovering manually.

Not revolutionary science.

Still important.

The market needed a cleaner articulation of the harness layer, and this is one of the better attempts.

2) Hermes Agent: the self-improving personal-agent thesis gets more explicit

Hermes Agent from Nous Research is aiming at a bigger idea than “an agent SDK.”

It is much closer to a full personal-agent environment that wants to:

  • persist
  • remember
  • schedule
  • create or improve skills
  • run across messaging surfaces
  • keep growing more useful over time

That’s a more ambitious product claim than most harnesses make.

What Hermes actually emphasizes

From the docs and repo, Hermes is built around a few recurring ideas:

  • persistent memory across sessions
  • skills created from experience
  • scheduled automations
  • cross-channel usage (CLI + messaging)
  • subagents / parallel work
  • multiple runtime backends
  • a stronger emphasis on long-lived operation than one-shot coding flows
Hermes Agent Documentation Nous Research·Hermes Agent·hermes-agent.nousresearch.com GitHub: NousResearch/hermes-agent GitHub·NousResearch·github.com

The docs lean hard on “self-improving,” which is a phrase I usually distrust.

But here there is at least a concrete substrate under it:

  • memory persistence
  • cross-session recall
  • skill creation
  • skill reuse
  • user modeling

That’s more real than the usual “the agent learns” hand-waving.

Where it looks better than OpenClaw

OpenClaw already has memory, skills, persistence, and long-running behavior.

So the question is not whether Hermes invented those ideas.

The question is what Hermes makes feel more central.

And there I think the answer is pretty clear:

Hermes is better than OpenClaw at making the product thesis feel like this:

the real value of a personal agent is that it accumulates procedural knowledge and becomes more reusable over time

OpenClaw feels more like a control plane.

Hermes feels more like a long-lived collaborator that is supposed to build habits and reusable procedural memory.

That doesn’t make Hermes stronger overall.

But it does make Hermes stronger at articulating one of the deepest promises in the category.

What I’d watch carefully

The risk, obviously, is memory and skill drift.

Any “self-improving” system can generate a lot of self-produced garbage if the curation loop is weak.

So the big question is not whether Hermes can accumulate more stuff.

It’s whether it can accumulate better procedural memory without compounding noise.

That is the real test.

3) OpenViking: context architecture is finally being treated like infrastructure

OpenViking is the sharpest update in this whole pass.

Not because it is “the best agent.”

Because it is not trying to be the agent at all.

It is trying to fix something deeper: the way agent context is organized.

That matters because most agent stacks still handle context badly.

They scatter it across:

  • files
  • memory stores
  • vector databases
  • prompt scaffolding
  • tool output history
  • whatever retrieval glue happened to get bolted on

That works until it doesn’t.

And when it doesn’t, debugging retrieval becomes a black-box exercise in superstition.

OpenViking’s core idea

OpenViking’s pitch is basically:

  • traditional flat RAG is bad at global structure
  • long-running agents produce too much context for flat retrieval to stay sane
  • memory, resources, and skills should be organized through a filesystem-like model
  • context should be addressable, hierarchical, and observable

The specific mechanisms they emphasize are interesting:

  • unified handling of memory, resources, and skills
  • hierarchical / tiered loading (L0/L1/L2)
  • directory-style retrieval plus semantic search
  • retrieval path observability
  • automatic session compression and long-term extraction
GitHub: volcengine/OpenViking GitHub·volcengine·github.com

Where it looks better than OpenClaw

This is the cleanest “better than OpenClaw” comparison in the post.

OpenClaw already has memory and retrieval.

But OpenViking is attacking the deeper issue underneath them: context architecture itself.

That means:

  • how context is organized
  • how it is addressed
  • how it is loaded
  • how it is debugged
  • how it avoids becoming soup

If OpenViking works well in practice, then yes, it may represent a real improvement over OpenClaw in one major dimension:

the structure of context.

Not assistant breadth. Not channel support. Not overall product completeness.

But context architecture is a huge lever.

And frankly, OpenClaw is exactly the kind of system that could benefit from ideas like this.

My take

OpenViking is one of the more important projects in the current wave because it’s asking the right question.

Not:

how do we give the model more context?

But:

how do we structure context so the agent can navigate it deliberately?

That’s a better question.

4) Other viral harnesses worth tracking, but with less conviction

There are other names that keep circulating in the same GitHub/X conversations.

I’m mentioning them because they’re part of the moment, not because I’m equally confident in all of them.

OpenCode

OpenCode still matters because it helped normalize the idea that the model is not the whole product.

The harness is the product:

  • context handling
  • workflow ergonomics
  • extensibility
  • multi-model portability
  • survivability in long sessions

That’s a major category shift, even if OpenCode itself is not the subject of this post.

Goose / similar coding-agent harnesses

These keep getting grouped with Deep Agents for a reason.

They live in the same design neighborhood:

  • terminal-first flows
  • multi-step task execution
  • strong file/tool posture
  • some notion of reusable instructions or memory

The market clearly wants work environments, not just chat windows.

DeerFlow / ByteDance-adjacent agent tooling

DeerFlow keeps getting mentioned near OpenViking in “what’s going viral” conversations.

That pairing is interesting because it hints at a broader ByteDance push around:

  • long-running execution
  • context systems
  • tooling
  • memory
  • sandboxed or structured agent work

I’d watch that cluster, but I’d still put OpenViking ahead in conceptual clarity.

“Company of agents” repos

These get a lot of attention.

I’m skeptical.

They’re good at narrative. Less clear they’re good at infrastructure.

And right now infrastructure is where the real progress is.

What the stack looks like now

The category makes more sense once you stop flattening everything into “framework.”

Layer 1: personal-agent runtimes / control planes

Examples:

  • OpenClaw
  • Hermes Agent

These care about:

  • channels
  • scheduling
  • tools
  • memory
  • surfaces
  • operational boundaries

Layer 2: harnesses

Examples:

  • Deep Agents
  • coding / research harnesses in the same family

These care about:

  • planning
  • subagents
  • context isolation
  • long-task execution
  • builder ergonomics

Layer 3: context and memory infrastructure

Examples:

  • OpenViking
  • memory layers / context databases more broadly

These care about:

  • storage model
  • retrieval quality
  • observability
  • token efficiency
  • durable state organization

Layer 4: products and surfaces

Examples:

  • chat interfaces
  • coding UIs
  • web apps
  • domain-specific assistants

This layer gets the attention.

Increasingly, it is downstream of the others.

What is actually new, and what is just better branding

A lot of the current wave is not conceptually new in the absolute sense.

Planning, files, memory, tools, retrieval, and subagents were already in the water.

What’s new is that projects are becoming more honest about what’s required to survive real tasks.

The real advances

These look real to me:

  • context management becoming explicit
  • subagents for isolation becoming normal
  • externalized state replacing giant chat blobs
  • memory being treated like infrastructure
  • scheduling entering default agent design
  • sandboxing and runtime boundaries becoming part of the architecture story

The fake advances

I’d stay skeptical of:

  • vague “self-improving” claims without curation details
  • frameworks that are mostly prompt packaging with new branding
  • memory claims that reduce to “we added vector search”
  • demos with no observability or failure model

The category still has plenty of theatre in it.

But the serious projects are at least pushing in the right direction.

Practical takeaway

If you’re evaluating these tools, don’t ask “which one is best?”

Ask these instead:

1) Where does state live?

If the answer is mostly “in the transcript,” that’s a weakness.

2) How does the system manage long-task context?

If there’s no deliberate answer here, it will fail in boring ways.

3) What kind of memory is this?

Not just “does it have memory?” but:

  • what gets persisted?
  • what gets summarized?
  • what gets forgotten?
  • what is global vs project-local?
  • how do you debug recall?

4) What is the actual blast radius?

An agent with shell, web, files, and messaging is an automation system with real risk.

Treat it like one.

5) Which layer are you actually buying?

Do you need:

  • a personal assistant runtime?
  • a harness?
  • a context database?
  • a productized coding agent?

Buying the wrong layer is how teams get confused and disappointed.

Bottom line

OpenClaw is still one of the most important projects in the space because it made the personal-agent control-plane model real for a huge number of builders.

That still matters.

But the update is real.

Deep Agents is pushing the harness layer forward. Hermes is pushing the persistent personal-agent thesis forward. OpenViking is pushing context architecture forward.

Those are different bets.

They rhyme.

And they point to the same shift:

The useful agent stack is becoming less about “a clever loop around an LLM” and more about:

  • context architecture
  • durable state
  • isolation
  • memory
  • scheduling
  • operator control
  • product surfaces built on top of those layers

That’s the actual movement.

Not more prompt tricks.

Better systems.

Sources

Primary project sources

GitHub: openclaw/openclaw GitHub·openclaw·github.com LangChain Docs: Deep Agents overview LangChain Docs·langchain·docs.langchain.com GitHub: langchain-ai/deepagents GitHub·langchain-ai·github.com Hermes Agent Documentation Nous Research·Hermes Agent·hermes-agent.nousresearch.com GitHub: NousResearch/hermes-agent GitHub·NousResearch·github.com GitHub: volcengine/OpenViking GitHub·volcengine·github.com