Procedural Memory: How ZenSearch Agents Learn From Successful Workflows
After a successful multi-tool session, ZenSearch distills the workflow into a reusable procedure — name, trigger pattern, ordered steps, pitfalls — which future runs surface through progressive disclosure and reuse. Here's how the Hermes-inspired self-improving loop works.
Procedural memory is a self-improving loop where AI agents distill successful multi-tool sessions into reusable procedures — a name, a trigger pattern, an ordered sequence of tool calls, known pitfalls — that future runs can look up and reuse. ZenSearch's implementation is inspired by the Hermes research on agent skill libraries: after every session with at least five tool calls and a synthesis confidence of 0.7 or higher, a lightweight LLM extracts a structured ProcedureMemory and stores it alongside other agent memories.
The payoff is real and measurable: an agent that once stumbled through a novel ten-step workflow now recognises the pattern on the next similar query and reuses the proven sequence, saving both latency and tokens. The hard part is doing this without ballooning the prompt — which is where progressive disclosure comes in.
Why Agents Need a Skill Library
Raw agents plan from scratch on every query. For repeated workflows — "summarise last week's support tickets by theme", "find which customers hit rate limits in the last 24 hours and their account tier" — that's wasteful. The agent already solved this class of problem successfully on a previous run; throwing away that solution and re-planning is like a junior engineer reinventing a runbook on every pager.
Procedural memory captures the solution. Not as a chat transcript — as a structured workflow that future runs can match against, fetch, and follow.
How Extraction Works
After a session ends successfully (≥5 tool calls, synthesis confidence ≥0.7), an async trigger fires a lightweight zen-mini call. The model receives the full tool-call sequence plus the original query and is asked to emit a ProcedureMemory with:
- name — a short imperative label ("Summarise support tickets by theme")
- trigger pattern — a keyword-and-pattern sketch that future queries can match against
- steps — the ordered tool calls (tool name, purpose, what to look for)
- pitfalls — failure modes the extractor noticed (empty result handling, pagination limits, rate-limit hotspots)
- verification criteria — how to check the output is correct
The procedure is stored in agent_memories with memory_type='procedure', scoped to the team and agent. This piggybacks on the same async trigger that handles observational memory, so there's no additional pipeline to operate.
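The shape of an extracted procedure and the gating check can be sketched roughly as follows. This is an illustrative sketch, not ZenSearch's actual schema: the field and function names here are assumptions based on the description above.

```python
from dataclasses import dataclass, field

@dataclass
class ProcedureStep:
    tool: str       # which tool to call
    purpose: str    # why this step exists
    look_for: str   # what to check in the result

@dataclass
class ProcedureMemory:
    name: str                 # short imperative label
    trigger_pattern: str      # keyword-and-pattern sketch for future matching
    steps: list[ProcedureStep]
    pitfalls: list[str] = field(default_factory=list)
    verification_criteria: list[str] = field(default_factory=list)

def should_extract(tool_call_count: int, synthesis_confidence: float) -> bool:
    """The gate described above: at least 5 tool calls and confidence >= 0.7."""
    return tool_call_count >= 5 and synthesis_confidence >= 0.7
```

A session that ends with, say, 8 tool calls at confidence 0.82 would pass the gate and be handed to the zen-mini extraction call.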
Progressive Disclosure: The Prompt Cost Problem
Naively injecting every procedure into the system prompt would balloon token counts. ZenSearch solves this with two-stage disclosure: the system prompt lists only each procedure's name + trigger pattern — about 50 tokens each, a flat cost regardless of how many procedures exist — and a view_procedure tool lets the LLM fetch the full body on demand.
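Stage one of that disclosure might look like the sketch below: render only each procedure's name and trigger pattern into the prompt, leaving the full body behind the `view_procedure` tool. The function name and dict keys are illustrative assumptions.

```python
def render_procedure_index(procedures: list[dict]) -> str:
    # Stage 1 of progressive disclosure: only name + trigger pattern reach the
    # system prompt (roughly 50 tokens each); full step lists stay in storage
    # and are fetched on demand via the view_procedure tool.
    lines = ["Known procedures (call view_procedure(name) for full steps):"]
    for proc in procedures:
        lines.append(f"- {proc['name']}: triggers on {proc['trigger_pattern']}")
    return "\n".join(lines)
```

Because each entry is a single short line, the prompt cost stays flat as the skill library grows.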
Before the agent starts reasoning, InitNode keyword-matches the incoming query against stored procedures. If there's a hit, the system prompt gets a one-line hint: "This query likely matches the `summarise-support-tickets` procedure. Consider calling `view_procedure` before planning." The LLM can then choose to fetch the full procedure, adapt it, or plan from scratch if the match looks wrong.
This keeps the system prompt cheap at steady state; the agent spends full-body tokens only on procedures actually relevant to the current query.
Integration With the Rest of the Stack
Procedures are stored in the same memory layer as facts and observations, so the existing memory consolidation job (Jaccard similarity ≥ 0.85) dedupes them. A procedure can reference an observational-memory snapshot for context. Verification criteria from a procedure can feed into the agent's AcceptanceCriteria checks on the next run, closing the loop on quality.
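The Jaccard dedupe check is straightforward to sketch. Assuming whitespace tokenisation for simplicity (the real consolidation job's tokenisation isn't specified here):

```python
def jaccard(a: set[str], b: set[str]) -> float:
    # Jaccard similarity: |intersection| / |union|.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_duplicate(proc_a_text: str, proc_b_text: str, threshold: float = 0.85) -> bool:
    # Mirrors the consolidation rule above: treat two procedures as duplicates
    # when their token-set Jaccard similarity is >= 0.85.
    tokens_a = set(proc_a_text.lower().split())
    tokens_b = set(proc_b_text.lower().split())
    return jaccard(tokens_a, tokens_b) >= threshold
```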
The whole system is gated by a single env var (AGENT_PROCEDURAL_MEMORY_ENABLED, on by default; it also requires observational memory to be enabled). Teams that want to opt out entirely flip one flag.
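The gating logic amounts to a two-flag check like this sketch. The procedural flag name comes from the text above; the observational flag name is an assumption for illustration.

```python
import os

def _flag(name: str, default: str = "true") -> bool:
    return os.getenv(name, default).lower() == "true"

def procedural_memory_enabled() -> bool:
    # Procedural memory defaults to on, but only runs when observational
    # memory is also on, since extraction piggybacks on its async trigger.
    return (_flag("AGENT_PROCEDURAL_MEMORY_ENABLED")
            and _flag("AGENT_OBSERVATIONAL_MEMORY_ENABLED"))
```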
The Bigger Idea
Most agent platforms treat every run as stateless. That's the right default for novelty, but it's the wrong default for recurring work — which is most enterprise workflows. Procedural memory is the layer that lets agents get better at the queries their team keeps asking, rather than burning through the same tokens to rediscover the same solution.