Engineering · 7 min read

How ZenSearch Prevents Runaway AI Agent Costs

AI agents can burn through token budgets when a plan goes wrong. ZenSearch ships a layered cost-control system — complexity-aware budget tiers, per-tool timeouts, pre-flight cost ceilings, and soft-limit pause-and-resume — that keeps a single bad query from becoming a five-figure invoice.

April 14, 2026 · ZenSearch Team

ZenSearch prevents runaway agent costs with a five-layer budget system: dynamic complexity tiers, per-sub-task wall-clock timeouts, a pre-flight dollar ceiling that rejects expensive plans before any tokens are spent, a mid-run dollar cap that gracefully pauses the run, and automatic one-shot retries for transient failures. None of these replace the others — they each catch a different failure mode, and together they make agent runs budgetable instead of hopeful.

Agents that plan and iterate are strictly more useful than single-shot chat, but the same property that makes them useful — unbounded tool use — makes them dangerous without discipline. One bad plan can loop through dozens of LLM calls; one slow connector can stall a batch of parallel tool calls and push the wall clock past any reasonable cap. These are the failure modes ZenSearch's cost-control layers are designed for.

Dynamic Budget Tiers

Before an agent runs, the ProvisionNode picks a complexity tier based on the request: factoid for one-shot lookups, procedural for documented workflows, exploratory for open-ended research, comparative for multi-source cross-reference, and automation for scheduled deep-research runs. Each tier carries its own iteration, tool-call, token, and wall-clock budgets. Tier assignment only bumps budgets upward — a deployment with a generous operator floor is never shrunk by tier classification.
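The floor-only semantics can be sketched as a per-dimension max. The tier table below is illustrative, not ZenSearch's real defaults, and `provision_budgets` is a hypothetical name:

```python
# Hypothetical tier table -- numbers are illustrative, not ZenSearch's defaults.
TIER_BUDGETS = {
    "factoid":     {"iterations": 5,  "tool_calls": 8,  "tokens": 20_000,  "wall_clock_s": 60},
    "procedural":  {"iterations": 10, "tool_calls": 20, "tokens": 60_000,  "wall_clock_s": 180},
    "exploratory": {"iterations": 20, "tool_calls": 40, "tokens": 150_000, "wall_clock_s": 420},
    "comparative": {"iterations": 25, "tool_calls": 60, "tokens": 250_000, "wall_clock_s": 600},
    "automation":  {"iterations": 30, "tool_calls": 80, "tokens": 400_000, "wall_clock_s": 900},
}

def provision_budgets(tier: str, operator_floor: dict) -> dict:
    """Tier classification only raises budgets: for each dimension, take
    the max of the tier's value and the operator-configured floor."""
    tier_budget = TIER_BUDGETS[tier]
    return {k: max(v, operator_floor.get(k, 0)) for k, v in tier_budget.items()}
```

With a generous floor, a "factoid" classification leaves the operator's larger iteration budget untouched while still filling in the other dimensions from the tier.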

For agents with an explicit plan, PlanNode further sizes the iteration budget based on step count (max(floor, 2*steps + 5)), clamped by a hard cap. A two-step plan doesn't get the same 25-iteration headroom as a ten-step comparative research task.
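The sizing rule is small enough to state directly. The formula is from the post; the `floor` and `hard_cap` defaults here are illustrative assumptions:

```python
def plan_iteration_budget(steps: int, floor: int = 8, hard_cap: int = 40) -> int:
    """PlanNode-style sizing: max(floor, 2*steps + 5), clamped to a hard cap.
    floor and hard_cap values are assumed, not ZenSearch's real config."""
    return min(hard_cap, max(floor, 2 * steps + 5))
```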

Per-Sub-Task Budgets

The single biggest cause of agent timeouts is a slow connector dragging down a batch of parallel tool calls. ZenSearch caps every individual tool call's wall clock via AGENT_PER_TOOL_TIMEOUT_SECONDS, and gives each parallel batch a fair-share slice of the remaining agent wall-clock budget — min(remaining/N, cap) per call. A batch of five parallel searches where one connector is slow no longer starves the faster four.
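A minimal sketch of the fair-share slice, with an asyncio wrapper to show how one slow call times out without blocking its siblings (`per_call_timeout` and `run_batch` are hypothetical names):

```python
import asyncio

def per_call_timeout(remaining_wall_clock_s: float, batch_size: int,
                     per_tool_cap_s: float) -> float:
    """Each parallel call gets min(remaining/N, cap) -- the cap plays the
    role of AGENT_PER_TOOL_TIMEOUT_SECONDS from the post."""
    return min(remaining_wall_clock_s / batch_size, per_tool_cap_s)

async def run_batch(calls, remaining_s: float, cap_s: float):
    """Run a batch of tool-call coroutines under a shared fair-share timeout.
    return_exceptions=True lets a slow connector time out alone while the
    faster calls in the batch still return results."""
    timeout = per_call_timeout(remaining_s, len(calls), cap_s)
    return await asyncio.gather(
        *(asyncio.wait_for(c, timeout=timeout) for c in calls),
        return_exceptions=True,
    )
```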

Timeouts are classified as FailureClassTimeout so recovery recipes retry them once before escalating. Transient errors (rate limits, 5xx upstream errors, connection drops) get the same one-shot retry treatment and don't count against the tool-call budget.
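The one-shot retry policy can be sketched as a thin wrapper. The classification labels and function names here are assumptions standing in for ZenSearch's real failure-class machinery:

```python
# Assumed failure-class labels; "timeout" mirrors FailureClassTimeout.
TRANSIENT = {"rate_limit", "upstream_5xx", "connection_drop", "timeout"}

def call_with_one_retry(tool_fn, classify):
    """Retry transient failures exactly once, then escalate.
    classify maps an exception to a failure-class label."""
    try:
        return tool_fn()
    except Exception as exc:
        if classify(exc) in TRANSIENT:
            return tool_fn()  # single retry; a second failure propagates
        raise
```

Because the retry happens below the budget accounting layer, the retried call need not be charged against the tool-call budget, matching the behavior described above.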

Pre-Flight Cost Ceilings

The real backstop is a dollar ceiling. CostCheckNode runs before any LLM tokens are spent, computes a worst-case estimate from the provisioned budget, and rejects the run if it would exceed AGENT_MAX_COST_PER_RUN_USD — or if it would push the team's rolling 24-hour cost past AGENT_MAX_COST_PER_TEAM_DAY_USD. Runs that pass the pre-flight gate write a row to agent_cost_usage with the estimate at start and actuals at completion, driving the admin-dashboard "actual-vs-estimate" panel.
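A sketch of the gate, assuming a simple worst-case estimate of token budget times price; both function names and the pricing model are hypothetical:

```python
def worst_case_usd(token_budget: int, usd_per_1k_tokens: float) -> float:
    """Crude worst-case estimate: spend the whole token budget at list price."""
    return token_budget / 1000 * usd_per_1k_tokens

def preflight_check(estimate_usd: float, team_cost_24h_usd: float,
                    max_per_run_usd: float, max_per_team_day_usd: float):
    """CostCheckNode-style gate: reject before any tokens are spent if the
    worst case breaches the per-run or rolling team-day ceiling."""
    if estimate_usd > max_per_run_usd:
        return False, "per_run_ceiling"
    if team_cost_24h_usd + estimate_usd > max_per_team_day_usd:
        return False, "team_day_ceiling"
    return True, None
```

A run that passes would then record its estimate in agent_cost_usage, with actuals written at completion.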

Mid-run, LLMNode re-checks actual cost every iteration. When it crosses the cap, the run produces a partial answer tagged truncation_reason="cost" and saves a resumable checkpoint — the agent doesn't silently burn through more budget after the trigger.
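The per-iteration re-check reduces to a small state transition. This is a sketch with an assumed state shape, not the real LLMNode:

```python
def after_iteration(state: dict, cap_usd: float) -> dict:
    """Mid-run cost re-check: once actual spend crosses the cap, tag the
    partial answer and flag the run as paused so a checkpoint is saved
    instead of continuing to burn budget."""
    if state["actual_cost_usd"] >= cap_usd:
        state["truncation_reason"] = "cost"
        state["paused"] = True
    return state
```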

Soft-Limit Pause and Resume

When any budget is exhausted (iterations, tool calls, tokens, wall clock, cost), the orchestrator saves a checkpoint keyed by softlimit:{team}:{user}:{run} with a 7-day Redis TTL. The user sees a Continue button in the chat UI that calls a resume endpoint; Orchestrator.ResumeRun rehydrates state, extends budgets by a configured delta, clears exhaustion flags, and re-enters the graph from where it left off. The iteration counter, message history, and pending work are all preserved.
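The key format and TTL come from the post; the resume logic below is a sketch with an assumed state shape, not the real ResumeRun:

```python
CHECKPOINT_TTL_S = 7 * 24 * 3600  # 7-day Redis TTL from the post

def checkpoint_key(team: str, user: str, run: str) -> str:
    """Soft-limit checkpoint key, per the softlimit:{team}:{user}:{run} scheme."""
    return f"softlimit:{team}:{user}:{run}"

def resume(state: dict, extension: dict) -> dict:
    """ResumeRun-style rehydration (sketch): extend each budget by its
    configured delta and clear exhaustion flags. Counters, message history,
    and pending work are deliberately left untouched."""
    for dim, delta in extension.items():
        state["budgets"][dim] += delta
    state["exhausted"] = []
    return state
```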

For scheduled automations, the cron scheduler picks up paused runs on the next tick (via a partial index on the automations table) and auto-resumes — bounded by max_resume_attempts on the automation's config (default 3). After that cap, the run is marked abandoned and the next tick starts fresh. Pause/resume chains are linked via parent_run_id for audit-trail rendering.
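The bounded auto-resume decision can be sketched per tick; `cron_tick` and the status values are hypothetical names around the max_resume_attempts behavior described above:

```python
def cron_tick(run: dict, max_resume_attempts: int = 3) -> str:
    """Scheduler decision for a paused automation run: auto-resume until
    max_resume_attempts (default 3 per the post), then mark the run
    abandoned so the next tick starts fresh."""
    if run["resume_attempts"] >= max_resume_attempts:
        run["status"] = "abandoned"
        return "start_fresh"
    run["resume_attempts"] += 1
    run["status"] = "resuming"
    return "resume"
```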

Cost Observability in the UI

Every three iterations (configurable), LLMNode emits a cost_update SSE event that drives a live cost meter in the chat UI — users see real dollars ticking up against the cap as the run progresses. When the cap is hit, the Budget Paused banner appears, with the partial answer already rendered above it.
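The emission cadence is simple modular arithmetic; this sketch assumes an SSE-style event dict and a hypothetical function name:

```python
def maybe_emit_cost_update(iteration: int, actual_usd: float, cap_usd: float,
                           every_n: int = 3):
    """Emit a cost_update event every N iterations (default 3, as in the
    post); the chat UI renders it as a live meter against the cap."""
    if iteration == 0 or iteration % every_n != 0:
        return None
    return {"event": "cost_update",
            "data": {"actual_usd": round(actual_usd, 4), "cap_usd": cap_usd}}
```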

What This Unlocks

Together, these five layers mean that when a product manager schedules a deep-research automation with a $10 daily cap, the platform holds that line — with graceful partial answers rather than silent runaway. That's the difference between agents as a research tool and agents as a line item on your cloud bill.