By Alex Urevick-Ackelsberg, Zivtech

Part 6: The Delegation Tax

Subagent results are permanent in the parent context. The agent's own context is disposable. Its result is not.


Part 1: gaps. Part 2: sessions that overstay. Part 3: file reads bloating context. Part 4: compaction dropping constraints. Part 5: tool output you never reference again. This one is the mirror image of Part 3, and the most ironic, because Part 3 is what caused it.

Problem #6: Subagent Results That Inflate the Parent

Part 3 taught me to delegate. Need 50 files read? Spin up a subagent. Files live in the agent's window, not yours. Agent finishes, window is discarded, parent gets a lean summary.

I internalized the lesson. I delegate aggressively: research agents, critic pipelines, parallel explorers. But there's a cost I wasn't tracking.

When a subagent finishes, its result lands in the parent session and stays there. Reprocessed on every subsequent turn. The agent's context is disposable. Its result is permanent.

I was optimizing the wrong side: keeping files out (good) while letting agent results pile up (bad). The delegation habit from Part 3 was creating a new cost that scales with how enthusiastically you delegate.

What I Was Doing

The research swarm. I launch 3-5 agents in parallel: codebase explorer, documentation reader, test coverage checker, git history reviewer. Each returns 3-8K tokens. Combined: anywhere from 9K to 40K tokens in the parent context. I read them once to synthesize. Then they ride along, reprocessed every turn, for the rest of the session.

Five agents processed 129K tokens internally (discarded when done). Their 25K tokens of results persist in the parent for the rest of the session. The agents' internal work was cheap. Their results are the ongoing cost.

The critic pipeline. Standard review: proposal-critic evaluates the plan, then react-critic, a11y-critic, and drupal-critic in parallel. Each returns 4-8K tokens. By the time I act, the parent carries 20-30K tokens of review output. Useful for one decision. Dead weight for the next 40 turns of implementation.

The background agent that finished. run_in_background is the right pattern for long-running research. But when the agent finishes, its result arrives in an already-heavy context. One more 5K-token result adds to a stack that's already expensive to carry.

The Math

Two components. Most people only see the first.

The invoice: what the subagent spent. A typical research agent processes ~30K input tokens, generates ~5K output. At Opus 4.6 rates ($15/MTok input, $75/MTok output): ~$0.82 per agent, ~$4.10 for five. This is what tools like openclaw/agent-cost-monitor surface.

The tax: what the parent pays to carry results. Five agents returning 25K tokens combined, reprocessed over 20 turns at cache-read rates ($1.50/MTok): 500K tokens, ~$0.75. If every one of those turns hit a cold cache instead (Part 1, $15/MTok), the same 500K tokens would cost $7.50. A real session lands somewhere between the two. That's one delegation round. My sessions regularly have 2-3.
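The two components can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the rates and token counts quoted in this section. The function names and the `cold_fraction` parameter are illustrative and not part of any tool.

```python
# Rates quoted in this article, in dollars per million tokens (MTok).
INPUT_RATE = 15.00       # uncached input
OUTPUT_RATE = 75.00      # output
CACHE_READ_RATE = 1.50   # cached input


def invoice(agents, input_tokens=30_000, output_tokens=5_000):
    """One-time spend of the subagents themselves, in dollars."""
    per_agent = (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000
    return agents * per_agent


def carrying_tax(result_tokens, turns, cold_fraction=0.0):
    """Cost for the parent to reprocess the results on every turn, in dollars.

    cold_fraction is the assumed share of turns that miss the cache and pay
    the full input rate instead of the cache-read rate.
    """
    blended_rate = cold_fraction * INPUT_RATE + (1 - cold_fraction) * CACHE_READ_RATE
    return result_tokens * turns * blended_rate / 1_000_000


print(f"invoice, five agents: ${invoice(5):.3f}")
print(f"tax, every turn warm: ${carrying_tax(25_000, 20):.2f}")
print(f"tax, every turn cold: ${carrying_tax(25_000, 20, cold_fraction=1.0):.2f}")
```

Setting `cold_fraction` between 0 and 1 gives the blended figure for a session that mixes warm and cold turns.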

Move the sliders. The dashed line is the one-time agent spend (the invoice). The solid line is the cumulative carrying cost (the tax). Watch where they cross: the larger the results and the colder the cache, the sooner the tax overtakes the invoice.
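Where the two lines cross depends on result size and cache mix. As a rough sketch, assuming a fixed per-token rate for every turn (a simplification; real sessions blend warm and cold turns), the break-even turn can be computed directly. The function name is hypothetical.

```python
import math


def break_even_turns(invoice_usd, result_tokens, rate_per_mtok):
    """Smallest whole number of turns at which the cumulative tax reaches the invoice."""
    # Scaling the invoice up by a million, rather than computing a per-turn
    # dollar amount first, keeps the division free of tiny float errors.
    return math.ceil(invoice_usd * 1_000_000 / (result_tokens * rate_per_mtok))


invoice_usd = 4.125  # five agents at ~$0.825 each, from the figures above

print(break_even_turns(invoice_usd, 25_000, 1.50))   # every turn a cache read
print(break_even_turns(invoice_usd, 25_000, 15.00))  # every turn a cold read
```

With this section's numbers, that works out to 110 turns when every turn is a cache read and 11 turns when every turn is cold. Larger results or a second delegation round pull the crossing point earlier.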

The Annual Math

One delegation swarm per day (five agents, 5K-token results each): 25K tokens per swarm. Carried for about 25 turns, that is 625K tokens reprocessed per day. Over roughly 250 working days: ~156M per year.

At cache-read rates ($1.50/MTok): ~$234/year per developer. Blended with cold-cache turns: $500-$2,000/year. The developer who delegates the most (following Part 3 most faithfully) pays the highest tax.
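The annual figures follow from three inputs. A hedged sketch: the 25 turns per day and 250 working days are not stated above but are the values implied by 625K tokens per day and ~156M per year. The names are illustrative.

```python
RESULT_TOKENS_PER_SWARM = 25_000  # five agents x 5K-token results
TURNS_PER_DAY = 25                # implied by 625K / 25K (assumption)
WORKING_DAYS = 250                # implied by ~156M / 625K (assumption)

CACHE_READ_RATE = 1.50            # $/MTok, from this article
INPUT_RATE = 15.00                # $/MTok, from this article

daily_tokens = RESULT_TOKENS_PER_SWARM * TURNS_PER_DAY
annual_tokens = daily_tokens * WORKING_DAYS


def annual_tax(cold_fraction=0.0):
    """Yearly carrying cost in dollars for a given share of cold-cache turns."""
    rate = cold_fraction * INPUT_RATE + (1 - cold_fraction) * CACHE_READ_RATE
    return annual_tokens * rate / 1_000_000


print(f"{annual_tokens:,} tokens reprocessed per year")
print(f"all warm: ${annual_tax(0.0):,.2f}")
print(f"all cold: ${annual_tax(1.0):,.2f}")
```

The all-warm and all-cold results bracket the blended range quoted above. Where a given developer falls inside it depends on how often their sessions go cold.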

Why This Surprised Me

I built the habit that creates it. Part 3's fix for context bloat is delegation, and it's still the right call. Reading 200 files in the parent is worse than the delegation tax. The trap isn't delegation itself. It's delegating without constraining what comes back.

An unconstrained agent returns everything it found. A constrained one returns what you asked for. The difference between a 2K-token summary and a 15K-token report is 7.5x in carrying cost, every turn, for the rest of the session.

What I Do Now

Tight prompts. "Report in under 200 words" or "return only file paths and line numbers" in every agent prompt. Highest-leverage fix. A 200-word summary is ~300 tokens. Unconstrained: 5,000-15,000. Carrying cost difference over 20 turns: 94K-294K tokens.
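One way to make the constraint a habit is to append it mechanically. A minimal sketch, assuming you build agent prompts in code. `constrained_prompt` and `tokens_saved` are hypothetical helpers, and the 1.5 tokens-per-word ratio is simply the one implied by "200 words is ~300 tokens".

```python
TOKENS_PER_WORD = 1.5  # rough ratio implied by the figures above (assumption)


def constrained_prompt(task, max_words=200):
    """Append an explicit size limit to a subagent task description."""
    return f"{task.rstrip()}\n\nReport in under {max_words} words."


def tokens_saved(unconstrained_tokens, max_words=200, turns=20):
    """Tokens the parent avoids reprocessing over the given number of turns."""
    constrained_tokens = int(max_words * TOKENS_PER_WORD)
    return (unconstrained_tokens - constrained_tokens) * turns


print(constrained_prompt("List every caller of the deprecated auth helper."))
print(tokens_saved(5_000))    # low end of an unconstrained result
print(tokens_saved(15_000))   # high end of an unconstrained result
```

The two printed savings are the 94K and 294K figures from the paragraph above.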

Split after swarms. After three or more agents, save and start fresh. The synthesis is in my head or in a file. Raw results are dead weight.

File-based handoff. "Write your findings to a file" in the prompt. Parent gets a file path (50 tokens), not findings (5,000). Read the file when needed.
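The shape of that exchange, sketched in plain Python. In practice the agent writes the file with its own tools. This only illustrates the size difference between returning findings and returning a pointer. `hand_off` and the `findings.md` filename are hypothetical.

```python
import tempfile
from pathlib import Path


def hand_off(findings, directory=None, name="findings.md"):
    """Write findings to a file and return only a short pointer to it."""
    # Fall back to a fresh temporary directory when none is given.
    target_dir = Path(directory) if directory else Path(tempfile.mkdtemp(prefix="agent-"))
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / name
    path.write_text(findings, encoding="utf-8")
    return f"Findings written to {path}"


with tempfile.TemporaryDirectory() as scratch:
    report = "long report line\n" * 2_000
    pointer = hand_off(report, directory=scratch)
    print(pointer)
    print(f"pointer: {len(pointer)} characters, report: {len(report)} characters")
```

The parent carries the pointer on every turn and pays for the full report only on the turns where it actually reads the file.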

Stagger, don't swarm. When one agent's results inform the next, run sequentially. Parallel agents are for truly independent work. Only the final synthesis needs to land in the parent.

The delegation-cost helper: make the tax visible

The companion helper installs a PostToolUse hook that tracks token sizes of Agent results landing in context, with a running total per session.

git clone https://github.com/zivtech/claude-cost-helpers
cd claude-cost-helpers/delegation-cost
./install.sh

Per-result warning at 5K tokens. Cumulative warnings escalating at 20K, 50K, and 100K. Slash command /delegation-report shows per-agent breakdown with estimated carrying cost.

Informational only. Sometimes a 10K-token critic report is exactly what you need in context. The hook makes sure that's a decision, not a default.
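The threshold logic is simple enough to sketch. This is not the helper's actual code, just an illustration of the behavior described above. The class name, the warning wording, the use of `>=` at each threshold, and the four-characters-per-token estimate are all assumptions.

```python
PER_RESULT_WARN = 5_000
CUMULATIVE_WARN = (20_000, 50_000, 100_000)


def estimate_tokens(text):
    """Rough size estimate, assuming about four characters per token."""
    return len(text) // 4


class DelegationTracker:
    """Running total of agent-result tokens for one session."""

    def __init__(self):
        self.total = 0

    def record(self, result_tokens):
        """Add one agent result and return any warnings it triggers."""
        warnings = []
        if result_tokens >= PER_RESULT_WARN:
            warnings.append(f"result is {result_tokens} tokens (limit {PER_RESULT_WARN})")
        before = self.total
        self.total += result_tokens
        # Warn once per threshold, on the result that crosses it.
        for threshold in CUMULATIVE_WARN:
            if before < threshold <= self.total:
                warnings.append(f"session total crossed {threshold} tokens")
        return warnings


tracker = DelegationTracker()
for size in (4_000, 6_000, 12_000):
    print(size, tracker.record(size))
```

Each cumulative threshold fires only once, on the result that crosses it, so the warnings escalate instead of repeating every turn.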

The ecosystem gap

No tool I've found fixes the delegation tax. RTK compresses Bash output, not Agent results. token-optimizer-mcp compresses file reads, not agent returns. openclaw/agent-cost-monitor tracks the invoice (what agents spent), not the tax (what the parent pays to carry results).

The fix is behavioral: constrain what comes back, split when results accumulate, know the difference between invoice and tax.

At the fleet level

Joyus AI Internal's cache-economics module (spec 011) tracks totalInputTokens and operation-level cost via sessionId tagging. OMC's subagent tracker captures per-agent cost_usd, duration_ms, and token_usage (the invoice). On the roadmap: result_size tracking on SubagentStop events, so the platform can distinguish agents that cost $0.80 and return 2K tokens (efficient) from those that cost $0.80 and return 15K tokens (expensive to carry). The tax-to-invoice ratio is the fleet-level signal for which workflows need tighter constraints.
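A sketch of how that ratio could be computed once result sizes are recorded. The `AgentRun` record is hypothetical. It borrows the `cost_usd` and `result_size` names from the paragraph above but is not the platform's schema, and the 20 turns carried at cache-read rates are assumptions taken from the earlier math.

```python
from dataclasses import dataclass

CACHE_READ_RATE = 1.50  # $/MTok, from this article


@dataclass
class AgentRun:
    name: str
    cost_usd: float    # the invoice: what the agent spent
    result_size: int   # tokens returned to the parent


def tax_to_invoice(run, turns_carried=20, rate_per_mtok=CACHE_READ_RATE):
    """Carrying cost of the result divided by the agent's own spend."""
    tax = run.result_size * turns_carried * rate_per_mtok / 1_000_000
    return tax / run.cost_usd


runs = [
    AgentRun("lean-explorer", cost_usd=0.80, result_size=2_000),
    AgentRun("verbose-critic", cost_usd=0.80, result_size=15_000),
]

# Highest ratio first: these are the workflows to constrain.
for run in sorted(runs, key=tax_to_invoice, reverse=True):
    print(f"{run.name}: {tax_to_invoice(run):.3f}")
```

Both agents have the same invoice, so the ranking is driven entirely by what they send back.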

The Rule

Every subagent result is permanent in the parent context. Constrain what comes back ("under 200 words"), split sessions after heavy delegation, write heavy findings to files instead of returning them inline. The invoice is what the agent spent. The tax is what you pay to carry its answer. Know the difference.

The Series

Six cost drivers, six habits, six helpers:

  1. The Idle Tax — gaps kill the cache. Save and restart.
  2. The "Just One More Turn" Trap — context rot is superlinear. Split at task boundaries.
  3. The Agent That Read 200 Files — exploration belongs in subagents. The parent gets the summary.
  4. The Compact Gamble — auto-compact is lossy. Don't let it make the call.
  5. The Watching Cost — tool output is permanent. Filter, redirect, or delegate.
  6. The Delegation Tax — subagent results are permanent too. Constrain, split, or write to file.

All six helpers are in claude-cost-helpers. Install whichever ones match your workflow, or install all of them — they're independent hooks that don't conflict.

The common thread across all six: the expensive thing is never the prompt you write. It's the context you carry. Every habit in this series is about the same principle: keep the context lean, keep the cache warm, and when a session has served its purpose, let it go.


Visuals: agent results stacking at visuals/econ-mini-08-delegation-tax.html, invoice-vs-tax slider at visuals/econ-v6-delegation-carrying.html — embeddable via iframe, dark-mode aware, color-blind safe.

Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.