By Alex Urevick-Ackelsberg, Zivtech

Part 3: The Agent That Read 200 Files

Subagents are context firewalls. Exploration belongs in a disposable window.


Part 1 was about gaps. Part 2 was about sessions that go too long. This one is about sessions that get too wide: not too many turns, but too much content pulled into one context.

I needed Claude to survey a Drupal codebase. Two hundred files, looking for a specific hook pattern. Perfect Claude task. So I ran it in my main session.

Forty minutes later the answer was good. The session was ruined.

Problem #3: Context Bloat from Exploration

When Claude reads a file in your session, the entire contents become permanent context. Every subsequent turn rereads all of it at cache-read rates.

One file? Maybe 2,000-3,000 tokens. No big deal. Fifty files for a focused survey? That's 150,000 tokens reprocessed on every turn for the rest of the session. Two hundred files and you'll hit auto-compact before you're done, creating the problems from Part 2 on top of the bloat.

At Opus 4.6 cache-read rates ($1.50/MTok), 150K tokens cost $0.225 to reprocess per turn. Twenty more turns of actual work after the survey: $4.50 of pure overhead. The survey was useful. Running it in the main session made it expensive.
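The arithmetic is simple enough to script. A minimal sketch, using the cache-read rate and file-size estimates quoted above (illustrative figures, not measurements):

```python
# Carrying cost of survey files left in the main session's context.
# The $1.50/MTok cache-read rate and ~3,000 tokens/file are the
# article's figures, not measured values.

CACHE_READ_PER_MTOK = 1.50   # dollars per million tokens (quoted rate)

def carrying_cost(file_tokens: int, turns: int) -> float:
    """Dollars spent re-reading cached survey tokens over `turns` turns."""
    return file_tokens * CACHE_READ_PER_MTOK / 1_000_000 * turns

survey = 50 * 3_000                       # 50 files at ~3,000 tokens each
print(f"{carrying_cost(survey, 1):.3f}")  # 0.225 -- per-turn overhead
print(f"{carrying_cost(survey, 20):.2f}") # 4.50  -- twenty turns of it
```

The point the numbers make: the survey itself is a one-time cost, but running it inline converts it into a recurring per-turn tax.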

What I Was Doing

The exploration pattern. "Read all the module files and tell me which ones implement hook_form_alter." Claude reads 200 files. Fifteen implement the hook. But all 200 files are in context. The 185 non-matches are still there, still reprocessed, permanently.

The research-then-implement pattern. "Understand this codebase" then "now fix this bug." The understanding phase pulls in dozens of files. The implementation phase needs three. The session carries all of them through the entire fix.

Both feel efficient. The problem isn't the work; it's where the tokens land.

Same research, two architectures. The subagent path leaves 75× fewer tokens in your main context.

What I Do Now

Subagents. Not for speed, but as a context firewall.

Spawn a subagent with a focused brief: "read all module files and tell me which ones implement hook_form_alter." The agent reads 200 files in its window, synthesizes, and returns a 2,000-token summary. Those files never touch the parent's context. The parent's per-turn cost stays flat.

Subagents don't shrink total spend. They move where the spend lands. The subagent still pays to read 200 files, once, in an isolated context that gets thrown away. The parent session stays lean for the next 30 turns of implementation.
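The two architectures compared, in sketch form (token counts are the article's own examples, not measurements):

```python
# Per-turn context footprint: inline exploration vs. subagent delegation.
# 150K tokens of survey files and a 2K-token summary are the
# illustrative numbers used in the text.

FILE_TOKENS = 150_000     # survey files read inline (50 files x ~3,000 tok)
SUMMARY_TOKENS = 2_000    # what a subagent returns to the parent

inline_footprint = FILE_TOKENS        # every file stays in parent context
delegated_footprint = SUMMARY_TOKENS  # files live and die in the subagent

print(inline_footprint // delegated_footprint)  # 75 -- the "75x" above
```

Same total reading either way; the difference is whether the tokens persist in the parent's context or vanish with the subagent.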

My rules:

More than ~5 files goes to a subagent. Five files is about 15,000 tokens, manageable. Above that, the per-turn carrying cost matters.

Exploration and implementation are separate. Understanding a codebase goes to a subagent. Implementation gets the summary.

Focused briefs, not conversation dumps. A security reviewer needs the code and the policy, not 30 turns about button colors.

The parent's cache cools while agents work. If agents run past the 5-minute TTL, the parent's next turn is a cold start — the idle tax from Part 1, triggered by delegation.
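The first rule is mechanical enough to express as a check. A sketch, with the thresholds being the rules of thumb stated above:

```python
# A rough "should I delegate?" check following the rules above.
# The 5-file cutoff and ~3,000 tokens/file are the article's
# rules of thumb, not measured values.

MAX_INLINE_FILES = 5
AVG_TOKENS_PER_FILE = 3_000

def should_delegate(n_files: int) -> bool:
    """True when a read set is wide enough to belong in a subagent."""
    return n_files > MAX_INLINE_FILES

def inline_cost_estimate(n_files: int) -> int:
    """Rough tokens this read set would pin into the parent's context."""
    return n_files * AVG_TOKENS_PER_FILE

print(should_delegate(3))    # False -- a handful of files can stay inline
print(should_delegate(200))  # True  -- the survey from the intro
```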

The Cache Gotcha

This connects back to Part 1: if agents take more than five minutes, the parent's cache expires. The next turn in the parent is a full re-cache, at 12.5x the cache-read rate.

Two mitigations:

  1. Keep the parent warm. Do small tasks while agents work. Anything that sends a message every few minutes.
  2. Save and resume. If agents will take a long time, /save-session the parent. Start fresh when they finish.

Worst pattern: spawn three agents, tab away for 10 minutes, come back to a cold parent with 100K+ tokens. Idle tax on top of agent cost.

Why This Surprised Me

I'd been using subagents purely for speed (run three things in parallel). The cost containment angle was invisible until I noticed exploration sessions were 3-4x more expensive per turn than implementation sessions, even when the implementation sessions were longer.

The difference: exploration sessions had hundreds of files in context. Implementation sessions had a handful. Same quality of work. 75x difference in context footprint.

The subagent-isolation helper: make the bloat visible

The companion helper installs a UserPromptSubmit hook that tracks estimated context growth from file reads. When accumulated file-read tokens cross a threshold (configurable, defaults to ~50K), it warns you and suggests delegating to a subagent.

git clone https://github.com/zivtech/claude-cost-helpers
cd claude-cost-helpers/subagent-isolation
./install.sh

Informational only. Sometimes you need those files in context. The hook makes sure that's a decision, not a default.
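For a sense of the shape such a hook takes, here is a minimal sketch. The real helper's internals may differ; the 4-bytes-per-token heuristic and the `accumulated_read_bytes` event field are assumptions for illustration, not the helper's actual API:

```python
# Sketch of a context-bloat warning hook. The token heuristic and the
# event field name are illustrative assumptions; see the repo for the
# real implementation.
from typing import Optional

THRESHOLD_TOKENS = 50_000    # default warn threshold from the text
BYTES_PER_TOKEN = 4          # rough heuristic (assumption)

def estimate_tokens(total_bytes: int) -> int:
    """Crude token estimate from bytes of file content read."""
    return total_bytes // BYTES_PER_TOKEN

def bloat_warning(read_tokens: int,
                  threshold: int = THRESHOLD_TOKENS) -> Optional[str]:
    """Warning text once accumulated file reads cross the threshold."""
    if read_tokens < threshold:
        return None
    return (f"~{read_tokens:,} tokens of file reads this session; "
            "consider delegating further exploration to a subagent.")

def handle_event(event: dict) -> Optional[str]:
    # A UserPromptSubmit hook receives a JSON event on stdin; text it
    # emits on stdout is surfaced as informational context.
    return bloat_warning(
        estimate_tokens(event.get("accumulated_read_bytes", 0)))

print(handle_event({"accumulated_read_bytes": 600_000}))
```

Below the threshold the hook stays silent; above it, you get a nudge, and the decision stays yours.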

The ecosystem fix: compressed reads

Our hook is the sensor. For automatic compression, pair it with token-optimizer-mcp, an MCP server providing smart_read and smart_grep tools that return compressed diffs instead of full file contents. 95%+ token reduction on file reads.

npx @anthropic-ai/claude-code mcp add token-optimizer-mcp -- npx -y token-optimizer-mcp

Our hook warns at 50K tokens. Their MCP makes each read dramatically cheaper. Complementary; install both.

At the fleet level

Joyus AI Internal's cache-economics module (spec 011) tracks cost per session. Agent-level cost attribution is on the roadmap via sessionId tagging. The session-level data already surfaces the impact: a parent spending $0.90/turn on file-read overhead looks very different from one that delegated and spends $0.03/turn.

The Rule

If Claude needs to read more than five files to answer your question, that question belongs in a subagent. Your main session gets the summary. The files stay in the agent's window and get thrown away when it's done.

Coming Up


Visuals: parent vs subagent context at visuals/econ-mini-03-parent-vs-subagent.html, agent spawning timeline at visuals/econ-v3c-spawn.html — embeddable via iframe, dark-mode aware, color-blind safe.

Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.