Part 5: The Watching Cost
Tool output is permanent. Every dump gets reprocessed on every turn for the rest of the session.
Part 1: gaps. Part 2: sessions too long. Part 3: file bloat. Part 4: compaction dropping constraints. This one: tool output you never look at again but keep paying for on every turn.
Problem #5: Tool Output That Lives Forever
When Claude runs a tool (test suite, build, grep, API call), the full output lands in the conversation. Not a summary. Raw output: every line, every stack trace, every passing test message.
That output stays in context for the rest of the session, reprocessed on every turn at cache-read rates. A 10,000-token test log from turn 8 is still there at turn 40, costing $0.015 per turn ($1.50/MTok × 10K tokens). You ran the test once. Looked at the result once. You're paying for it 32 more times.
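The per-turn arithmetic is worth seeing end to end. A minimal sketch using the article's numbers (the $1.50/MTok cache-read rate is the assumed rate, not a quote from a price sheet):

```python
# Carrying cost of a single tool dump: a 10K-token test log
# cached at turn 8, reread on every turn through turn 40.
CACHE_READ_PER_MTOK = 1.50   # $/MTok cache-read rate (article's assumption)

dump_tokens = 10_000
per_turn = dump_tokens / 1_000_000 * CACHE_READ_PER_MTOK
turns_after = 40 - 8         # the dump is reread on each later turn
total = per_turn * turns_after

print(f"${per_turn:.3f} per turn, ${total:.2f} over {turns_after} turns")
# $0.015 per turn, $0.48 over 32 turns
```

Half a dollar for one log sounds trivial until you multiply by dumps per day and days per year, which the annual math below does.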
Every dump adds to the stack. Every turn rereads the full stack.
What I Was Doing
Build logs. A typical npm or Drupal build dumps 5,000-20,000 tokens: progress messages, deprecation warnings, dependency resolution noise. The two lines that matter (success/failure and maybe an error) are buried in thousands of tokens I'll never reference again.
Test suites. Same pattern, worse magnitude. A 100-test PHPUnit run can produce 15,000-30,000 tokens. I need the failures. I don't need the 95 passes. All of it is in context, permanently.
Exploratory greps. "Search for all uses of this function." Claude finds 47 matches across 30 files. Full output: file paths, line numbers, surrounding context. Maybe 8,000 tokens. I need three files. The other 44 matches are dead weight for the rest of the session.
The Annual Math
Two moderate tool dumps per day (build log + test run, 10K tokens each), each reprocessed on 20 subsequent turns: 200K tokens per dump, 400K per day, 100M per year at 250 working days.
At cache-read rates ($1.50/MTok): ~$150/year per developer in pure carrying cost for output that was useful once. Mix in cold-cache turns from Part 1 (most sessions hit at least one) and the effective rate climbs toward $15/MTok, pushing the annual figure toward $1,500. Real number depends on cache discipline.
Larger dumps push it higher. 30K-token build logs: $450-$4,500/year depending on cold-cache frequency. And this doesn't count tool output from Claude's own calls (file reads, bash results, search results), which follow the same pattern.
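The annual figures above reduce to one small function. A back-of-envelope sketch; the rates, turn counts, and 250-workday year are the article's assumptions, not measurements:

```python
CACHE_READ = 1.50    # $/MTok, warm-cache read rate (assumed)
COLD_INPUT = 15.00   # $/MTok, effective rate when the cache has expired
WORKDAYS = 250

def annual_cost(dump_tokens, dumps_per_day=2, reread_turns=20, rate=CACHE_READ):
    """Yearly carrying cost of tool dumps reread on later turns."""
    daily_tokens = dump_tokens * reread_turns * dumps_per_day
    return daily_tokens * WORKDAYS / 1_000_000 * rate

print(annual_cost(10_000))                    # 150.0  — all warm-cache
print(annual_cost(10_000, rate=COLD_INPUT))   # 1500.0 — worst case, every reread cold
print(annual_cost(30_000))                    # 450.0  — larger dumps, warm cache
```

The real number lands somewhere between the warm and cold figures depending on how often the Part 1 idle tax bites.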
The Sync Tool Call Variant
Connects back to Part 1. When Claude kicks off a long synchronous tool call (test suite, build, heavy analysis), it waits. While it waits, the cache ages. If the tool takes more than five minutes, the cache expires. That result-processing turn becomes the most expensive in the session: full re-cache of the entire conversation, plus the new tool output on top.
What I Do Now
Run long tasks in CI, not in chat. Test suites, builds, deployments. CI runs the job; Claude reads the result artifact when done. One file read instead of thousands of tokens permanently in context.
Use --output flags and file redirection. Redirect tool output to a file. Claude reads it if needed: one artifact in one place, not a streaming dump.
Ask for filtered output. "Run the tests and show me only failures." "Search for X and return only file paths." Claude filters well when asked. Unfiltered is the default only because nobody asked for concise.
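The redirect-then-filter pattern is a few lines of code. A minimal sketch, assuming a PHPUnit-style log; the file path and failure markers are illustrative, not from the helper:

```python
# Post-filter a redirected test log: keep only the lines worth
# putting in context (failures, errors, and the summary line).
from pathlib import Path

FAIL_MARKERS = ("FAIL", "ERROR", "Exception", "Tests:", "Failures:")

def failures_only(log_path):
    """Return only the lines of a test log that match a failure marker."""
    lines = Path(log_path).read_text().splitlines()
    return [ln for ln in lines if any(m in ln for m in FAIL_MARKERS)]

# Usage: redirect first, then read only the filtered slice.
#   phpunit > /tmp/phpunit.log 2>&1
#   failures_only("/tmp/phpunit.log")
```

A 30K-token run collapses to the handful of lines you actually act on; the full log stays on disk if you need to dig.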
Use run_in_background for long tools. The tool runs asynchronously; Claude gets notified on completion instead of sitting idle. Avoids the sync-tool-call cache death above. Output still lands in context, but the cache didn't die waiting.
Why This Surprised Me
Tool output feels like Claude's problem. I didn't put it there. But it's my context paying the carrying cost. Unlike a file I deliberately read, tool output accumulates as a side effect of normal work.
Tool output is often the least information-dense content in a session. Build logs are 95% noise. Test output is 95% passes. Grep results are 90% irrelevant matches. Tokens don't care about information density. Every token costs the same whether it's a critical error or the 47th passing test.
Of the six cost drivers, this one is the most invisible. The idle tax (Part 1) is visible if you look. Context rot (Part 2) correlates with session length. File bloat (Part 3) is tied to actions you took. Compaction (Part 4) is a detectable event. Tool output carrying cost is background radiation: always there, always adding up, never visible in any single turn.
The watching-cost helper: make the dumps visible
The companion helper installs a PostToolUse hook that tracks the token size of tool output landing in your context. It warns when a single result exceeds a threshold (configurable, default 5K tokens) and when cumulative tool-output tokens cross escalating thresholds.
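For a sense of what such a hook does, here is a minimal sketch in the helper's spirit, not its actual code. It assumes the hook receives a JSON event on stdin with `tool_name` and `tool_response` fields, and that ~4 characters per token is a good-enough estimate to warn on:

```python
import json
import sys

WARN_TOKENS = 5_000  # per-result threshold, matching the helper's default

def estimate_tokens(tool_response):
    """Crude chars-to-tokens estimate (~4 chars/token) of a tool result."""
    return len(json.dumps(tool_response)) // 4

def check(event):
    """Warn on stderr when a single tool result exceeds the threshold."""
    est = estimate_tokens(event.get("tool_response", ""))
    if est > WARN_TOKENS:
        print(f"[watching-cost] {event.get('tool_name', '?')} returned "
              f"~{est:,} tokens; filter or redirect to a file",
              file=sys.stderr)
    return est

# Wired up as a hook script, the entry point would be roughly:
#   check(json.load(sys.stdin))
```

The warning lands mid-session, while you can still change course, rather than in a bill at the end of the month.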
git clone https://github.com/zivtech/claude-cost-helpers
cd claude-cost-helpers/watching-cost
./install.sh
Can't remove output already in context. Makes the accumulation visible so you can adjust: filter future calls more aggressively, redirect to files, or split the session.
The ecosystem fix: output compression
Our hook detects bloat after it happens. For prevention, pair it with RTK (28K+ stars): a Rust CLI proxy that transparently compresses Bash command output by 60-90% via a PreToolUse hook.
brew install rtk-ai/tap/rtk
Limitation: RTK compresses Bash output only, not Claude Code's built-in tools (Read, Grep, Glob, Agent). Our hook monitors all tool output. RTK prevents Bash bloat; our hook catches everything else.
At the fleet level
Joyus AI Internal's cache-economics module (spec 011) captures outputTokens per operation and aggregates them in session roll-ups. Verbose outputs show up as disproportionate output-to-input ratios, easy to spot across a team's sessions. Spec 011's idle-gap detection treats blocking tool calls the same as human absence: both produce long gaps and elevated re-cache cost. The platform surfaces the pattern. The fix (filtering, redirecting, running in CI) is still on the developer.
The Rule
Every tool dump stays in context until the session ends. If the output is more than a few thousand tokens and you only need a fraction of it, redirect, filter, or delegate. You'll pay for whatever lands in context on every turn that follows.
The Series
Six cost drivers, six habits, six helpers:
- The Idle Tax — gaps kill the cache. Save and restart.
- The "Just One More Turn" Trap — context rot is superlinear. Split at task boundaries.
- The Agent That Read 200 Files — exploration belongs in subagents. The parent gets the summary.
- The Compact Gamble — auto-compact is lossy. Don't let it make the call.
- The Watching Cost — tool output is permanent. Filter, redirect, or delegate.
- The Delegation Tax — subagent results are permanent too. Constrain, split, or write to file.
All six helpers are in claude-cost-helpers. Install whichever ones match your workflow, or install all of them — they're independent hooks that don't conflict.
The common thread across all six: the expensive thing is never the prompt you write. It's the context you carry. Every habit in this series is about the same underlying principle: keep the context lean, keep the cache warm, and when a session has served its purpose, let it go.
Visuals: tool dump reprocessing slider at visuals/econ-mini-05-output-in-context.html, synchronous tool call timeline at visuals/econ-v3d-toolcall.html — embeddable via iframe, dark-mode aware, color-blind safe.
Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.