Part 1: The Idle Tax
Cache gaps cost 12.5×. Every time you step away, the meter resets.
I've been using Claude Code daily for about a year. It's the most productive tool I've ever picked up. It also has costs that are almost entirely invisible unless you go looking for them, and they're not where most people expect.
This series is about that gap: between what Claude Code costs and what you think it costs. Not the pricing page, but the habits, the rhythms, the moments that add up to real money over a normal workweek.
The One Thing You Need to Know About Pricing
Quick primer, then we move on. Claude Code charges per token — chunks of text going in and out. But there's a critical detail: prompt caching. Claude caches the beginning of your conversation so it doesn't reprocess it every turn. Cache reads cost 90% less than processing those same tokens fresh.
That cache has a 5-minute TTL. No activity for 5 minutes and it evaporates. The next message reprocesses everything from scratch — roughly 12.5× more expensive than the same message with a warm cache. Same tokens, same conversation, same work. Just a different price because the gap was too long.
There's no warning. The cache quietly dies.
The Six Cost Drivers
Over months of running Claude Code across two businesses (Zivtech, a dev agency, and Milk Jawn, an ice cream manufacturer), I've found six habits that drive most of the avoidable spend. None of them are about writing better prompts.
1. The Idle Tax — Cache goes cold when you step away or juggle sessions. Every return costs 12.5x what it should. This post.
2. The "Just One More Turn" Trap — A session that should've been split keeps going. Past the knee, the model gets less reliable and more expensive.
3. The Agent That Read 200 Files — File reads in the parent session live in context forever. A subagent returns only a summary.
4. The Compact Gamble — Auto-compact summarizes your session at peak context. Sometimes it drops the one constraint you needed.
5. The Watching Cost — Tool output (build logs, test runs, grep results) gets reprocessed on every subsequent message.
6. The Delegation Tax — Subagent results are permanent in the parent context. The agent's window is disposable; its result is not.
The common thread: every one of these is invisible while it's happening. You see it in the bill, if you know what to look for.
Problem #1: The Idle Tax
Some tasks in Claude take real time. Reading files, running tests, working through a complex implementation. If I just sit and watch, I'm slower than before AI. The whole point is that Claude works while I do other things.
So I multitask. And that's where the money goes.
What I Was Doing
Two flavors, both common.
Getting pulled into something else. Mid-session on a Drupal client project. Claude is reading files, working through an approach. I check Slack, answer an email, take a call from our production manager at Milk Jawn. Fifteen minutes gone. Come back, type the next prompt. Cold cache. 12.5x.
Running four chats because that's the only way to stay productive. Multiple Claude sessions across two laptops, different projects, different contexts. I rotate through them every couple of minutes to keep them warm. Then one session demands actual thinking: a careful prompt, files I need to read, a problem I have to sit with for 15 minutes. While I'm focused there, the other three age out. WARM, COOLING, COLD. The next message to any of them pays the penalty.
Six or seven cold-cache hits a day, multiplied across the week, and the math gets ugly. The worst part: the work in each session was efficient. I was doing exactly what you're supposed to do.
What I Do Now
When I'm about to step away for more than three minutes, or when one session demands deep focus, I save the others and close them. /save-session captures the state into a handoff note. Window goes away.
When I come back, I open a fresh session, paste the handoff, and keep moving. The first message pays the full setup cost (same as I'd have paid returning to a cold cache). The difference: every message after the first is warm. I'm not paying 12.5x on turn after turn.
Sounds annoying. After two days it's reflex.
Why This Surprised Me
Any single instance is small. A few cents, maybe a dollar on a large context. You'd never look at one bill and spot it.
It's the frequency. "Just keep your sessions warm" is useless advice because the whole point of the tool is that it works while you do other things. You have to multitask or you're slower than before AI. Multitasking means gaps. Gaps mean cold caches. Every transition is a tax I didn't know I was paying.
What Changed
My workflow wasn't always this expensive. Before March 2026, the prompt cache TTL was one hour, not five minutes. I could walk away for 20 minutes, rotate through four sessions at a leisurely pace, and the cache was still warm. I built my entire workflow around that.
Then three things changed in the same month:
- Around March 6, Anthropic dropped the cache TTL from 1 hour to 5 minutes. This was intentional and server-side. A detailed analysis on GitHub tracked it day by day across months of session data — 1-hour TTL through February, 5-minute TTL from March 8 onward. One analysis found a 17–26% cost increase and subscription users started hitting their 5-hour quota limits for the first time.
- Around March 15, a client-side cache bug (GitHub #34629) broke caching in resumed sessions — only the system prompt was cached, while the full conversation was re-uploaded from scratch on every message. This was fixed in v2.1.97.
- On March 27, the off-peak 2× usage promotion expired and Anthropic added peak-hour throttling the same week — quota drains faster during business hours now.
Then a month later, a fourth change landed:
- On April 16, Opus 4.7 shipped with
xhighas its new default effort level — a level that didn't even exist on 4.6. Higher effort means more thinking tokens spent per turn, on every turn. And the model overrides whatever you had configured before. From the model config docs: "When you first run Opus 4.7, Claude Code appliesxhigheven if you previously set a different effort level for Opus 4.6 or Sonnet 4.6." Translation: any/effort highoreffortLevel: "high"you'd configured gets silently overridden the first time 4.7 spins up. The TTL didn't get fixed. The per-turn floor went up.
The xhigh default is the kind of change that doesn't show up as an alert. There's no "your effort level changed" notification. The statusline quietly reads "with xhigh effort" and you keep working. I'd had effortLevel: "high" in my settings.json for months. The first 4.7 session ignored it. I had to re-run /effort high manually before I noticed.
There's exactly one mechanism that survives the override: the CLAUDE_CODE_EFFORT_LEVEL environment variable. It takes precedence over /effort, over the effortLevel settings field, over the model's first-run rule — over everything. Set it once in ~/.claude/settings.json's env block and the pin actually sticks across sessions and model switches. That's the only configuration path I'd trust on 4.7.
I didn't suddenly start working wrong. The economics shifted four times in five weeks and my habits didn't shift with them. If you started using Claude Code before March and your usage feels more expensive now, this is why. The rest of this series is what I'm doing about it.
The idle-tax helper: make the tax visible
Rather than just writing about this, we open-sourced the fix: claude-cost-helpers — a set of local hooks and slash commands that make these cost mechanics visible.
The idle-tax helper installs a UserPromptSubmit hook that warns you when the cache has gone cold (or is about to), paired with /save-session and /resume-session commands for the save-close-reopen cycle. It's a bash script, zero dependencies, takes about 30 seconds to install.
git clone https://github.com/zivtech/claude-cost-helpers
cd claude-cost-helpers/idle-tax
./install.sh
The hook is informational — it warns but never blocks. Your prompt always proceeds. The point is to make the cost legible at the moment it happens, so you have a real choice: continue here (sometimes that's right) or save and start fresh (often cheaper at scale).
If you're on Opus 4.7, install the effort-control helper at the same time. It sets CLAUDE_CODE_EFFORT_LEVEL=high in your settings env block (the only mechanism that survives 4.7's first-run override), plus a SessionStart banner and a /deep command for one-shot escalation when a task warrants it.
One helper per post, six hooks covering the six cost drivers, plus effort-control as the 4.7 patch.
At the fleet level
The local helper makes the idle tax visible to you. At team scale (10 developers, each juggling multiple sessions), you need the same signal across every session, every day.
Joyus AI Internal's cache-economics module (spec 011) adds idle-gap detection per mediation message, tracks cacheMissCount and maxIdleGapSeconds on the session record, and recommends session splitting when both gap length and session size cross thresholds. Of the six cost drivers, the idle tax is the one Joyus is most directly built to manage: most frequent, most invisible, most fixable at the platform layer.
The Rule
Any Claude Code session you haven't touched in five minutes is already costing you. Either you're working in it, or it should be closed with a handoff.
That's it. Not "manage your context window." Not "use subagents strategically." Just: when your focus moves, move the session with it.
Coming Up
- Part 2: The "just one more turn" trap — context rot makes you slower and more expensive
- Part 3: The agent that read 200 files — subagents as cost containment, not just speed
- Part 4: The compact gamble — when summarization saves you, when it burns you
- Part 5: The watching cost — tool output is permanent in context
- Part 6: The delegation tax — subagent results are permanent too
If you've noticed your own cost patterns, I'd love to hear them. The point of this series is to surface the stuff that doesn't show up in the docs.
Visual: interactive cost calculator at visuals/econ-cost-calculator.html — embeddable via iframe with URL-param preload for per-post scenarios, dark-mode aware, color-blind safe.
Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.