By Alex Urevick-Ackelsberg, Zivtech

Part 4: The Compact Gamble

Auto-compact is lossy summarization at peak context. Sometimes it drops the one constraint you needed.

Part 1 was about gaps. Part 2 was about sessions that go too long. Part 3 was about context bloat from file reads. This one is about what happens when Claude tries to fix that bloat for you.

At some point in a long session, Claude Code auto-compacts: summarizes the conversation, drops the original messages, continues with a shorter version. Sometimes it preserves what matters. Sometimes it drops the one constraint you needed and the next 10 turns go sideways.

You don't review the summary before it replaces your conversation. You can't undo it. And it fires at peak context.

Problem #4: Lossy Summarization at Peak Context

Auto-compact triggers when context approaches the window limit, which means it fires when the conversation is at its largest: longest, most complex, most nuanced. The summarizer compresses the entire conversation into a fraction of its length, deciding what's important without knowing what you'll need next.

It keeps the obvious: files changed, current task, last decision. It drops what seems redundant: a constraint from turn 3, a failed approach you want to avoid, a specific number from a requirements discussion.

Sometimes that "redundant" constraint was the one thing keeping the implementation correct.

What I Was Doing

The lost constraint. Drupal migration. At turn 8, I told Claude the target database had a varchar(128) column limit, not varchar(255). Twenty turns later, auto-compact fired. The summary dropped that constraint (one sentence in a long session). Claude proceeded with varchar(255) mappings for 10 turns. I caught it when the migration failed in staging.

Those 10 turns weren't just wasted time. They were wasted tokens: every turn carrying implementation work built on a dropped constraint. Undo and redo cost another 15 turns.

The timing problem. Auto-compact fires at peak context, so summarization runs against the largest, most expensive conversation state, and the summary itself is billed as output tokens ($75/MTok). The pre-compact cache is gone, so the next message triggers a full cache write. There is no free transition.

In the visual, drag the slider to vary compaction fidelity. Real compactions are not uniform: one load-bearing decision lost counts more than ten redundant ones kept.

The Numbers

The compact operation itself: a 5,000-token summary at $75/MTok output is $0.375, plus the full context read to generate it, plus a post-compact cache write at $18.75/MTok. Call it $1-2 per event.
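The per-event estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch, not a billing model: only the output and cache-write prices come from the text. The cache-read price, the 180k-token context, and the 20k-token post-compact context are assumptions chosen for illustration.

```python
# Rough per-event cost of one auto-compact.
OUTPUT_PER_MTOK = 75.00        # from the article
CACHE_WRITE_PER_MTOK = 18.75   # from the article
CACHE_READ_PER_MTOK = 1.50     # assumption: the context is read from cache


def compact_event_cost(context_tokens, summary_tokens, post_compact_tokens):
    """Return (summary, context_read, cache_write, total) in dollars."""
    summary = summary_tokens / 1_000_000 * OUTPUT_PER_MTOK
    context_read = context_tokens / 1_000_000 * CACHE_READ_PER_MTOK
    cache_write = post_compact_tokens / 1_000_000 * CACHE_WRITE_PER_MTOK
    return summary, context_read, cache_write, summary + context_read + cache_write


# Hypothetical session: 180k tokens of context, a 5,000-token summary,
# and 20k tokens (summary + system prompt + tools) written back to cache.
summary, ctx_read, write, total = compact_event_cost(180_000, 5_000, 20_000)
print(f"summary ${summary:.3f}  read ${ctx_read:.3f}  write ${write:.3f}  total ${total:.2f}")
```

With these assumed sizes the total lands near the low end of the $1-2 range. If the context read is not served from cache, that term grows and pushes the event toward the high end or past it.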

The real cost is downstream. In the Drupal migration, 10 turns of wrong work plus 15 turns of redo cost roughly $8-12 in wasted tokens. A developer who hits one bad compact per week is looking at roughly $400-600/year (52 × $8-12) in wasted-work costs alone.
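For anyone who wants to plug in their own numbers, the downstream arithmetic is a one-liner. A minimal sketch using only the figures from the Drupal example; the function name is made up for illustration.

```python
WASTED_DOLLARS = (8.0, 12.0)   # per bad compact, from the Drupal example
WASTED_TURNS = 10 + 15         # wrong work plus redo, from the Drupal example
BAD_COMPACTS_PER_YEAR = 52     # one per week


def annual_waste(per_event, events_per_year):
    """Scale a (low, high) per-event dollar range to a yearly range."""
    low, high = per_event
    return low * events_per_year, high * events_per_year


low, high = annual_waste(WASTED_DOLLARS, BAD_COMPACTS_PER_YEAR)
per_turn = tuple(d / WASTED_TURNS for d in WASTED_DOLLARS)
print(f"per wasted turn: ${per_turn[0]:.2f}-${per_turn[1]:.2f}")
print(f"per year: ${low:.0f}-${high:.0f}")
```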

The fidelity slider in the visual makes the gamble concrete. At 80-90%, compact is a clear win. Below 60%, you're likely to lose something that matters. You never know which number you got.

What I Do Now

Compact on my terms. When a session gets long, I save with /save-session and start fresh. My handoff note is a curated summary: constraints, decisions, what a cold reader needs. No exploration dead ends, no verbose tool output. A 3,000-5,000 token handoff that's higher fidelity than auto-compact because I know what matters.
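A curated handoff note is easier to keep consistent if it has a fixed shape. The sketch below is one possible structure, not what /save-session actually produces: the class name, field names, and example entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HandoffNote:
    """Hypothetical shape for a hand-written session handoff."""
    task: str
    constraints: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_items: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the note as plain text, skipping empty sections."""
        lines = [f"Task: {self.task}"]
        sections = [
            ("Constraints", self.constraints),
            ("Decisions", self.decisions),
            ("Open items", self.open_items),
        ]
        for title, items in sections:
            if items:
                lines.append("")
                lines.append(f"{title}:")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


note = HandoffNote(
    task="Drupal migration: map source fields to the target schema",
    constraints=["Target text columns are varchar(128), not varchar(255)"],
    decisions=["Run the migration in staging before production"],
)
print(note.render())
```

Putting constraints first is deliberate: they are the items a summarizer is most likely to treat as redundant, so they get the most prominent slot.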

Never reach the threshold. If you manage session length (Part 2) and delegate exploration (Part 3), you shouldn't hit the limit. Auto-compact is a safety net. If it fires regularly, the upstream habits aren't landing.

When compact does fire, verify constraints immediately. Check for key facts: numbers, limitations, architectural decisions, anything stated early in the session. One turn re-establishing constraints is cheaper than 10 turns built on a gap.
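The post-compact check can be partly mechanical. A minimal sketch, assuming you keep your own list of key constraints and can paste in Claude's post-compact restatement of the task. The function name and the example strings are hypothetical, and the substring match is deliberately crude: a hit does not prove the constraint survived intact, but a miss is a clear prompt to re-state it.

```python
def missing_constraints(summary: str, constraints: dict[str, list[str]]) -> list[str]:
    """Return the names of constraints whose key phrases are not all in the summary."""
    text = summary.lower()
    return [
        name
        for name, phrases in constraints.items()
        if not all(phrase.lower() in text for phrase in phrases)
    ]


# Key facts stated early in the session, each with the phrases that must survive.
constraints = {
    "target column limit": ["varchar(128)"],
    "staging first": ["staging"],
}

# Hypothetical post-compact restatement that has lost the column limit.
summary = (
    "Migrating Drupal nodes to the target database. "
    "Field mappings use varchar(255). Run in staging before production."
)

print(missing_constraints(summary, constraints))
```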

Why This Surprised Me

Compaction sounds like it's saving you money. Sometimes it is: a bloated session burning $1+/turn compressed to $0.15/turn is a real win. The problem: you can't tell which kind of compact you got until something goes wrong.
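The two outcomes can be compared in turns rather than dollars. A rough sketch using the per-turn prices above, the $1-2 event cost, and the $8-12 rework cost from the Drupal example; the function name is made up, and treating "$1+/turn" as exactly $1.00 is a simplifying assumption.

```python
def breakeven_turns(upfront_cost, cost_per_turn_before, cost_per_turn_after):
    """Turns of cheaper context needed to pay back an upfront cost."""
    saving_per_turn = cost_per_turn_before - cost_per_turn_after
    return upfront_cost / saving_per_turn


BEFORE, AFTER = 1.00, 0.15      # dollars per turn, from the article
EVENT_COST = (1.0, 2.0)         # dollars per compact event, from the article
REWORK_COST = (8.0, 12.0)       # dollars of wasted work after a bad compact

good = [breakeven_turns(c, BEFORE, AFTER) for c in EVENT_COST]
bad = [breakeven_turns(c + r, BEFORE, AFTER) for c, r in zip(EVENT_COST, REWORK_COST)]
print(f"good compact pays back in {good[0]:.1f}-{good[1]:.1f} turns")
print(f"bad compact pays back in {bad[0]:.1f}-{bad[1]:.1f} turns")
```

Under these numbers a clean compact pays for itself within a few turns, while a bad one needs a double-digit number of cheap turns just to get back to even.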

There's no undo. The pre-compact conversation is gone. If the summary dropped something, the only recovery is re-stating it from your own memory. You have to remember what you told Claude 30 turns ago.

The compact-safety helper: make the gamble visible

The companion helper installs a PreCompact hook that fires before auto-compact. It writes a snapshot of conversation state to a local file (turn count, estimated tokens, key file paths) and warns you. Not an undo, but a checkpoint you can reference if post-compact behavior drifts.

git clone https://github.com/zivtech/claude-cost-helpers
cd claude-cost-helpers/compact-safety
./install.sh

The hook can't prevent compact. It makes the event visible. If Claude starts making wrong decisions after compact, check the snapshot for dropped constraints.
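To show the general shape of such a hook, here is a minimal sketch in Python. It is not the code in the compact-safety helper. It assumes the hook receives a JSON payload on stdin with session_id, transcript_path, and trigger fields, that the transcript is a line-per-entry file, and that four characters per token is a good enough estimate. The snapshot directory is a hypothetical location, and extracting key file paths is left out because it depends on the transcript format.

```python
import json
import sys
import time
from pathlib import Path

SNAPSHOT_DIR = Path.home() / ".claude" / "compact-snapshots"  # hypothetical location


def build_snapshot(payload: dict) -> dict:
    """Summarize the session state described by a PreCompact hook payload."""
    transcript = Path(payload.get("transcript_path") or "").expanduser()
    lines = []
    if transcript.is_file():
        lines = transcript.read_text(encoding="utf-8", errors="replace").splitlines()
    chars = sum(len(line) for line in lines)
    return {
        "session_id": payload.get("session_id"),
        "trigger": payload.get("trigger"),
        "transcript_entries": len(lines),
        "estimated_tokens": chars // 4,  # crude heuristic: ~4 characters per token
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }


def main() -> None:
    raw = sys.stdin.read()
    if not raw.strip():
        return  # nothing on stdin, so this was not invoked as a hook
    snapshot = build_snapshot(json.loads(raw))
    SNAPSHOT_DIR.mkdir(parents=True, exist_ok=True)
    name = f"{snapshot['session_id'] or 'unknown'}-{int(time.time())}.json"
    out = SNAPSHOT_DIR / name
    out.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    # Warn on stderr so the message does not get mistaken for hook output.
    print(f"compact incoming: snapshot written to {out}", file=sys.stderr)


if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

A script like this would be registered as a command hook for the PreCompact event in Claude Code's settings. It only records state and returns, which matches the point above: the event becomes visible, but compaction still happens.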

The real fix is upstream: split sessions before they reach the limit.

The ecosystem fix: progressive compaction

Our hook is the 30-second safety net. For progressive compaction with quality scoring, pair it with token-optimizer: 7-signal quality scoring, progressive checkpoints, automatic context restoration, and an HTML analytics dashboard.

claude plugin install alexgreensh/token-optimizer

Our hook writes a snapshot before compact. Token-optimizer tracks quality signals proactively, catching degradation before it becomes an emergency. Simple version: our hook. Full solution: their plugin.

At the fleet level

If compact is firing, the upstream habits (session splitting from Part 2, delegation from Part 3) aren't landing.

Joyus AI Internal's cache-economics module (spec 011) tracks totalInputTokens and messageCount per session. Sessions approaching the window limit are visible in the dashboard before compact fires. Compaction is a Claude Code runtime behavior Joyus doesn't control, but the conditions that cause it are exactly what the cost dashboard surfaces.
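The kind of check a dashboard like this could run is simple to sketch. The code below is not the Joyus module. It borrows only the two field names from the text and assumes that totalInputTokens reflects the input size of a session's latest request; if it is a cumulative total in the real module, a different field would be needed. The 200,000-token window and the 80% warning fraction are also assumptions.

```python
WINDOW_LIMIT = 200_000   # assumption: context window size in tokens
WARN_FRACTION = 0.8      # assumption: flag sessions above 80% of the window


def sessions_near_limit(sessions, window=WINDOW_LIMIT, warn=WARN_FRACTION):
    """Return (session_id, fraction_used, messageCount) for sessions near the limit.

    Each session is a dict with 'id', 'totalInputTokens', and 'messageCount'.
    Results are sorted with the fullest session first.
    """
    flagged = [
        (s["id"], s["totalInputTokens"] / window, s["messageCount"])
        for s in sessions
        if s["totalInputTokens"] >= warn * window
    ]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Hypothetical fleet data.
fleet = [
    {"id": "migration", "totalInputTokens": 172_000, "messageCount": 41},
    {"id": "docs-fix", "totalInputTokens": 38_000, "messageCount": 9},
    {"id": "refactor", "totalInputTokens": 190_000, "messageCount": 57},
]

for session_id, used, messages in sessions_near_limit(fleet):
    print(f"{session_id}: {used:.0%} of window after {messages} messages")
```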

The Rule

If auto-compact is firing, your sessions are too long. Split earlier. When compact does fire, immediately verify that your key constraints survived. One turn of re-stating what matters is cheaper than ten turns of wrong work.

Coming Up

Part 5: The watching cost — tool output is permanent in context
Part 6: The delegation tax — subagent results are permanent too

Visual: compaction fidelity slider at visuals/econ-mini-04-compact-loss.html — embeddable via iframe, dark-mode aware, color-blind safe.

Code: zivtech/claude-cost-helpers on GitHub. GPL-3.0-or-later licensed.
