How it works

Token waste comes in three kinds. Most tools fix one of them. Token Optimizer covers all three, and because it protects your session through compaction, the savings stick instead of vanishing the moment auto-compact fires.

The three kinds of waste

Structural

A bloated CLAUDE.md, unused skills, duplicate system reminders, a stale MEMORY.md with entries past line 200 that the model never sees, dead MCP servers. This is often the largest share in high-waste setups, and it compounds: a leaner prefix means a smaller cache-read bill on every turn that follows. Almost no other tool touches this.
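To see why a leaner prefix compounds, here is a back-of-the-envelope sketch. The token counts and the turn count are made-up illustration values, and the model assumes the whole prefix is read once per turn; what you actually pay depends on your provider's cache pricing.

```python
def cumulative_prefix_reads(prefix_tokens: int, turns: int) -> int:
    """Total prefix tokens read over a session, assuming the full
    prefix is read (from cache) once on every turn."""
    return prefix_tokens * turns


# Hypothetical numbers, for illustration only.
bloated = cumulative_prefix_reads(prefix_tokens=30_000, turns=50)
lean = cumulative_prefix_reads(prefix_tokens=18_000, turns=50)
saved = bloated - lean

print(f"prefix tokens read, bloated: {bloated:,}")
print(f"prefix tokens read, lean:    {lean:,}")
print(f"saved over the session:      {saved:,}")
```

Trimming 12,000 tokens once removes 12,000 tokens from every later turn, which is why structural fixes keep paying after the audit is done.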

Runtime

Verbose command output, oversized MCP results, and re-read files that flood your context mid-session. This is the slice proxy compressors handle, covering roughly 15 to 25% of your context on a good day.

Behavioral

The habits that quietly burn tokens: letting the cache expire, compacting too late, looping on a failing approach, running Opus where Haiku would do, switching models mid-session and killing your cache. Some show up live within a session; others only emerge across many.

Why fixing one layer is not enough

A command-output compressor covers 15 to 25% of your context on a good day. The other 75 to 85% is out of its reach, and that is where structural and behavioral waste sit. On top of that comes the 60 to 70% you lose on every compaction. Saving tokens on git status does not help if the next auto-compact wipes out the decision that made you run git status in the first place.

How each kind is handled

Structural waste is found by the audit. Run /token-optimizer once and it produces a per-component token breakdown across CLAUDE.md, skills, MCP servers, and MEMORY.md, then fixes what it safely can and recommends the rest. See Reading your first audit.
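As a rough picture of what a per-component breakdown looks like, here is a minimal sketch. It is not how /token-optimizer counts tokens: the four-characters-per-token heuristic, the function names, and the sample contents are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (an assumption, not a real tokenizer):
    # about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def breakdown(components: dict[str, str]) -> list[tuple[str, int]]:
    """Return (component, estimated tokens) pairs, largest first."""
    rows = [(name, estimate_tokens(text)) for name, text in components.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder contents standing in for real files.
sample = {
    "CLAUDE.md": "x" * 8000,
    "MEMORY.md": "x" * 2000,
    "skills": "x" * 12000,
}

report = breakdown(sample)
for name, tokens in report:
    print(f"{name:<12}{tokens:>8}")
```

Sorting largest first puts the component most worth trimming at the top of the report.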

Runtime waste is cut in real time by active compression: delta re-reads return only what changed, structure maps replace large-file re-reads with a skeleton, and bash output compression strips verbose CLI output down to the essentials. See Active compression overview.
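The delta re-read idea can be sketched with the standard library's difflib. This is an illustration of the technique, not Token Optimizer's implementation; the function name and the sample file are hypothetical.

```python
import difflib


def delta_reread(previous: str, current: str, path: str = "file") -> str:
    """Return only what changed since the last read, as a unified diff,
    or a one-line marker when nothing changed."""
    if previous == current:
        return f"{path}: unchanged since last read"
    diff = difflib.unified_diff(
        previous.splitlines(),
        current.splitlines(),
        fromfile=f"{path} (last read)",
        tofile=f"{path} (now)",
        lineterm="",
        n=1,  # one line of context around each change
    )
    return "\n".join(diff)


old = "a = 1\nb = 2\nc = 3\n"
new = "a = 1\nb = 20\nc = 3\n"

delta = delta_reread(old, new, "config.py")
print(delta)
```

For a large file with a small edit, the diff is a fraction of the full contents, and an unchanged file costs a single line.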

Behavioral waste is handled on two fronts. The in-session habits are caught automatically and in real time: loop detection breaks a retry loop before it burns more turns, quality nudges prompt a compact before decisions are lost, session continuity carries your work across compaction and crashes, and cache-safe Keep-Warm recovers prompt-cache re-write waste. The slower patterns that no single session can reveal are surfaced by Token Coach and Fleet Auditor, which read your history: quality trending down, sessions creeping longer, cache hit rates falling, cost per session climbing, and model switches that invalidate the cache. The day-to-day waste is fixed for you; the historical analysis is there when you want to go deeper. See Quality nudges and loop detection, Session continuity, and Token Coach.
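One simple way to picture loop detection is to flag a run of identical failing commands. The sketch below assumes that rule and a threshold of three; the class name, the threshold, and the matching rule are illustrative choices, not the tool's actual heuristics.

```python
from collections import deque


class LoopDetector:
    """Flags when the same command has failed `threshold` times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        # Only the most recent `threshold` calls are kept.
        self.recent = deque(maxlen=threshold)

    def record(self, command: str, failed: bool) -> bool:
        """Record one tool call; return True when a retry loop is detected."""
        self.recent.append((command, failed))
        if len(self.recent) < self.threshold:
            return False
        first_command = self.recent[0][0]
        return all(cmd == first_command and bad for cmd, bad in self.recent)


detector = LoopDetector(threshold=3)
results = [detector.record("pytest tests/", failed=True) for _ in range(3)]
after_success = detector.record("pytest tests/", failed=False)

print(results)
print(after_success)
```

A successful run or a different command breaks the streak, so the detector only fires on a genuine retry loop.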

What makes the approach different

Fully local, zero dependencies, zero telemetry

Pure Python standard library on Claude Code and Codex. TypeScript with zero runtime dependencies on OpenCode and OpenClaw. Nothing to pip install, no analytics endpoint, no phone-home. Every measurement is a local SQLite write to a file you own under your runtime home. You can inspect it, export it, or delete it.
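To make "a local SQLite write" concrete, here is a minimal standard-library sketch. The table name, the columns, and the function are hypothetical and do not describe Token Optimizer's real schema; an in-memory database is used so the example leaves nothing behind.

```python
import sqlite3
import time


def record_measurement(conn: sqlite3.Connection, session_id: str, tokens: int) -> None:
    """Append one measurement row to a local SQLite database."""
    # Hypothetical schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "ts REAL NOT NULL, session_id TEXT NOT NULL, tokens INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO measurements (ts, session_id, tokens) VALUES (?, ?, ?)",
        (time.time(), session_id, tokens),
    )
    conn.commit()


# ":memory:" keeps the demo self-contained; a real setup would use a file path.
conn = sqlite3.connect(":memory:")
record_measurement(conn, "demo-session", 1234)

rows = conn.execute("SELECT session_id, tokens FROM measurements").fetchall()
print(rows)
```

Because the store is an ordinary SQLite file, any SQLite client can inspect, export, or delete it.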

Zero baseline context overhead

Token Optimizer runs as an external process. It does not inject always-on instructions into your context, and it does not add MCP overhead. Optional quality nudges and checkpoint hints are short, event-triggered messages that appear only when useful. Idle overhead stays at zero.

It opens the hood, not just the dashboard light

/context tells you the context is 73% full. Token Optimizer tells you which 12K tokens are wasted on skills you never use, flags orphaned MEMORY.md topic files the model cannot see, checkpoints your decisions before compaction destroys them, and gives you a quality score that tracks how much the session has degraded, turn by turn.

The savings survive compaction

Active compression only pays off if the session survives compaction. Smart Compaction closes that loop: it checkpoints your decisions before auto-compact fires and restores them afterward. After a compaction, the model also knows what large tool outputs it already processed, so it does not re-read them from scratch. See The compaction problem.
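The checkpoint-and-restore loop can be pictured as a small JSON round trip. This is a sketch under assumptions: the file name, the two fields, and the function names are hypothetical, and Smart Compaction's real checkpoint format is not shown here.

```python
import json
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, decisions: list[str], processed_outputs: list[str]) -> None:
    """Write decisions and already-processed tool outputs to disk before compaction."""
    payload = {"decisions": decisions, "processed_outputs": processed_outputs}
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def restore_checkpoint(path: Path) -> dict:
    """Read the checkpoint back after compaction."""
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint.json"  # hypothetical file name
    save_checkpoint(
        checkpoint,
        decisions=["use SQLite, not Postgres"],
        processed_outputs=["npm test output (turn 14)"],
    )
    restored = restore_checkpoint(checkpoint)

print(restored["decisions"])
print(restored["processed_outputs"])
```

Keeping a list of processed outputs alongside the decisions is what lets the model skip re-reading large results it has already handled.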

Next steps

The compaction problem: why the savings have to survive auto-compact, and how they do.
Quality scoring: how degradation is measured turn by turn.
Prompt cache economics: why a resume past the cache window costs you a re-write.
Why install this first: the compounding problems that make startup overhead matter.