# 55 — Token observability and session visualization (token-optimizer and peers) (https://jackin.tailrocks.com/research/token-optimization/55-token-observability-and-visualization/)




Volume III deep-dive on the **observability** layer, requested after the headroom/compression work (53, 54): analyze `alexgreensh/token-optimizer` and survey other tools that give *full per-token visibility* and *session visualization* — "see every token, all usage, visualize my latest session." This is a different problem from compression (53/54) and output brevity (caveman): it saves nothing directly but underwrites every other lever, which is exactly the dossier's standing position on measurement (record 18: "saves nothing, underwrites everything"). Research conducted 2026-06-15; sources in the ledger; stars are treated as noise in this niche per file 54 §A.

## TL;DR [#tldr]

* **token-optimizer is the productized form of this dossier's own measurement method.** It reads Claude Code's native **JSONL session transcripts** locally into SQLite (no proxy, no telemetry endpoint), and renders the exact decomposition the dossier built by hand — **per-turn input / output / cache-read / cache-write with spike detection** — as a single-file **web dashboard** (`localhost:24842`), a color-shifting **status line**, and a CLI audit. It is `tools/session_cost.py` plus a dashboard, quality grades, and a 30-day trend view.
* **It is cache-safe and zero-overhead by construction.** JSONL-only, on-device, "external process, no context injection" — so unlike a compression proxy it cannot bust the prompt cache or add prefix rent. That makes the *visibility* half pure negative-cost: the dossier's "measure before optimizing" rule (file 47) with a UI.
* **The one token class it still cannot show is thinking — and neither can any JSONL tool.** Claude Code redacts thinking from the transcript; output\_tokens is thinking + visible fused. token-optimizer visualizes the input/output/cache split (the actionable \~80% of the bill) but cannot directly graph the thinking-vs-visible slice the dossier had to *infer* via `count_tokens(visible)` (file 02). "Full visibility for every token" has this one structural blind spot on hosted Claude. token-optimizer gets closest with a heuristic **wasteful-thinking flag** (it warns when extended thinking exceeds \~2× output on small edits) — useful, but a flag, not a thinking-vs-visible *split*. A genuinely complete tool would pair JSONL parsing with a `count_tokens` pass over visible blocks to estimate the thinking slice; none surveyed does this (a gap worth a jackin' tool, given `tools/count_tokens.py` already exists). A second wall compounds it: JSONL records per-API-call `usage` *totals*, not per-token streams, so true token-by-token session visualization does not exist for Claude Code at all (see the similar-tools landscape section below).
* **Its optimization half re-implements levers the dossier already ranked.** Structure-map re-reads (95–99%) = the outline lever (file 51, locally −91%); read-cache dedup (180k→250-tok skeleton) = outline + observation masking (record 12); bash compression (\~10%, lossy) = hook filtering (record 20). Its **keep-warm pinger** is the dossier's graveyard #3 ("keepalive pingers" killed for the live loop) — token-optimizer correctly *scopes* it to resumed/TTL-expired sessions on **API billing**, the one case the kill didn't cover, but that case is marginal on a Max subscription (file 41).
* **Two adoption caveats for jackin'.** License is **PolyForm Noncommercial 1.0.0** (free for personal/research and small teams \<5 people / \<$20k-mo, but not a permissive bundle into a product); and its dollar figures assume **API/Vertex/Bedrock pricing**, while the operator is on a Max **subscription** where dollars-below-cap are sunk and the metric is tasks-per-cap (file 41) — so for the operator the *token/cache/quality* views are the value, not the dollar totals.
* **Verdict:** the JSONL-reading, no-proxy observability class (token-optimizer, ccusage, `/usage`, the dossier's `tools/`) is the right way to get session visibility — adopt it freely as the measurement front-end of the validation harness (31/51/53). Proxy- or OTel-based observability adds reach but also a moving part; prefer the local-transcript readers for a coding agent.

## What token-optimizer is [#what-token-optimizer-is]

| Field              | Value                                                                                                                                                                                                                            |
| ------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Repository         | [github.com/alexgreensh/token-optimizer](https://github.com/alexgreensh/token-optimizer)                                                                                                                                         |
| Pitch              | "Find the ghost tokens. Fix them. Survive compaction. Avoid context quality decay."                                                                                                                                              |
| Created / activity | 2026-02-26 (\~4 months); pushed 2026-06-15                                                                                                                                                                                       |
| Adoption           | 1,341 stars / **8 watchers** / 110 forks / 1 open issue (gh api 2026-06-15). The 168:1 star:watcher ratio is the same PR-inflation signal as the compression niche (file 54 §A) — modest here; rank by what it shows, not stars. |
| Language           | Python (Claude Code/Codex hooks) + TypeScript (`openclaw/dist/*`)                                                                                                                                                                |
| License            | **PolyForm Noncommercial 1.0.0** (auto-commercial for teams \<5 people / \<$20k-mo)                                                                                                                                              |
| Data source        | Claude Code native **JSONL transcripts**, parsed locally to **SQLite** — CompletionStart/End events, tool invocations, compaction markers. **No proxy, no telemetry endpoint.**                                                 |
| Surfaces           | Web dashboard (`localhost:24842/token-optimizer`); terminal status line (green→red on quality decay); CLI (`/token-optimizer` audit, `/token-coach` 30-day trends, `quick` 10-second check)                                      |
| Platforms          | Claude Code (CLI + VS Code), Codex (CLI + Desktop), OpenClaw, OpenCode, Hermes (beta), GitHub Copilot (beta)                                                                                                                     |

### What it makes visible (the part the operator wants) [#what-it-makes-visible-the-part-the-operator-wants]

* **Per-turn breakdown:** input / output / cache-read / cache-write — with the cache-write line further split by **TTL (5-minute vs 1-hour)** and **spike detection** on context jumps. Numbers are **exact, read from the API-response `usage` object in the JSONL** (`docs/METHODOLOGY.md`: "the three input classes sum back to total billed input … this decomposition is exact"), not tokenizer estimates. This is the dossier's headline token-class split (32% cache-read / 29% cache-write / 20% thinking / 17% visible output / 2% uncached, file 00/02) rendered per turn — except thinking stays inside the output bar (see the blind spot below). The 5m/1h write split is the most granular cache decomposition of any tool surveyed.
* **Session metrics:** cache hit rate + TTL mix; cost across four pricing tiers (Anthropic API, Vertex Global/Regional, Bedrock); per-message cost paired with response expense; subagent cost (orchestrator vs worker); top-5 costliest prompts by response expense.
* **Historical trends (30-day):** quality degradation, session-duration creep, cache-hit-rate decline, cost-per-session climb.
* **Quality grades:** an S–F composite of Resource Health (context fill %, compaction depth, absolute waste) and Session Efficiency (stale reads, bloated results, decision density), with green/yellow/orange/red bands. This is a heuristic dashboard on the same signal the dossier treats as online-quality governance (file 47) and context rot (file 46 / Chroma).
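
The per-turn decomposition above can be sketched in a few lines of Python. This is a minimal illustration, not token-optimizer's parser or the dossier's `tools/session_cost.py`. It assumes each transcript row is a JSON object whose `message.usage` carries the Anthropic API usage fields (`input_tokens`, `output_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, and an optional `cache_creation` object with per-TTL counts). The names `TurnUsage`, `parse_turns`, `cache_hit_rate`, and `context_spikes`, and the 20,000-token jump threshold, are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class TurnUsage:
    input: int           # uncached input tokens
    output: int          # output tokens (thinking and visible text fused)
    cache_read: int
    cache_write_5m: int
    cache_write_1h: int

    @property
    def total_input(self) -> int:
        # Uncached + cache-read + cache-write is the total billed input.
        return (self.input + self.cache_read
                + self.cache_write_5m + self.cache_write_1h)


def parse_turns(jsonl_text: str) -> list[TurnUsage]:
    """Extract one TurnUsage per transcript row that carries a usage object."""
    turns = []
    for line in jsonl_text.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # rows without usage (user turns, metadata) are ignored
        write_total = usage.get("cache_creation_input_tokens") or 0
        ttl = usage.get("cache_creation") or {}
        write_1h = ttl.get("ephemeral_1h_input_tokens") or 0
        # Assumption: with no TTL breakdown, count the whole write as 5-minute.
        write_5m = ttl.get("ephemeral_5m_input_tokens", write_total - write_1h) or 0
        turns.append(TurnUsage(
            input=usage.get("input_tokens") or 0,
            output=usage.get("output_tokens") or 0,
            cache_read=usage.get("cache_read_input_tokens") or 0,
            cache_write_5m=write_5m,
            cache_write_1h=write_1h,
        ))
    return turns


def cache_hit_rate(turns: list[TurnUsage]) -> float:
    """Share of billed input tokens that were served from the prompt cache."""
    total = sum(t.total_input for t in turns)
    return sum(t.cache_read for t in turns) / total if total else 0.0


def context_spikes(turns: list[TurnUsage], min_jump: int = 20_000) -> list[int]:
    """Indices of turns whose billed input grew by at least min_jump tokens."""
    return [i for i in range(1, len(turns))
            if turns[i].total_input - turns[i - 1].total_input >= min_jump]
```

Because the counts come straight from the recorded `usage` object, nothing here is a tokenizer estimate. The only heuristic parts are the spike threshold and the fallback used when no TTL breakdown is present.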

### Its optimization half (secondary to visibility) [#its-optimization-half-secondary-to-visibility]

token-optimizer also ships an active-compression layer (v5): structure-map re-reads (95–99% on large code files), delta-mode re-reads (\~97%), read-cache dedup (a 180,000-token file becomes a \~250-token skeleton), bash compression (16 handlers, \~10%, "lossy by design"), smart-compaction decision checkpointing, quality nudges, loop detection, and an opt-in keep-warm cache pinger. Claimed monthly savings: $80–150 light / $300–600 heavy / $1,500–2,500 high-waste.
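
To make the structure-map and read-cache levers concrete, here is a minimal sketch of the general idea: the first read of a file returns the full content, and a re-read of unchanged content returns only a signatures-level outline. It is an illustration under stated assumptions, not token-optimizer's implementation. It handles Python sources only, via the standard `ast` module, and the names `outline` and `ReadCache` are hypothetical.

```python
import ast
import hashlib


def outline(source: str) -> str:
    """Return a signatures-only skeleton of Python source code."""
    lines = []

    def visit(body, depth):
        pad = "    " * depth
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kw = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                args = ast.unparse(node.args)  # requires Python 3.9+
                lines.append(f"{pad}{kw} {node.name}({args}): ...  # L{node.lineno}")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"{pad}class {node.name}:  # L{node.lineno}")
                visit(node.body, depth + 1)  # list methods under their class

    visit(ast.parse(source).body, 0)
    return "\n".join(lines)


class ReadCache:
    """Return full content on first read, an outline on unchanged re-reads."""

    def __init__(self):
        self._seen = {}  # path -> sha256 of the content last returned in full

    def read(self, path: str, content: str) -> str:
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self._seen.get(path) == digest:
            return outline(content)  # unchanged since the last full read
        self._seen[path] = digest
        return content
```

The saving scales with how much of a file is function bodies rather than signatures, so a figure like the claimed 95–99% would apply to large code files, not to short ones.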

## How it maps onto the dossier [#how-it-maps-onto-the-dossier]

The important finding is that token-optimizer is **almost entirely a productization of techniques the dossier already measured and ranked** — which is a point in its favor (it is the dossier's instrument with a UI) and a reason to treat its novel claims skeptically.

| token-optimizer feature                      | Dossier equivalent                                           | Note                                                                                                                        |
| -------------------------------------------- | ------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------- |
| JSONL→SQLite per-turn token decomposition    | Record 18 (ccusage/`/usage`/JSONL) + `tools/session_cost.py` | Same method, richer per-turn UI; **cache-safe, no proxy**                                                                   |
| Input/output/cache-read/cache-write bars     | File 00/02 decomposition (32/29/20/17/2)                     | Visualizes exactly what the dossier measured by hand                                                                        |
| Structure-map re-reads 95–99%                | File 51 (outline/symbol, local −91%/−98%)                    | Same outline lever, productized for re-reads                                                                                |
| Read-cache dedup (skeleton on re-read)       | Record 12 (observation masking) + file 51                    | Same "don't re-send the whole file" lever                                                                                   |
| Bash compression \~10% (lossy)               | Record 20 (hook filtering, local −94.2% on logs)             | Narrower than the dossier's grep-hook ceiling                                                                               |
| Smart-compaction decision checkpoint         | Record 06 (compaction) + file 46 (microcompact)              | Preserves decisions across compaction — sound                                                                               |
| Keep-warm cache pinger (opt-in, API billing) | **Graveyard #3 (keepalive pingers, killed)**                 | Killed for the live loop; scoped here to resumed/TTL-expired API-billed sessions — marginal on a Max subscription (file 41) |
| Quality grades / context-rot nudges          | File 47 (online quality) + file 46 (Chroma context rot)      | Heuristic dashboard on a real signal                                                                                        |

### The thinking blind spot (load-bearing) [#the-thinking-blind-spot-load-bearing]

The dossier's central measurement insight is that **thinking tokens are invisible in the Claude Code transcript** — they bill as output but are redacted, so the dossier had to infer them as `output_tokens − count_tokens(visible)` (file 02, 54.8% of output on the measured max-effort loop). Any tool that reads only the JSONL — token-optimizer, ccusage, `/usage` — inherits this blind spot: it can show the **output** bar but not split it into thinking vs visible. So "full visibility for every token" is true for input/cache classes and the output *total*, but the single largest lever the dossier found (thinking, the only one the effort parameter touches — file 09/15) is the one a transcript visualizer cannot directly graph. A genuinely complete token-visibility tool would pair JSONL parsing with a `count_tokens` pass over visible blocks to estimate the thinking slice; none of the surveyed tools does this today (a gap worth a jackin' tool, given `tools/count_tokens.py` already exists).
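
The inference `output_tokens − count_tokens(visible)` can be sketched as below. This is a minimal sketch, not the dossier's `tools/count_tokens.py`. The function name `estimate_thinking` is hypothetical, and the token counter is passed in as a callable so the example runs offline. A real counter would call the `count_tokens` endpoint and would need to account for any per-message framing overhead it reports.

```python
from typing import Callable, Iterable


def estimate_thinking(
    output_tokens: int,
    visible_blocks: Iterable[str],
    count_tokens: Callable[[str], int],
) -> dict:
    """Estimate the thinking slice of one response's billed output.

    output_tokens  -- the exact figure from the transcript's usage object
    visible_blocks -- the text/tool-call blocks that survive in the transcript
    count_tokens   -- any callable mapping text to a token count (assumed)
    """
    visible = sum(count_tokens(block) for block in visible_blocks)
    # Counter error can push the visible estimate past the billed total;
    # clamp so the thinking estimate never goes negative.
    visible = min(visible, output_tokens)
    thinking = output_tokens - visible
    share = thinking / output_tokens if output_tokens else 0.0
    return {
        "output": output_tokens,
        "visible_est": visible,
        "thinking_est": thinking,
        "thinking_share": share,
    }
```

The result is an estimate, not a measurement: `output` is exact, while `visible_est` and `thinking_est` are only as good as the counter supplied.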

## Similar tools — full-visibility / session-visualization landscape [#similar-tools--full-visibility--session-visualization-landscape]

**First, the honest ceiling on "every token."** No tool visualizes a session literally token-by-token, color-coded — that does not exist for Claude Code, because the transcript records per-API-call `usage` *totals*, not per-token streams. The only token-by-token tools are tokenizer playgrounds (e.g. Simon Willison's Claude token counter over the `count_tokens` API) that have no notion of a session. So "full token visibility" in practice means **exact per-message and per-session input / output / cache-read / cache-write breakdown** — which the JSONL `usage` object gives exactly — plus the output total with thinking fused in (the blind spot above). That realistic best case is what the tools below deliver; ranked by what is actually visible and verifiable, not stars (inflation is rampant here: `daaain/claude-code-log` 547:1, the anchor 168:1, `phuryn` 114:1 star:watcher).

| Tool                                                 | Stars/watch · license                  | Granularity                                       | Cache / I/O split                                            | Surface                            | Data source                                            | Live?             | CC-native?                                                          |
| ---------------------------------------------------- | -------------------------------------- | ------------------------------------------------- | ------------------------------------------------------------ | ---------------------------------- | ------------------------------------------------------ | ----------------- | ------------------------------------------------------------------- |
| **nateherkai/token-dashboard**                       | 584/11 · MIT                           | message / session / day / project                 | **Yes** (input/output/cacheRead)                             | Web (`localhost:8080`)             | JSONL `~/.claude/projects/`                            | Yes (30s)         | **Yes** — pure observability; dedupes CC's 2–3× JSONL stream-writes |
| **phuryn/claude-usage**                              | 1,826/16 · MIT                         | message / session / day / model                   | **Yes** (input/output/cache\_creation/cache\_read)           | Web + **VS Code sidebar** + CLI    | JSONL                                                  | Yes (30s)         | Yes — discloses $ is API-equivalent, not subscription               |
| **alexgreensh/token-optimizer** (anchor)             | 1,341/8 · PolyForm-NC                  | per-turn / message / session / 30-day             | **Yes + 5m/1h TTL write split**                              | Web + status line + CLI            | JSONL → SQLite                                         | Yes (status line) | Yes — richest split; but an optimizer, not pure viz                 |
| **ColeMurray/claude-code-otel** + official Grafana   | \~441 · MIT                            | session / day / model (aggregated)                | **Yes** (OTel `type` = input/output/cacheRead/cacheCreation) | Grafana + Prometheus               | **Claude Code native OTel** (no proxy)                 | Yes               | Yes — org-grade, counters only (no per-message drilldown)           |
| **delexw/claude-code-trace**                         | 311/1 · MIT                            | message + tool calls                              | partial (counts where available)                             | Desktop (Tauri) + Web + TUI        | JSONL                                                  | Yes (live tail)   | Yes — best *session replay*; token surface minimal                  |
| **jhlee0409/claude-code-history-viewer**             | 1,598/4 · MIT                          | message / session; cross-tool                     | token usage; cache split not emphasized                      | Desktop app                        | JSONL (CC + Codex/Cursor/Gemini/Cline/Aider/OpenCode)  | Post-hoc          | Yes (multi-agent) — browser, token secondary                        |
| **dabitk/claude-code-token-visualizer (cctv)**       | 0/0 · MIT                              | per-request → time buckets                        | input/output; cache hit-rate                                 | **Terminal TUI** (live histograms) | tails `.jsonl`                                         | **Yes (live)**    | Yes — real but unproven adoption                                    |
| **ccusage**                                          | 16,080★ · MIT                          | day / session / model totals                      | totals only                                                  | CLI                                | JSONL                                                  | Yes               | Yes (record 18)                                                     |
| **`/usage`** + **`tools/session_cost.py`**           | first-party / in-repo                  | by skill/subagent/plugin; the 32/29/20/17/2 split | yes                                                          | in-CLI / script                    | built-in / JSONL                                       | —                 | Yes — the baseline instruments (record 18)                          |
| **Langfuse / Helicone / Phoenix / OpenLLMetry**      | 5.8k–29k★ · mixed (MIT/Apache/Elastic) | per-call / per-trace span                         | yes *if you instrument*                                      | self-host web                      | **you instrument** (SDK/OTel); Helicone is a **proxy** | Yes               | **No — not CC-native**                                              |

**Verified OTel detail (no proxy):** Claude Code natively emits `claude_code.token.usage` with a `type` attribute valued input / output / cacheRead / cacheCreation, plus `model` / `user` / `team` / `skill.name` / `plugin.name` / `agent.name`, and a separate `claude_code.cost.usage` (USD). So the Grafana path gets the cache-read/write split for free — but only as aggregated counters (no per-message, no per-token).

The pattern mirrors file 54's compression sweep: **the JSONL-reading, no-proxy class is the safe default for a coding agent** (token-optimizer, nateherkai, phuryn, ccusage, `/usage`, `tools/`) — it cannot perturb the request or add prefix rent. The native-OTel→Grafana path is the clean org-grade option (also no proxy) but coarser. The general platforms (Langfuse/Helicone/Phoenix/OpenLLMetry) are powerful but observe apps *you instrument yourself*, are not Claude-Code-native, and the proxy variants (Helicone, reportedly in maintenance mode) reroute the base URL — a caching and availability risk. Use those only if you are also building your own Claude API app.

### Best for "visualize my latest session" [#best-for-visualize-my-latest-session]

1. **nateherkai/token-dashboard** — the *pure-observability* pick for this brief: per-prompt→session→day with input/output/cacheRead split, heatmaps, subagent attribution, local web UI, no proxy, no compression side-effects, and it correctly dedupes Claude Code's 2–3× JSONL stream-writes (an accuracy point most miss). MIT. Does only visibility — exactly what was asked.
2. **phuryn/claude-usage** — closest runner-up; cleanest cache\_creation/cache\_read separation plus a VS Code sidebar; honestly flags that its dollars are API-equivalent (the file-41 subscription caveat, disclosed by the tool itself). MIT.
3. **token-optimizer (anchor) — observability layer only** — the *richest* numbers (cache-write 5m/1h TTL split, exact-from-`usage`-object, four pricing tiers, wasteful-thinking flag, live status line). Third *for this brief* only because it is an optimizer that also rewrites reads/compaction; #1 on raw capability if you want (or don't mind) the active features. Source-available (PolyForm-NC), not an OSI-approved open-source license.
4. **claude-code-otel + official Grafana** — best durable, queryable, org-grade dashboard on Claude Code's native OTel (no proxy); loses on grain (aggregated, no per-message).
5. **delexw/claude-code-trace** (replay across desktop/web/TUI, live tail) + **jhlee0409/claude-code-history-viewer** (multi-agent browser) — best for *navigating* the latest session as a conversation with tool calls; pair with #1 for replay + cost.
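
The stream-write deduplication credited to the first pick matters because counting every row would inflate totals. A minimal sketch of one way to do it follows; it is not nateherkai/token-dashboard's code. It assumes repeated rows for the same API response share a `message.id` and a top-level `requestId`, and that the last row written carries the final `usage`. Both are assumptions about the transcript layout, and the function names are hypothetical.

```python
import json


def dedupe_usage_records(jsonl_text: str) -> list[dict]:
    """Keep one usage-bearing row per (message id, request id), last write wins."""
    latest: dict[tuple, dict] = {}  # dicts preserve first-seen order
    for line_no, line in enumerate(jsonl_text.splitlines()):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip blank or partially written lines
        message = record.get("message") if isinstance(record, dict) else None
        if not isinstance(message, dict) or "usage" not in message:
            continue
        key = (message.get("id"), record.get("requestId"))
        if key == (None, None):
            key = ("line", line_no)  # no identifiers, so keep the row as-is
        latest[key] = record  # a later write replaces an earlier partial one
    return list(latest.values())


def total_output_tokens(records: list[dict]) -> int:
    """Sum output_tokens across already-deduplicated rows."""
    return sum(r["message"]["usage"].get("output_tokens", 0) for r in records)
```

Summing `usage` over raw rows instead of deduplicated ones is the kind of error that the reconciliation step in the validation protocol is meant to catch.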

**Not for this goal:** Langfuse/Helicone/Phoenix/OpenLLMetry (instrument-your-own-app, not CC-native; Helicone needs a proxy); tokenizer counters (Simon Willison's, claude-tokenizer, lunary — per-text totals, not sessions); quota monitors (per-token split is not their job).

## jackin' fit [#jackin-fit]

* **Adopt the local-transcript observability class as the measurement front-end of the validation harness** (31, and the per-tool harnesses in 51/53). The pure-observability MIT options — `nateherkai/token-dashboard` and `phuryn/claude-usage` — and `tools/session_cost.py` answer the same question the harness needs ("where did the tokens go this session?") with no proxy and no prefix cost; token-optimizer's dashboard is richer but ships an optimizer alongside. All are pure negative-cost on the visibility axis — the file-47 "measure first" rule with a UI.
* **Mind the license and the metric.** PolyForm Noncommercial means token-optimizer is fine for an operator/researcher to run but is not a permissive dependency to bundle into jackin-core; and its dollar dashboards assume API/Vertex/Bedrock pricing, while the operator's Max subscription makes tasks-per-cap the real objective (file 41) — read the token/cache/quality panels, discount the dollar totals.
* **Close the thinking blind spot in jackin' tooling.** The highest-value local addition is not another dashboard but a `count_tokens`-backed thinking-vs-visible estimator layered on the JSONL reader (the dossier already ships `tools/count_tokens.py`) — the one token class no current visualizer shows.
* **Do not treat keep-warm as a default win.** It is the dossier's killed keepalive lever (graveyard #3), correctly scoped here to resumed/TTL-expired API-billed sessions; on a live Claude Code loop and on a subscription it is marginal-to-irrelevant. Leave it opt-in and measure it before trusting the projected dollars (the README itself calls them "history-replay estimates, not yet-realized dollars").

## Validation protocol [#validation-protocol]

To accept any visibility tool as the harness front-end: point it at a set of archived Claude Code sessions, and (a) reconcile its session token totals against `tools/session_cost.py` and `ccusage` to within 5% (the record-18 reconciliation bar), (b) confirm it reads transcripts only (no proxy, no `cache_control` mutation — diff usage fields with and without it running), and (c) verify its cache-read/write/output split matches the JSONL `usage` object turn-for-turn. Treat its quality grades and projected savings as heuristics, not measurements, until A/B'd on the 31/51/53 harness at equal task success.
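
Checks (a) and (c) reduce to simple comparisons once each tool's numbers have been exported. The sketch below assumes the totals and per-turn figures are already available as plain integers and dicts keyed by the API usage field names. How to export them from each tool is out of scope here, and the function names are hypothetical.

```python
USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_read_input_tokens",
    "cache_creation_input_tokens",
)


def relative_gap(candidate: int, reference: int) -> float:
    """Absolute difference as a fraction of the reference total."""
    if reference == 0:
        return 0.0 if candidate == 0 else float("inf")
    return abs(candidate - reference) / reference


def reconcile_totals(candidate: int, references: dict[str, int],
                     tolerance: float = 0.05) -> dict[str, bool]:
    """Check (a): is the candidate total within tolerance of each reference?"""
    return {name: relative_gap(candidate, ref) < tolerance
            for name, ref in references.items()}


def mismatched_turns(tool_turns: list[dict],
                     jsonl_turns: list[dict]) -> list[tuple[int, str]]:
    """Check (c): list (turn index, field) pairs where the tool disagrees."""
    mismatches = []
    for i in range(max(len(tool_turns), len(jsonl_turns))):
        if i >= len(tool_turns) or i >= len(jsonl_turns):
            mismatches.append((i, "missing_turn"))  # one side has no such turn
            continue
        for field in USAGE_FIELDS:
            if tool_turns[i].get(field, 0) != jsonl_turns[i].get(field, 0):
                mismatches.append((i, field))
    return mismatches
```

An empty list from `mismatched_turns` together with all-`True` values from `reconcile_totals` would satisfy (a) and (c). Check (b) still needs the with-and-without diff described above.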

## Source ledger [#source-ledger]

All accessed 2026-06-15.

* token-optimizer repo + README (features, data source, UI, limitations, PolyForm-NC license): [github.com/alexgreensh/token-optimizer](https://github.com/alexgreensh/token-optimizer)
* token-optimizer stats (1,341★ / 8 watchers / 110 forks / created 2026-02-26): `gh api repos/alexgreensh/token-optimizer`; source tree (`session-parser.js`, `dashboard.js`, `jl-sketcher.js`, `pricing.js`, `quality.js`, `read-cache.js`, `smart-compact.js`, `drift.js`) confirms the JSONL-parse → dashboard architecture; exactness + Measured-vs-Estimated split from `docs/METHODOLOGY.md`
* similar visibility / session-visualization tools (verified via repo READMEs + `gh api`): [nateherkai/token-dashboard](https://github.com/nateherkai/token-dashboard), [phuryn/claude-usage](https://github.com/phuryn/claude-usage), [ColeMurray/claude-code-otel](https://github.com/ColeMurray/claude-code-otel), [delexw/claude-code-trace](https://github.com/delexw/claude-code-trace), [jhlee0409/claude-code-history-viewer](https://github.com/jhlee0409/claude-code-history-viewer), [dabitk/claude-code-token-visualizer](https://github.com/dabitk/claude-code-token-visualizer); general platforms [langfuse](https://github.com/langfuse/langfuse), [Helicone](https://github.com/Helicone/helicone), [Arize Phoenix](https://github.com/Arize-ai/phoenix), [OpenLLMetry](https://github.com/traceloop/openllmetry); tokenizer counters (per-text, not session) [simonw/tools](https://github.com/simonw/tools)
* Claude Code native OTel metric/attribute schema (`claude_code.token.usage` type = input/output/cacheRead/cacheCreation; `claude_code.cost.usage`): [code.claude.com/docs/en/monitoring-usage](https://code.claude.com/docs/en/monitoring-usage); official Grafana dashboard 25255
* dossier cross-references: measurement method and ccusage/`/usage` — [`03-prior-art-and-market-scan.md`](/research/token-optimization/03-prior-art-and-market-scan/) (record 18); the token-class decomposition and thinking-invisibility — [`02-baseline-audit.md`](/research/token-optimization/02-baseline-audit/) and [`00-executive-summary.md`](/research/token-optimization/00-executive-summary/); keepalive graveyard #3 — `00`/`20`; outline lever — [`51-code-intelligence-tools.md`](/research/token-optimization/51-code-intelligence-tools/); subscription/quota metric — [`41-subscription-and-quota-economics.md`](/research/token-optimization/41-subscription-and-quota-economics/); online quality + context rot — [`47-meta-cost-governance-and-online-quality.md`](/research/token-optimization/47-meta-cost-governance-and-online-quality/) and [`46-fresh-literature-and-market-delta.md`](/research/token-optimization/46-fresh-literature-and-market-delta/); runnable instruments — [`tools/`](/research/token-optimization/tools/)
* compression companions: [`53-headroom-and-context-compression.md`](/research/token-optimization/53-headroom-and-context-compression/), [`54-context-compression-literature-and-market.md`](/research/token-optimization/54-context-compression-literature-and-market/)
