08 — Per-technique records, source ledger, and unverified-claims register
08 — Per-technique records, source ledger, and unverified-claims register
This page makes the hub self-contained as the single source of truth: it carries the formal per-technique records for all four tools (the structured schema the dossier uses), the full consolidated source ledger (every citation, with access dates), and an explicit unverified-claims register — the place where vague, vendor-only, or unreplicated numbers are kept (not deleted) and clearly marked "not proven," per the research mandate that no previously gathered information is lost. Nothing here references out for the underlying detail; the dossier chapters (10, 03, 53, 54, 56) now reference this hub.
Evidence tiers used throughout: T1 shipped + locally reproduced · T2 peer-reviewed · T3 community-replicated · T4 single-source vendor self-report, no replication. "Not proven" = T3-weak or T4 retained for completeness but not independently confirmed.
Per-technique records
These use the dossier's record schema (Layer / Mechanism / Expected savings / Evidence tier / Quality risk / Availability / Effort / Composability / Validation protocol). They are reproduced here so the hub holds the full structured detail, not just the prose teardowns in 01–04.
C1 — Register-ladder output compression (caveman)
- Layer: output (visible assistant prose).
- Mechanism: a Claude Code output-style / skill prompt makes the model drop politeness, preamble, and function words from visible replies; BPE token count tracks word count, so deleting words is what saves. Delivered as a session-resident instruction (output styles "directly modify Claude Code's system prompt"); changes take effect after
/clearand invalidate the cached system-prompt prefix, so do not toggle mid-session. - Expected savings: 50–67% of visible-output tokens (local register ladder); phase-0 session-level caveman-ultra 58.5%. On the modeled profile ≈ 10% of session dollars, hard-capped at the 17% visible-output share (ESTIMATE, arithmetic in 10).
- Evidence tier: T1 (local tokenizer method) + phase-0 session measurement + live product docs for the mechanism.
- Quality risk: NEGATIVE-COST to NEUTRAL at filler-strip/telegraphic rungs; RISKY at caveman-ultra / wenyan — no benchmark measures register-compressed agent output against task success; degradation = dropped caveats, skipped rationale, altered tool-call behavior. Failure to set
keep-coding-instructions: trueon a custom style silently drops the built-in coding instructions. - Availability:
CLAUDE-CODE-TODAY(~/.claude/output-styles/*.mdor the caveman plugin). Effort: minutes. - Composability: stacks with effort (different slice: style→visible, effort→thinking + tool calls) and with subagent delegation (cavecrew). Anti-synergy: mid-session style switches break the prompt cache.
- Validation protocol: 20 fixed repo tasks (10 bugfix, 10 refactor), default vs telegraphic style, same effort, fresh sessions; record
usage.output_tokens, visible-block tokens (split thinking viacount_tokens), tool-call count, tests-pass, and a blinded completeness rating. Pass = visible tokens −45% or better, tests-pass within noise, tool-call count ±10%. Run once at caveman-ultra to price the extra rung.
H1 — Live-zone input compression (headroom) — the cache-safe design point
- Coverage-delta: refines the dossier's record 19 / file 46 FL3, which had treated input compression as monolithically cache-hostile.
- Layer: input + cache.
- Mechanism: split each request into a stable prefix and a volatile live zone; stabilize the prefix (extract volatile content to a tail, normalize tool definitions, insert
cache_controlat stable boundaries) and compress only the live zone, once, before it is first cached. The cached prefix stays byte-identical, so 0.1× reads survive; the compression shrinks the cache write of the new content and all future reads of it. - Expected savings: on the modeled day,
(compressible-observation share of the 61% cache bucket) × compression% × (write-share + 0.1×read-share)— real on the largest bucket, bounded to low-double-digit % of dollars. NOT the 60–95% per-payload headline (ESTIMATE). - Evidence tier: T1 for the mechanism (underlying log/outline/minify levers locally reproduced); T3-weak for headroom's product numbers; T2 academic backing for the write-time pattern (Squeez arXiv 2604.04979 — 92% tool-output removal at 0.86 recall; AgentDiet arXiv 2509.23586 — Claude 4 Sonnet 64.5%→66.5% with input −40–60%, nets out its own +5–15% cost).
- Quality risk: NEUTRAL on rule-based transforms (log/JSON/search/diff); RISKY on the ML text compressor (
kompress-basecan drop identifiers/caveats); RISKY in proxy mode (silent cache-bust if the prefix churns). - Availability:
CLAUDE-CODE-TODAYvia MCP (headroom_compress) /SDK(library) /GATEWAY-OR-SELF-HOST(proxy). Effort: minutes (MCP) to hours (proxy + offline asset provisioning). - Composability: composes with prompt caching when scoped to the live zone; anti-synergy with proxy-mode-in-front-of-Claude-Code (double-stabilization).
- Validation protocol: 20 tool-heavy tasks, native vs headroom-MCP; from JSONL require
cache_readcontinuity preserved, tool-result tokens down, task success unchanged, net tokens-per-solved-task down ≥20% after subtracting MCP schema + retrieve round-trips.
H2 — Reversible compression with on-demand retrieval (CCR, headroom)
- Coverage-delta: new productization of "output brevity with quality gates" / progressive disclosure.
- Layer: input / retrieval.
- Mechanism: compressed content is stored verbatim in a CCR store (SQLite/Redis/in-memory); the model receives a compressed view plus a
headroom_retrievetool and can fetch the original within a TTL. Lossy compression becomes recoverable lossy compression. - Expected savings: the H1 saving minus the cost of retrievals actually triggered; net-positive only if retrieval rate is low (each retrieve is a tool-call round-trip).
- Evidence tier: T3 (mechanism shipped + vendor-benchmarked:
ccr_regression_benchmark.py,adversarial_ccr_tests.py); no independent net-effect measurement. - Quality risk: NEGATIVE-COST in principle (removes the lossy-memory "confidently-wrong recalled fact" failure mode) if the model reliably knows when to retrieve. Failure: trusts a compressed view it should have expanded, or over-retrieves and erases the saving.
- Availability:
CLAUDE-CODE-TODAY(MCP exposesheadroom_retrieve). Effort: minutes. - Composability: strengthens any lossy input/memory compressor; pairs with cross-agent memory (H4); orthogonal to caching.
- Validation protocol: detail-dependent canary suite (numbers, negations, "don't do X" buried in a compressed payload); require retrieve-or-correct behavior on 10/10 and net-positive tokens.
H3 — Failure-mining into memory files (headroom learn)
- Coverage-delta: new — no equivalent in the dossier.
- Layer: input (memory) / meta.
- Mechanism: analyze past failed sessions across Claude/Codex/Gemini and write durable corrections into CLAUDE.md/AGENTS.md, so the always-loaded prefix improves over time. A closed self-correction loop.
- Expected savings: indirect — fewer repeated failures = fewer wasted retry turns (the most expensive waste). No published number; the cost is added prefix mass (every CLAUDE.md line is cache-read rent on every call).
- Evidence tier: T4 (plausible mechanism, no measured net effect; failure-mining quality unverified). Not proven.
- Quality risk: RISKY — an auto-written rule that is wrong or over-general is one bad PR that erases months of savings; unbounded auto-append violates CLAUDE.md slimming.
- Availability:
CLAUDE-CODE-TODAY(CLI command). Effort: minutes to run; ongoing editorial discipline. - Composability: feeds the CLAUDE.md lever; anti-synergy with prefix slimming if left unbounded.
- Validation protocol: human-gate every correction; cap file size; A/B the failure rate on the targeted task class; confirm added prefix rent < retries prevented.
H4 — Cross-agent deduplicated shared memory (headroom)
- Coverage-delta: extends cavemem / claude-mem / cross-agent portability with a cross-tool, auto-dedup angle none of them cover.
- Layer: input (memory).
- Mechanism: a single store shared across Claude, Codex, and Gemini with automatic deduplication, so a fact learned in one agent is available once to the others instead of being re-derived per tool.
- Expected savings: unquantified by the vendor; the standing objection to all memory tools applies — no injection-cost-vs-re-exploration-saved accounting exists.
- Evidence tier: T4 (no net-accounting published). Not proven.
- Quality risk: RISKY — the lossy-memory failure mode plus a cross-agent blast radius (a bad memory now corrupts three tools). Reversibility (H2) mitigates but does not remove it.
- Availability:
CLAUDE-CODE-TODAY(MCP/library), genuinely useful only for multi-tool operators. Effort: minutes–hours. - Composability: competes with cavemem/claude-mem (pick one); pairs with CCR (H2).
- Validation protocol: week-long memory A/B across two agents, metering the store's own compression/injection calls against re-exploration avoided.
R1 — Deterministic write-time command-output compression (RTK)
- Coverage-delta: productizes hook/preprocessing filtering + JSON sampling as a turnkey binary; the deterministic, Claude-Code-native worked example of the cache-safe write-time-observation-compression class (which used headroom-MCP as its example).
- Layer: input + cache (write-time, at the tool boundary).
- Mechanism: a PreToolUse hook rewrites the agent's shell command to
rtk <cmd>; RTK runs it and returns filtered/grouped/truncated/deduplicated output. Because the compressed text is what enters context, the cache write shrinks and every later 0.1× read of it shrinks, and the cached prefix is never touched. - Expected savings:
(Bash-command-output share of the 61% cache bucket) × compression% × (write-share + 0.1×read-share)— bounded to low-double-digit % of dollars because most observation tokens already read at 0.1× and RTK reaches only Bash output. NOT the 60–90% per-command headline (ESTIMATE). - Evidence tier: T1 for the mechanism (filter/dedup/group/JSON-sample levers locally reproduced — log filter −94.2%, JSON minify −34.3%); T4 for RTK's specific product numbers (vendor self-report through its own
rtk gaincounter, no whole-session telemetry, no independent replication). Product numbers not proven. - Quality risk: NEUTRAL on dedup/grouping of genuinely redundant output; RISKY on truncation — a truncated-but-successful command can silently drop the one line the agent needed, and tee-recovery only fires on failure.
- Availability:
CLAUDE-CODE-TODAY(hook + binary). Effort: minutes to install; the real cost is reconciling its hook registration with caveman's and scoping it to a container. - Composability: stacks with caveman (different token class) and with code-intelligence outlines (different mechanism); diminishing returns (not anti-synergy) stacking headroom on the same Bash bytes; hook-registration anti-synergy with caveman's hooks (both write agent config).
- Validation protocol: 20 Bash-heavy tasks (test/build/git/log), native vs RTK-hook; from JSONL require (a)
cache_readcontinuity preserved, (b) tool-result tokens down, (c) task success unchanged, (d) no rise in command re-runs, (e) net tokens-per-solved-task down ≥20%.
L1 — Integrated context runtime (lean-ctx)
- Coverage-delta: new — the first superset data point in the hub. Spans R1's write-time-shell-compression class and headroom's H1 broad-input class and the persistent-code-graph lever the dossier's code-intelligence chapter tracks (which the three-way said none of caveman/headroom/RTK had), plus a verification layer (signed savings ledger, Context Proof) with no prior analog.
- Layer: input + cache + retrieval + memory + verification (everything except output).
- Mechanism: one long-lived Rust runtime that intercepts at three points — a shell hook (56 pattern modules, write-time), MCP
ctx_read(10 modes: tree-sitter signatures/map/entropy/lines, content-addressed ~13-tok handle re-reads), and an opt-in proxy (frozen-region prose rewrite). Below the read it maintains a persistent property graph (imports/calls/exports/type_ref), a BM25 index, and an opt-in dense-embedding index fused by RRF. A Context Field Theory layer scores each item by Φ (relevance + surprise + graph + history bandit − cost − redundancy) and a knapsack compiler selects views under budget pressure. Savings are bounce-netted and written to a tamper-evident SHA-256 ledger. - Expected savings: on code-read-heavy work, large (
map/signatures96–99% per code read, reproduced); on prose/config-heavy work, small (<10%). Whole-bill bounded identically to R1/H1:(compressible-read share of the 61%) × compression% × (write + 0.1×read)— NOT the "up to 99%" per-read/cache-handle headline (ESTIMATE). - Evidence tier: T1 for the mechanism (built from source and benchmarked this round: read modes, shell compression, BM25/graph search, bounce-netting all reproduce); T4 for the product percentages (self-measured on a GPT tokenizer, per-read/per-session best cases, no independent third-party benchmark, youngest project). Product numbers not proven externally.
- Quality risk: NEUTRAL on deterministic modes (signatures at 95.9% self-rated quality); RISKY on
map(97.8% compression at only 77% quality — drops structure; mitigated by the bounce tracker auto-upgrading modes); RISKY in proxy mode (lossy prose rewrite, headroom's risk); embeddings/proxy are opt-in so the default core avoids the ML risk. - Availability:
CLAUDE-CODE-TODAY(MCP + shell hook). Effort: minutes to install; the real cost is the footprint (64.7 MB binary, daemon, dashboard, SQLite stores) and reconciling host writes across up to 34 agent targets with the host-write ban. - Composability: stacks with caveman (output, different class); anti-synergy with RTK (two shell-rewrite paths over the same Bash bytes — run one); diminishing returns stacking headroom's proxy on lean-ctx's proxy; its code graph occupies the "prevent reads" layer a standalone code-intelligence MCP otherwise would.
- Validation protocol: 20 tasks split code-read-heavy vs prose/config-heavy, native vs lean-ctx (MCP + hook, no proxy); from JSONL require
cache_readcontinuity preserved, tool-result tokens down, bounce/re-read rate not worse (cross-check against lean-ctx's ownadjusted_total_saved), task success unchanged, net tokens-per-solved-task down ≥20% after the 77-tool schema rent. Run a second arm atmapmode to price the 77%-quality risk againstsignatures.
Unverified-claims register (kept, marked "not proven")
Per the mandate to preserve every previously gathered number even when it cannot be confirmed, these claims are retained here rather than deleted. Each is a real datapoint from the prior research; each is flagged with why it is not proven. None should be propagated as fact.
| Claim | Source | Why not proven | Honest reading |
|---|---|---|---|
| caveman "~75% of output tokens" | caveman README | pooled benchmark ratio on a tiktoken approximation vs an "Answer concisely." baseline | per-task mean 65%; local Claude-tokenizer replication 58.5% |
| caveman-compress "~46% input tokens" | caveman README | no independent replication | directional; instruction-side bills at cache-read rates (~50× less valuable/token) |
| cavecrew "~60% fewer tokens than vanilla" | caveman README | dossier measured −43.9% | use 43.9% |
| caveman-code "~2× fewer tokens than Codex" | caveman README | no methodology published | unverified |
| cavemem "~75% fewer prose tokens" | cavemem README | no benchmark table, one worked example | expect ~55–60% on stored prose (ESTIMATE); T4 |
| cavekit token claims | cavekit README | inherited from caveman encoding; no cavekit-vs-vanilla measurement | T4; the loop can increase total calls — net sign unknown |
| headroom "60–95% fewer tokens" | headroom README/docs | per-payload ratio; code/grep compress 0%; production median 4.8% | per-redundant-payload best case only |
| headroom "96.2% total savings" | headroom docs | double-counts caching Claude Code already banks | incremental lever is the live-zone fraction only |
| headroom "$700K saved / 90% redundant" | Register article (2026-05-31), echoed by ~20 outlets | press repeated a vendor figure; no independent test | T4 marketing |
headroom learn / cross-agent memory savings | headroom | no net-accounting published (H3/H4) | T4 — not proven |
| RTK "60–90% on common dev commands" | RTK README | vendor rtk gain counter (~4-chars/token, not Claude BPE); no whole-session telemetry; no independent benchmark | per-command best case on verbose commands |
| RTK "30-minute session −80%" (~118k→~23.9k) | RTK README | assumes a Bash-heavy command mix; not a measured whole-session distribution | per-command best case extrapolated |
| RTK "89% of CLI noise across 2,900+ commands"; "15,720 commands → 138M tokens" | rtk-ai.app/savings | all via rtk gain; vendor page | T4 |
| RTK user self-reports 83.7% / 79.3% / 89% | Hacker News | self-reported through RTK's own counter | T4 |
| RTK hook "raised Claude Code cost 18%" (issue #582); "#886 bypasses permission prompts" | RTK issue tracker | single-config reports, not a controlled study | cautionary; output-rewriting hooks have edges |
| RTK Cloud — "Free for open-source, Teams from $15/dev/month" team analytics | rtk-ai.app (2026-06-20) | announced/waitlist, not shipped; roadmap | the only one of the three with a stated monetization path |
| third-party RTK↔MCP bridge (mcpmarket.com) | search snippet | could not fetch (429); not in the official repo | not proven; official RTK is hook + CLI only |
RTK version: site/footer v0.42.4 vs README verification text rtk 0.28.2 | rtk-ai.app + repo | doc drift on a 212-release cadence | trust the higher/site number; treat docs as lagging |
| lean-ctx "up to 99% / 60–90% fewer tokens" | lean-ctx README/crate desc | per-read on code / cache-handle; prose/config compress <10%; GPT tokenizer; no independent benchmark | locally reproduced on code reads (96–99%); whole-bill correction applies |
| lean-ctx "86% on a 30-min session" | BENCHMARKS.md | code-read-heavy mix, o200k_base, not a measured whole-session distribution | per-session best case (RTK-"80%-session" category) |
| lean-ctx "single binary, no runtime dependencies" | README | the release binary is 64.7 MB and runs a daemon + dashboard + SQLite stores | one binary, but a heavyweight runtime — not RTK-class minimalism |
| lean-ctx "cl100k within ~3% of Claude's tokenizer" | core/tokens.rs | cl100k/o200k are GPT tokenizers, not Claude BPE; 3% is more optimistic than the dossier's Fable/Opus finding | treat percentages as directional |
lean-ctx "200+ releases" vs 30 git tag (default page) | README + gh api | tag pagination / release vs tag distinction unresolved | not confirmed; created 2026-03, fast cadence is plausible |
| lean-ctx Lean 4 proof crate covers load-bearing invariants | lean/ dir | proofs present but their coverage/significance not audited | not proven; the direction (formal verification) is novel regardless |
| lean-ctx Pro $9/mo · Team $18/seat/mo; "local free forever (CI-enforced)" | leanctx.com/pricing | shipped pricing page; "CI-enforced invariant" is a vendor claim | open-core, cleaner than RTK Cloud (never gates local) — but "forever" only time tests |
| caveman brevity "+26 points accuracy" (arXiv 2604.00025) | repo-cited paper | verified to exist, but solo/unreviewed, +26.3pp on a cherry-picked 7.7% subset, no Claude, no code | do not propagate; defensible brevity-helps-accuracy evidence is Chain-of-Draft (+4.1pp at −92% output, Claude 3.5 Sonnet) |
| SWE-Pruner arXiv 2601.16746; structural-graph 2603.27277 | file 54 ledger | arXiv IDs could not be independently confirmed | treat IDs as tentative; the GitHub repos are real |
Additional preserved detail (benchmarks, integrations, ecosystem stats)
Captured here so the hub loses none of the prior research's specifics.
Headroom accuracy benchmarks (vendor self-report, 100-sample tests). GSM8K 0.870 → 0.870 (±0.000); TruthfulQA 0.530 → 0.560 (+0.030); SQuAD v2 97% at 19% compression; BFCL tools 97% at 32% compression; HTML extraction F1 0.919 (recall 0.982) at 94.9% compression on a structured benchmark. The pattern is the dossier thesis restated by the vendor: accuracy is preserved at low compression on prose/QA, and high compression is only safe on highly-repetitive content.
Headroom v0.5.18 per-payload byte tests. build log (200 lines) 2,412 B → 148 B (~94%); grep results (150 hits) 2,624 B → 2,624 B (0%, pass-through); Python source (~480 lines) 2,958 B → 2,958 B (0%, "code passes through to preserve correctness").
Headroom cache-bust regression tests (the prefix-safety guards that make the live-zone design auditable): prefix_cache_benchmark.py, cache_bust_trace_report.py, synthetic_token_cache_bust_report.py, cache_validation_bundle.py.
RTK 14-platform integration, with the hook mechanism each uses. Claude Code (PreToolUse hook), Copilot VS Code (PreToolUse) / Copilot CLI (deny-with-suggestion), Cursor (preToolUse), Gemini CLI (BeforeTool), Codex (AGENTS.md + RTK.md instructions, no hook), Windsurf / Cline / Kilo / Antigravity (rule files), OpenCode / OpenClaw / Pi (TS plugins), Hermes (Python). Aider appears in blog write-ups but not in the README integration table.
Caveman-family and memory-competitor adoption (point-in-time; all PR-inflated in this niche — rank by evidence, not stars). caveman 74,446★ (2026-06-18; was 72,759★ on 2026-06-15); cavemem 521★; cavekit 1,014★; fff 8,351★. The cross-session-memory competitor thedotmack/claude-mem (81,976★ — larger than caveman) is what cavemem and headroom's memory layer compete against; none publishes injection-cost-vs-re-exploration-saved net accounting. Headroom grew 28,199★ (2026-06-15) → 33,359★ (2026-06-18). Broader-market compressors (microsoft/LLMLingua, CompactPrompt, Mem0, and the code-domain literature — SWE-Pruner 70.6%→72.0%, the Perplexity Paradox's 86.1%-NameError finding, etc.) are catalogued in the dossier's 03 and 54 chapters, which this hub references rather than duplicates.
lean-ctx architecture specifics (source-audited + locally built, v3.8.9, 2026-06-20). One Rust binary (~1,200 source files) that is shell hook + MCP server (77 tools, 5 unified) + HTTP/team server + daemon + proxy + dashboard. 10 read modes; 56 shell-pattern modules; tree-sitter for 18 languages. Compression core deterministic (core/compressor.rs entropy/attention/TF-IDF, core/signatures.rs, core/patterns/); embeddings opt-in (core/dense_backend.rs feature-gated, Local/qdrant backends, core/embeddings/download.rs); proxy prose rewrite opt-in (proxy/prose.rs) and cache-safe-by-design (proxy/cache_safety.rs — frozen-region [cached_prefix_len, boundary), measured ratio). Code intelligence: core/property_graph/ (imports/calls/exports/type_ref, weighted BFS), core/bm25_index.rs, core/hybrid_search.rs (RRF), LSP via lsp/ (rust-analyzer/typescript-language-server/pylsp/gopls). Memory: core/session.rs (CCP), core/knowledge.rs, episodic/procedural/prospective, core/context_os/ (SQLite-WAL bus, A2A HMAC transport). Verification: core/evidence_ledger.rs, core/context_proof.rs, 20 versioned contracts, a Lean 4 crate (lean/). Tokenizer core/tokens.rs (o200k/cl100k, GPT not Claude BPE). Bounce-netting core/bounce_tracker.rs → adjusted_total_saved(). Local build measured 64.7 MB release binary; cargo test --lib tokens 48/48 pass.
Full source ledger (all four tools)
Consolidated from the dossier chapters; vendor numbers are self-reported unless marked. Adoption stats and live numbers re-pulled 2026-06-18 (lean-ctx 2026-06-20); headroom/market literature swept 2026-06-15.
Caveman
- Repository + README (headline "~75%", per-task mean 65% / pooled 75.8%, "Answer concisely." baseline, "thinking… untouched", "cost savings a bonus", caveman-compress ~46%, cavecrew ~60%, caveman-code ~2× vs Codex, arXiv 2604.00025 "+26 points"):
JuliusBrussee/cavemanREADME.md. - Plugin internals (markdown
SKILL.mdregister-shift engine, no compression code;plugin.jsondeclares aSessionStartcaveman-activate.js+UserPromptSubmitcaveman-mode-tracker.jshook;bin/lib/settings.jsidempotent writes into~/.claude/settings.jsonwithhasCavemanHook/removeCavemanHooks;evals/measure.pytiktokeno200k_basevs "Answer concisely." over 10 prompts; issue #484): direct read of installedJuliusBrussee/cavemanv1.9.0(2026-06-18). - Local tokenizer measurements (ultra 58.5%; wenyan-full 56.6% token / 80.9% char; wenyan-ultra 74.5%; abbreviation/glyph/arrow costs; register ladder −54/−60/−67%): dossier 10 and 03 (
/tmp/ct.pycount_tokens, method shown there). - Family positioning (cavekit orchestrates / caveman compresses output / cavemem compresses memory): getcaveman.dev; cavemem README; independent ~4% whole-session estimate: mayhemcode.com (2026-04).
- Adoption (2026-06-18): 74,446★ / 166 watchers /
v1.9.0(gh api repos/JuliusBrussee/caveman).
Headroom
- Repository + README + docs ("60–95% fewer tokens, same answers"; library/proxy/MCP/agent-wrapper; CacheAligner verbatim "extracting dynamic content… prefix stays byte-identical… KV cache can reuse"; "96.2% total savings"):
chopratejas/headroom; headroom-docs.vercel.app/docs. - Source tree (the cache-safety machinery):
cache_stabilization/(anthropic_cache_control.rs,volatile_detector.rs,tool_def_normalize.rs,drift_detector.rs),compression/live_zone_anthropic.rs, the CCR store, the typed compressors (LogCompressor,CodeAwareCompressor,SearchCompressor,SmartCrusher,HTMLCompressor,IntelligentContext);cache_controlin 62 files (gh api search/code);benchmarks/prefix_cache_benchmark.pyetc. - Companion model:
chopratejas/kompress-baseon HuggingFace (transformer trained on agentic traces, auto-downloaded default text compressor). - Benchmarks (code search 17,765→1,408 = 92%; SRE 65,694→5,118 = 92%; triage 54,174→14,761 = 73%; exploration 78,502→41,254 = 47%; 6-type mix 23,921→8,110 = 66.1%; v0.5.18 build-log ~94%, grep/Python 0%; GSM8K/TruthfulQA/SQuAD/BFCL accuracy): headroom README +
docs/benchmarks.md. - Production telemetry (median 4.8% / P75 6.9% / mean 11.3% whole-session across 50k+ sessions / 250+ instances; "Short conversational exchanges (median 4.8%)"): headroom limitations/benchmarks pages.
- Latency telemetry (proxy P50 52 ms / P90 309 ms / P99 4,172 ms / mean 161 ms; internal pipeline 16.9 ms median; ContentRouter 11.7 ms = 91–98%; SmartCrusher 50.1 ms; TextCompressor 32.0 ms; v0.5.18): headroom-docs.vercel.app/docs/benchmarks (2026-06-18).
- Independent measurement: Miya-Gadget (2026-06-03) 59,742→31,358 = 47.5% whole-session (code 79.8%, JSON 59.2%, logs 31.0%, RAG/prose 0%), "95%… oversold"; HN "~50%". Press origin: theregister.com (2026-05-31).
- headroom-tracks-RTK corroboration (
tokens_saved_rtkdata plane + "RTK metrics + Rust observability"): headroom releasesv0.22.4(2026-06-01). - PyPI (190 releases, summary "Cut costs by 50-90%"): pypi.org/project/headroom-ai. Adoption (2026-06-18): 33,359★ / 111 watchers /
v0.26.0(gh api repos/chopratejas/headroom).
RTK
- Repository + README ("60–90% on common dev commands"; single Rust binary, zero deps; four strategies; PreToolUse auto-rewrite;
<10 ms; tee recovery;rtk init/gain/discover/session; 14-platform integration table; per-command + "30-min session" savings):github.com/rtk-ai/rtk(README ondevelop/master;main404s); official site rtk-ai.app. - Internals (six-phase PARSE→ROUTE→EXECUTE→FILTER→PRINT→TRACK; 12 strategies;
src/core/filter.rsNone/Minimal/Aggressive code filter across 8 languages; SQLite~/.local/share/rtk/history.dbwith ~4-chars/token heuristic; exit-code preservation; fail-safe fallback;-vvv; ~4.1 MB binary / ~5–15 ms per command; package-manager sniffing): RTKdocs/contributing/ARCHITECTURE.md(2026-06-18). - Vendor "measured" page (89% of CLI noise across 2,900+ commands; one dev 15,720 commands → 138M tokens; all via
rtk gain): rtk-ai.app/savings. - Most balanced third-party write-up (figures vendor-sourced; estimates only; Bash-calls-only; "could theoretically strip context the agent needed"): dev.to/arshtechpro.
- HN engagement (Show HN id 46974740; second thread id 47189599 = 18 pts / 3 comments; naming-collision criticism; self-reports 83.7% / 79.3% / 89% via RTK's counter): news.ycombinator.com + hn.algolia.com.
- Issue tracker cautions (#582 hook raised cost 18% in a config; #886 bypassed permission prompts): github.com/rtk-ai/rtk/issues.
- Adoption (2026-06-18): 63,608★ / 146 watchers / 3,913 forks / 1,254 issues / Apache-2.0 / created 2026-01-22 /
v0.42.4(gh api repos/rtk-ai/rtk).
lean-ctx
- Repository + README ("Context intelligence layer"; "60–90% fewer tokens, cached 99%"; 77 MCP tools; 10 read modes; 56/95+ shell patterns; CCP memory; property graph; hybrid RRF search; LSP refactor; multi-agent Context OS; signed savings ledger; "no telemetry by default"; 30+ agents):
github.com/yvgude/lean-ctxREADME.md, crate description (rust/Cargo.toml). - Internals (the three interception points; deterministic core vs opt-in embeddings/proxy; CFT Φ-function + knapsack compiler + context handles; bounce-netting; cache-safe proxy frozen-region; multi-tokenizer o200k/cl100k = GPT not Claude BPE):
ARCHITECTURE.md(1,192 lines, 5 mermaid diagrams),LEANCTX_FEATURE_CATALOG.md(SSOT), direct source read ofv3.8.9(modules enumerated in "Additional preserved detail" above). - Benchmark (vendor
lean-ctx benchmark report,o200k_base, competitor numbers are published figures not re-measured — self-favorable, same category as RTK's own counter):BENCHMARKS.md(map 97.7% / signatures 97.0% / session 86.4% on the lean-ctx repo). Locally reproduced this round (cargo build --release→ 64.7 MB binary;benchmark report .→ code 96–99%, prose/config 0.8–30%, map quality 77%;cargo test --lib tokens48/48 pass). - Pricing/commercial (Free local Apache-2.0 "forever", Pro $9/mo cloud sync, Team $18/seat/mo; "Free is a CI-enforced invariant"): leanctx.com/pricing, leanctx.com/compare.
- Adoption (2026-06-20): 2,800★ / 19 watchers / 278 forks / 13 open issues / Apache-2.0 / created 2026-03-23 /
v3.8.9(gh api repos/yvgude/lean-ctx). README claims "2,600+ stars / 200+ releases" — roughly consistent with the live pull (least-inflated of the four).
Cross-tool, stack, and literature
- Published RTK-vs-headroom head-to-head (one month production TS/Next.js, self-measured via each tool's counter: RTK 1,327,700,000 tok / headroom 189,014,601 tok at 96% prefix-cache-hit / combined 1,516,714,601; ~200–500-tok passthrough overhead): andrewpatterson.dev/posts/token-savings-rtk-headroom.
- Four-layer community stack model (CBM → context-mode → caveman → headroom/RTK; "complementary layers, not overlapping"; 47–92%; "30 min → 3+ hr"):
sgaabdu4/claude-code-tipsvia DeepWiki. - Independent caveats corroborating the RTK kills (no controlled baseline, no variance, Bash-bounded, "significantly below headline"): candido.ai/blog/claude-code-token-optimization; stack write-up paul-hackenberger.medium.com.
- Cache-economics + write-time-vs-whole-prompt classification: "Don't Break the Cache" arXiv 2601.06007; Claude 4.5 compression RCT arXiv 2603.23525; CompressionAttack arXiv 2510.22963. Write-time / code-domain compressors: Squeez arXiv 2604.04979; AgentDiet arXiv 2509.23586; SWEzze/OCD arXiv 2603.28119; SWE-Pruner arXiv 2601.16746 (ID tentative); LongCodeZip ASE 2025; Perplexity Paradox arXiv 2602.15843. Output brevity: Chain-of-Draft arXiv 2502.18600; TALE arXiv 2412.18547. Full literature ledger: dossier 54.
- jackin' hook/host-state hazard and the role-scoped registration pattern: architect code-intelligence tooling roadmap.
This page, with the rest of the hub, is the complete record. The dossier chapters retain their broader-topic context and point here for the per-tool detail. Back to the overview.