jackin'
ResearchToken Optimization Research

49 — Volume II: coverage-delta ledger, verdict delta, and corrections

49 — Volume II: coverage-delta ledger, verdict delta, and corrections

The Volume II capstone: it collects the coverage-delta notes that prove novelty against files 00–32, states whether Volume II moves Volume I's headline verdict (with arithmetic), records cross-layer corrections and caveats, updates the composed stacks for the subscriber metric, and lists the Volume II graveyard of claims killed or downgraded this run.

TL;DR

  • The 10x dollar verdict is unchanged: ≈2.5× defensible / ≈5–6.2× with validated routing / no true 10× at equal quality. No Volume II lever removes Volume I's two binding constraints (frontier-model thinking output; the cache-read floor of genuinely-used context). Multimodal adds a real but small dollar lever (vision-tier routing cuts 67% of image tokens, a minor share of most coding sessions); latency, governance, and portability change which choice is correct and what is measurable, not the dollar ceiling.
  • But for this operator the metric is wrong. The local credential is Max (file 41); a subscriber's binding constraint is the usage cap, not dollars, so Volume II presents a second cost model alongside Volume I's: optimize tasks-per-cap, where the lever order re-sorts (prefix stability, context-window size, and request-volume discipline rise; subagent fan-out partially inverts; style compression matters even less). Below the cap, dollars are sunk; the dollar model applies only to the overage tail (API-rate credits) and to the off-cap headless/SDK lane.
  • A pricing reconciliation, not a multiplier change: Volume I's dollar figures are Fable-5-priced ($10/$50); the operator's actual main model is Opus 4.8 ($5/$25, identical tokenizer; measured 465/560 calls), and Fable 5 leaves the subscription. So Volume I's absolute daily dollars are ~2× high for this operator — but ratios are price-invariant, so the multipliers and tier list are unaffected.
  • 42 genuinely-new techniques (files 41–47), each with a coverage-delta note proving absence from 00–32, plus 8 new frontier ideas (48) — all with the full Volume I §10 record schema and validation protocols. The coverage-delta ledger is below.
  • Cross-layer corrections/caveats: the server-cache-scope conflation was applied to 13; the subagent-caching-default conflict (#29966 vs Volume I's measured subagent cache writes) remains version/path-dependent and must be audited in the operator's own JSONL before acting.

Coverage-delta ledger (novelty proof vs 00–32)

Every Volume II technique was checked against the named Volume I file before claiming novelty.

TechniqueVol I file checkedWhy genuinely new
41 Q1–Q7 (quota model, prefix-as-quota, window size, request-volume, 1h-in-allowance, overage decision, SDK-off-cap)13 (caching), 01 (econ), READMEVol I prices in $ only; "quota" named in 13, modeled 0× — the cap weighting, denominator, re-sort, and the 06-15 split are all new
42 M1–M6 (vision routing, downsample, text-over-screenshot, text-over-PDF, crop, lazy vision)12, 16, 03Multimodal = 0 substantive hits in 00–32 ; image/PDF token costs unmeasured
43 L1–L6 (fast-mode purchase, optimizer latency tax, parallel-time, cache-on-both-axes, batch-time, quota-edge decision)17:110, 18:132, 19:147, 01:52Latency mentioned in scatter; never built into a time-value decision model (v·t·s > Δ$)
44 F1–F6 (workspace fleet sharing, excludeDynamicSections sized, cold-start pre-warm, subagent-caching audit, multi-tenant boundary, jackin' fleet policy)13 tech 7, 19:116-187, 17Vol I covers self-host fleet (19) + the SDK flag (13); hosted cross-container sharing, the measured dynamic-section size, and #29966 are new
45 P1–P6 (portable core, keepalive borrow, 3-tier routing borrow, dedup borrow, per-tool budgets borrow, non-portable edge)14/15 availability labels, 18:196No portability matrix exists in 00–32; reverse-portability (borrow from other agents) is new
46 FL1–FL5 (CAG-via-caching, /cd, no-compressor-security, KV-family verdict, Fable→Opus re-decision)03, 19, 20 K1/K2/K15KV-eviction family (SnapKV/H2O/PyramidKV/KVQuant), CAG, CompressionAttack, /cd, the Fable promo = 0 hits / post-freeze
47 G1–G6 (real governors, max_tokens replacement, online canary, guard tax, degrade-don't-die, recursion cap)15:123-135, 31, 32:13Vol I has offline harness + max_tokens-as-rail only; runtime budget governance + online drift detection are new
48 V1–V8 (frontier)20 K1–K16Each maps to a Volume II blind spot; dedup notes vs every K-idea in 48

Count: 42 new techniques in 41–47 (Q7+M6+L6+F6+P6+FL5+G6) + 8 frontier (48) = 50, against the brief's floors of ≥25 cataloged and ≥10 with the full record. All 42 carry the full §10 record (Name, Layer, Mechanism, Expected savings, Evidence tier, Quality risk, Availability, Effort, Composability, Validation protocol) plus a coverage-delta line.

Verdict delta — does Volume II move the numbers?

On dollars: no. Carrying the arithmetic:

  • Volume I's binding constraints are structural: (1) frontier-model thinking bills as output and only effort/not-being-the-frontier-model touches it; (2) a cache-read floor of context the agent genuinely needs. No Volume II lever removes either. The KV-compression family that could attack the cache floor is self-host-only on hosted Claude (file 46) — same wall as Volume I's K1/K2.
  • The new dollar levers are real but small: vision-tier routing saves 67% of image tokens (file 42), but images are a minor share of a typical coding session; text-over-PDF saves ~50% on document tokens, only when PDFs are in play; CAG-via-caching (FL1) converts retrieval into 0.1× reads, a context-architecture win already in Volume I's family. None compounds into a new multiplier.
  • So the headline stands after the independent correction: ≈2.5× defensible today, ≈5–6.2× if the Sonnet-main+advisor routing flip passes validation, no true 10× at provably equal quality. Multimodal/latency/governance/portability change correctness and measurability, not the ceiling.

On the metric: yes — and this is Volume II's real contribution. For a subscriber (this operator is Max), "$ per task" is the wrong denominator. The quota model (file 41):

  • Below the cap, dollars are sunk; the objective is tasks per 5-hour/weekly window.
  • The cap weights re-sort the levers: prefix stability and context-window size and request-volume discipline become the top levers (a cache miss is a 1.25–2× write against the cap; subagent fan-out is ~61% of calls); style/register compression matters even less than in the dollar model (visible output is a sliver of cap-weighted tokens).
  • The cap weighting itself is officially unpublished; community triangulation puts cache-read at ~0.1× (T3, file 41), and the token denominator is opaque (bounded INCOMPLETE). So "tasks per cap" is directional without the per-account prober (frontier V2).

A pricing reconciliation (not a multiplier change): Volume I modeled the day at Fable 5 prices ($17–22/day). The operator runs Opus 4.8 (half the per-token price, identical tokenizer family), and Fable 5 leaves the subscripti. So Volume I's absolute dollar figures are ~2× high for this operator going forward; ratios, multipliers, and the tier list are unchanged (they are price-invariant) and all of Volume I's tokenizer-arbitrage and the file-42 image math transfer directly (Opus 4.8 ≡ Fable 5 tokenizer).

Tier-list delta

Volume I's S/A/B/C/F tiers stand. Volume II adds, scored by the same $saving × confidence ÷ effort on the dollar axis, and re-weighted for the quota axis where noted:

  • New S-tier (negative-cost or large, low-effort): vision-tier routing for screenshot/PDF work (42 M1, −67% image tokens, trivial); text-over-PDF / text-over-screenshot (42 M3/M4, negative-cost); /cd cache preservation (46 FL2, zero-effort negative-cost); prefix-stability-as-quota-lever (41 Q2 — promoted to top on the quota axis); the real governors (47 G1, loss-avoidance).
  • New A-tier: CAG warm-repo cached prefix (46 FL1 / 48 V6); time-value fast-mode (43 L1/V5); fleet workspace cache sharing (44 F1–F3); online-canary-gated compression (48 V7); quota-window scheduling (48 V1) for capped subscribers.
  • New B/C-tier: portable-policy compiler (48 V8); the borrow-ins (45 P2–P5); cross-provider effort/cache levers (45) where the operator might switch agents.
  • New F-tier (killed/blocked — see graveyard): the hosted-Claude KV-compression family (46, $0); prompt-compressors in the hot path (now also a security kill, 46 FL3 / 47); max_tokens as a spend governor (re-killed, 47 G2); lossless at-rest compression as a "token" saver (46, category error).

Composed-stack updates

Conservative (do tomorrow, riskless) — adds: vision-tier routing + text-over-pixels/PDF for any visual work; /cd instead of restart-in-new-dir; treat prefix stability as the #1 quota lever; set a Claude Code /usage-credits monthly cap and (for fleets) a workspace rate limit; re-decide Fable→ Opus 4.8 before 06-23.

Aggressive (with validation) — adds: the CAG warm-repo cached prefix (preload the stable repo core once); fleet workspace cache sharing + cold-start pre-warm (jackin'); time-value-driven fast mode for interactive sessions; an online-quality canary so aggressive register/effort compression runs behind a live safety net.

Unbelievable (chase the ceiling) — adds: the quota-window scheduler + per-account quota-denominator prober (close the cap INCOMPLETE); the cross-provider portable-policy compiler; the jackin'-baked fleet cache + governance pack. Binding constraint, restated: on dollars, still the frontier-thinking + cache-read floor (no 10×). On quota, the binding constraint is the unpublished denominator — you cannot prove "10× more tasks per cap" without measuring the cap, which only a header-reading proxy can do per-account.

For a subscriber specifically, the stack re-orders: cap-headroom levers (prefix stability, smaller window, request-volume discipline, 1h-in-allowance, headless-off-cap placement) come first; dollar levers matter only on the overage tail.

Cross-layer caveats

Two cache-layer facts that span Volume I and the fleet analysis:

  1. Server cache scope vs local file cache (13 tech 7 + surprising-findings) — applied. Original Volume I text stated "your git state is in the cache key… worktrees never share," citing the prompt-caching docs, implying the server prompt cache is keyed by machine/directory/git-snapshot. File 44's sources show the server prompt cache is workspace-scoped with no machine/dir/worktree key; the git-snapshot/worktree rules describe Claude Code's local file cache (a different layer, GitHub #17531). Practical effect is favorable: hosted fleets can share a prefix across machines/dirs.
  2. Subagent caching default (13 tech 2 vs GitHub #29966). Volume I measured subagents writing 5m cache (1,128/1,128 calls). #29966 (T3, one session, Claude Code 2.1.63 / SDK 0.2.63) reports Agent-tool subagents with enablePromptCaching hardcoded false (caching off). These may both be true at different versions/spawn paths (cavecrew/Task-tool subagents vs SDK Agent-tool subagents). Recorded as version/path-dependent; operator should audit their own subagent JSONL (file 44 F4) rather than assume either.

No other Volume I claim was contradicted; the min-cacheable-prefix values (13 tech 11), tokenizer-family equality, and cache multipliers were all re-verified live and stand unchanged.

Volume II graveyard (killed or downgraded this run)

ClaimVerdictWhere
"KV-cache compression (SnapKV/H2O/PyramidKV/KVQuant) will cut your hosted-Claude bill"KILLED — all self-host-only; $0 on a hosted API; extends Vol I's K1/K2 block46
"cache_read counts at 1× against the subscription cap" / "the cache double-counts"KILLED — ~0.1× (T3); the 1× double-count was a LiteLLM/Bedrock proxy bug, fixed; direct Anthropic auth does not double-count41
"Anthropic's workspace dollar spend limit hard-stops spend"KILLED — alert-only; hard block is via rate limits / /usage-credits / gateways47
"Set max_tokens low to cap spend"RE-KILLED — truncates tool_use, bills the attempt + a higher-cap retry; use model_context_window_exceeded + a budget47
"The multi-agent 'up to 90% faster' result applies to coding agents"KILLED — research-only; Anthropic says coding is a poor fit; naive parallelism risks pooled-limit backoff43
"Lossless prompt compression (LoPace) cuts your token bill"KILLED (category error) — compresses bytes at rest, not tokens sent46
"CompactPrompt saves ~60% for coding agents"DOWNGRADED to T4 folklore — single vendor guide, lossy NL; LLMLingua caveats apply46
"Keepalive pingers save the subscription quota on the Claude Code main loop"MOOT (re-confirmed) — main loop is 1h-TTL in-allowance; Cherny: "a small win"; overage reverts to 5m41
"A screenshot is a cheap way to show the agent code/logs"KILLED — a screenshot is 2–6× the text it shows and caps at one frame; text wins for textual content42
"Prompt-compression proxies are merely a cost trade"DOWNGRADED — also a security attack surface (CompressionAttack, ≤80% ASR)46

What survived the adversarial pass

Every Volume II headline number was either documented-and-locally-reproduced, honestly tiered, or explicitly INCOMPLETE. The two most novel/load-bearing numbers were re-attacked and held:

  • Image-cap divergence is ~3.0–3.1×: confirmed content-independent — a max-entropy noise image at 2560×1440 returned the same capped family split as the gradient. The formula and routing lever stand; exact capped counts vary by wrapper/envelope, so use the band rather than 4,784/1,568 as measured absolutes.
  • The PDF tax ~2×: confirmed across content — code-like text gave 1.90× (Opus) / 2.11× (Sonnet), bracketing the fox-text 1.98× / 2.30×. Holds.

The community-sourced and unpublished items (cap weight ~0.1×, the cap denominator) are labeled T3 / INCOMPLETE and not promoted beyond their evidence.


The Volume II self-audit against the definition of done is appended to the research index under ## Volume II — Extension, alongside the index, headline numbers, blind-spot-map summary, and the Volume II Assumptions section.

On this page