# 49 — Volume II: coverage-delta ledger, verdict delta, and corrections (https://jackin.tailrocks.com/research/token-optimization/49-extension-stacks-and-verdict/)




The Volume II capstone: it collects the coverage-delta notes that
prove novelty against files 00–32, states whether Volume II moves Volume I's headline verdict (with
arithmetic), records cross-layer corrections and caveats, updates the composed stacks for the
subscriber metric, and lists the Volume II graveyard of claims killed or downgraded this run.

**TL;DR**

* **The 10x dollar verdict is unchanged: ≈2.5× defensible / ≈5–6.2× with validated routing / no true
  10× at equal quality.** No Volume II lever removes Volume I's two binding constraints (frontier-model
  thinking output; the cache-read floor of genuinely-used context). Multimodal adds a real but small
  dollar lever (vision-tier routing cuts 67% of *image* tokens, a minor share of most coding
  sessions); latency, governance, and portability change *which* choice is correct and *what is
  measurable*, not the dollar ceiling.
* **But for this operator the metric is wrong.** The local credential is Max (file 41); a subscriber's
  binding constraint is the **usage cap**, not dollars, so Volume II presents a **second cost model
  alongside** Volume I's: optimize **tasks-per-cap**, where the lever order re-sorts (prefix stability,
  context-window size, and request-volume discipline rise; subagent fan-out partially inverts; style
  compression matters even less). Below the cap, dollars are sunk; the dollar model applies only to the
  overage tail (API-rate credits) and to the off-cap headless/SDK lane.
* **A pricing reconciliation, not a multiplier change:** Volume I's dollar figures are Fable-5-priced
  ($10/$50); the operator's actual main model is **Opus 4.8** ($5/$25, identical tokenizer; measured
  465/560 calls), and Fable 5 leaves the subscription. So Volume I's absolute daily
  dollars are **\~2× high** for this operator — but ratios are price-invariant, so the multipliers and
  tier list are unaffected.
* **42 genuinely new techniques** (files 41–47), each with a coverage-delta note proving absence from
  00–32, plus 8 new frontier ideas (48) — all with the full Volume I §10 record schema and validation
  protocols. The coverage-delta ledger is below.
* **Cross-layer corrections/caveats:** the fix for the server-cache-scope conflation was applied to file `13`; the
  subagent-caching-default conflict (#29966 vs Volume I's measured subagent cache writes) remains
  version/path-dependent and must be audited in the operator's own JSONL before acting.

***

## Coverage-delta ledger (novelty proof vs 00–32) [#coverage-delta-ledger-novelty-proof-vs-0032]

Every Volume II technique was checked against the named Volume I file before claiming novelty.

| Technique                                                                                                                                                  | Vol I file checked                | Why genuinely new                                                                                                                            |
| ---------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| 41 Q1–Q7 (quota model, prefix-as-quota, window size, request-volume, 1h-in-allowance, overage decision, SDK-off-cap)                                       | 13 (caching), 01 (econ), README   | Vol I prices in $ only; "quota" is named in 13 but never modeled — the cap weighting, denominator, re-sort, and the 06-15 split are all new |
| 42 M1–M6 (vision routing, downsample, text-over-screenshot, text-over-PDF, crop, lazy vision)                                                              | 12, 16, 03                        | Multimodal = 0 substantive hits in 00–32; image/PDF token costs unmeasured                                                                   |
| 43 L1–L6 (fast-mode purchase, optimizer latency tax, parallel-time, cache-on-both-axes, batch-time, quota-edge decision)                                   | 17:110, 18:132, 19:147, 01:52     | Latency mentioned only in scattered places; never built into a time-value decision model (v·t·s > Δ$)                                        |
| 44 F1–F6 (workspace fleet sharing, excludeDynamicSections sized, cold-start pre-warm, subagent-caching audit, multi-tenant boundary, jackin' fleet policy) | 13 tech 7, 19:116-187, 17         | Vol I covers self-host fleet (19) + the SDK flag (13); hosted cross-container sharing, the measured dynamic-section size, and #29966 are new |
| 45 P1–P6 (portable core, keepalive borrow, 3-tier routing borrow, dedup borrow, per-tool budgets borrow, non-portable edge)                                | 14/15 availability labels, 18:196 | No portability matrix exists in 00–32; reverse-portability (borrow from other agents) is new                                                 |
| 46 FL1–FL5 (CAG-via-caching, /cd, no-compressor-security, KV-family verdict, Fable→Opus re-decision)                                                       | 03, 19, 20 K1/K2/K15              | KV-eviction family (SnapKV/H2O/PyramidKV/KVQuant), CAG, CompressionAttack, /cd, the Fable promo = 0 hits / post-freeze                       |
| 47 G1–G6 (real governors, max\_tokens replacement, online canary, guard tax, degrade-don't-die, recursion cap)                                             | 15:123-135, 31, 32:13             | Vol I has offline harness + max\_tokens-as-rail only; runtime budget governance + online drift detection are new                             |
| 48 V1–V8 (frontier)                                                                                                                                        | 20 K1–K16                         | Each maps to a Volume II blind spot; dedup notes vs every K-idea in 48                                                                       |

**Count: 42 new techniques in 41–47** (Q7+M6+L6+F6+P6+FL5+G6) **+ 8 frontier (48) = 50**, against the
brief's floors of ≥25 cataloged and ≥10 with the full record. **All 42 carry the full §10 record**
(Name, Layer, Mechanism, Expected savings, Evidence tier, Quality risk, Availability, Effort,
Composability, Validation protocol) plus a coverage-delta line.

## Verdict delta — does Volume II move the numbers? [#verdict-delta--does-volume-ii-move-the-numbers]

**On dollars: no.** Carrying the arithmetic:

* Volume I's binding constraints are structural: (1) frontier-model thinking bills as output, and only
  lowering effort or not running the frontier model touches it; (2) a cache-read floor of context the
  agent genuinely needs. **No Volume II lever removes either.** The KV-compression family that could
  attack the cache floor is self-host-only, so unavailable on hosted Claude (file 46) — same wall as
  Volume I's K1/K2.
* The new dollar levers are real but small: vision-tier routing saves 67% of *image* tokens (file 42),
  but images are a minor share of a typical coding session; text-over-PDF saves \~50% on *document*
  tokens, only when PDFs are in play; CAG-via-caching (FL1) converts retrieval into 0.1× reads, a
  context-architecture win already in Volume I's family. None compounds into a new multiplier.
* **So the headline stands after the independent correction: ≈2.5× defensible today, ≈5–6.2× if the Sonnet-main+advisor routing flip
  passes validation, no true 10× at provably equal quality.** Multimodal/latency/governance/portability
  change correctness and measurability, not the ceiling.
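A back-of-envelope check of why a large per-class saving does not compound into a session multiplier: if a lever saves a fraction `saving` of a token class that makes up a fraction `share` of session cost, the whole-session multiplier is 1 / (1 − share·saving). The sketch assumes the saving applies uniformly to that class's cost; the image shares are hypothetical, and only the 67% figure comes from file 42.

```python
def session_multiplier(share: float, saving: float) -> float:
    """Whole-session cost multiplier from a lever that saves `saving` (a fraction)
    of a token class that accounts for `share` (a fraction) of session cost."""
    if not (0.0 <= share <= 1.0 and 0.0 <= saving <= 1.0):
        raise ValueError("share and saving must be fractions in [0, 1]")
    remaining = 1.0 - share * saving  # fraction of the original session cost left
    return float("inf") if remaining == 0.0 else 1.0 / remaining


VISION_TIER_SAVING = 0.67  # file 42: vision-tier routing cuts ~67% of image tokens

# Hypothetical image shares of a coding session's cost (not measured values).
for image_share in (0.02, 0.05, 0.10):
    m = session_multiplier(image_share, VISION_TIER_SAVING)
    print(f"image share {image_share:.0%}: session multiplier {m:.3f}x")
```

Even at a hypothetical 10% image share the session moves by well under 1.1×, which is the sense in which the multimodal lever is real but small.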

**On the metric: yes — and this is Volume II's real contribution.** For a subscriber (this operator is
Max), "$ per task" is the wrong denominator. The quota model (file 41):

* Below the cap, **dollars are sunk**; the objective is **tasks per 5-hour/weekly window**.
* The cap weights re-sort the levers: **prefix stability**, **context-window size**, and
  **request-volume discipline** become the top levers (a cache *miss* is a 1.25–2× write against the
  cap; subagent fan-out is \~61% of calls); **style/register compression matters even less** than in the
  dollar model (visible output is a sliver of cap-weighted tokens).
* The cap weighting itself is **officially unpublished**; community triangulation puts cache-read at
  \~0.1× (T3, file 41), and the **token denominator is opaque** (bounded INCOMPLETE). So "tasks per cap"
  is *directional* without the per-account prober (frontier V2).
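To see why prefix stability tops the quota ranking, the sketch below prices one request's stable prefix against the cap on a hit and on a miss. The weights are assumptions: the \~0.1× cache-read weight is the community (T3) triangulation above, and the 1.25×/2× write weights mirror the 5-minute/1-hour cache-write multipliers. The real cap weighting is unpublished, so the output is directional only.

```python
# Assumed cap weights per prefix token, relative to uncached input = 1.0.
# The real cap weighting is unpublished: 0.1x for cache reads is community
# triangulation (T3); 1.25x / 2x mirror the 5m / 1h cache-write multipliers.
CAP_WEIGHTS = {
    "cache_read": 0.1,
    "cache_write_5m": 1.25,
    "cache_write_1h": 2.0,
}


def prefix_cap_cost(prefix_tokens: int, hit: bool, ttl: str = "5m") -> float:
    """Cap-weighted cost of sending a stable prefix on one request."""
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    if hit:
        return prefix_tokens * CAP_WEIGHTS["cache_read"]
    # A miss re-writes the whole prefix into the cache at the write weight.
    return prefix_tokens * CAP_WEIGHTS[f"cache_write_{ttl}"]


prefix = 40_000  # hypothetical prefix size in tokens
hit_cost = prefix_cap_cost(prefix, hit=True)
for ttl in ("5m", "1h"):
    miss_cost = prefix_cap_cost(prefix, hit=False, ttl=ttl)
    print(f"{ttl} miss costs {miss_cost / hit_cost:.1f}x a hit against the cap")
```

Under these assumed weights, one miss on the same prefix burns 12.5–20× the cap of a hit, independent of the prefix size chosen.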

**A pricing reconciliation (not a multiplier change):** Volume I modeled the day at Fable 5 prices
($17–22/day). The operator runs **Opus 4.8** (half the per-token price, identical tokenizer family),
and Fable 5 leaves the subscription. So Volume I's *absolute* dollar figures are \~2×
high for this operator going forward (roughly $8.50–11/day at Opus 4.8 rates); **ratios, multipliers, and the tier list are unchanged** (they
are price-invariant) and all of Volume I's tokenizer-arbitrage and the file-42 image math transfer
directly (Opus 4.8 ≡ Fable 5 tokenizer).
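The price-invariance claim can be checked directly: when every per-token rate halves (input, output, and cache rates priced as multiples of input), every day's cost halves and any before/after ratio is unchanged. A minimal sketch with hypothetical token counts; the 1.25× write and 0.1× read factors are the standard Anthropic 5-minute cache-write and cache-read multipliers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Token counts for one modeled day (hypothetical values below)."""
    input: int
    cache_write_5m: int
    cache_read: int
    output: int


@dataclass(frozen=True)
class Prices:
    """Per-million-token list prices; cache rates are multiples of the input rate."""
    input_per_mtok: float
    output_per_mtok: float

    def cost(self, u: Usage) -> float:
        per_in = self.input_per_mtok / 1e6
        per_out = self.output_per_mtok / 1e6
        return (u.input * per_in
                + u.cache_write_5m * per_in * 1.25  # 5m cache write
                + u.cache_read * per_in * 0.1       # cache read
                + u.output * per_out)


FABLE_5 = Prices(10.0, 50.0)  # Volume I's pricing
OPUS_4_8 = Prices(5.0, 25.0)  # the operator's actual main model

baseline = Usage(input=200_000, cache_write_5m=300_000, cache_read=10_000_000, output=150_000)
optimized = Usage(input=100_000, cache_write_5m=150_000, cache_read=4_000_000, output=60_000)

for name, p in (("Fable 5", FABLE_5), ("Opus 4.8", OPUS_4_8)):
    base, opt = p.cost(baseline), p.cost(optimized)
    print(f"{name}: ${base:.2f} -> ${opt:.2f}/day, multiplier {base / opt:.2f}x")
```

Both lines print the same multiplier while the absolute dollars halve, which is why the tier list survives the repricing untouched.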

## Tier-list delta [#tier-list-delta]

Volume I's S/A/B/C/F tiers stand. Volume II adds, scored by the same `$saving × confidence ÷ effort`
on the dollar axis, and re-weighted for the quota axis where noted:

* **New S-tier (negative-cost or large, low-effort):** vision-tier routing for screenshot/PDF work
  (42 M1, −67% image tokens, trivial); text-over-PDF / text-over-screenshot (42 M3/M4, negative-cost);
  `/cd` cache preservation (46 FL2, zero-effort negative-cost); **prefix-stability-as-quota-lever** (41
  Q2 — promoted to top on the quota axis); the real governors (47 G1, loss-avoidance).
* **New A-tier:** CAG warm-repo cached prefix (46 FL1 / 48 V6); time-value fast-mode (43 L1/V5);
  fleet workspace cache sharing (44 F1–F3); online-canary-gated compression (48 V7); quota-window
  scheduling (48 V1) for capped subscribers.
* **New B/C-tier:** portable-policy compiler (48 V8); the borrow-ins (45 P2–P5); cross-provider
  effort/cache levers (45) where the operator might switch agents.
* **New F-tier (killed/blocked — see graveyard):** the hosted-Claude KV-compression family (46, $0);
  prompt-compressors in the hot path (now also a security kill, 46 FL3 / 47); `max_tokens` as a spend
  governor (re-killed, 47 G2); lossless at-rest compression as a "token" saver (46, category error).
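The `$saving × confidence ÷ effort` ranking rule is simple enough to state as code. A minimal sketch: the lever names and all three inputs are hypothetical placeholders, and the S/A/B/C/F cut-offs are not reproduced because this file does not state them. Any consistent units for saving and effort work, since only the ordering matters.

```python
from typing import NamedTuple


class Lever(NamedTuple):
    name: str
    saving: float      # expected $ saving (any consistent unit)
    confidence: float  # 0..1, e.g. mapped from the evidence tier
    effort: float      # relative effort; must be > 0


def tier_score(lever: Lever) -> float:
    """$saving × confidence ÷ effort."""
    if lever.effort <= 0:
        raise ValueError(f"{lever.name}: effort must be positive")
    return lever.saving * lever.confidence / lever.effort


# Hypothetical levers; numbers are placeholders, not measurements from this run.
levers = [
    Lever("lever A (large, well-evidenced, moderate effort)", saving=5.0, confidence=0.9, effort=2.0),
    Lever("lever B (small, trivial effort)", saving=1.0, confidence=0.9, effort=0.2),
    Lever("lever C (large, weakly evidenced, high effort)", saving=8.0, confidence=0.3, effort=4.0),
]

for lever in sorted(levers, key=tier_score, reverse=True):
    print(f"{tier_score(lever):6.2f}  {lever.name}")
```

A small but near-free lever can outrank a larger one. A truly zero-effort lever needs a small positive effort floor under this formula; choosing that floor is an assumption of the sketch, not part of the Volume I rule.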

## Composed-stack updates [#composed-stack-updates]

**Conservative (do tomorrow, riskless) — adds:** vision-tier routing + text-over-pixels/PDF for any
visual work; `/cd` instead of restart-in-new-dir; treat prefix stability as the #1 *quota* lever; set
a Claude Code `/usage-credits` monthly cap and (for fleets) a workspace rate limit; re-decide
Fable→Opus 4.8 before 06-23.

**Aggressive (with validation) — adds:** the CAG warm-repo cached prefix (preload the stable repo core
once); fleet workspace cache sharing + cold-start pre-warm (jackin'); time-value-driven fast mode for
interactive sessions; an online-quality canary so aggressive register/effort compression runs behind a
live safety net.

**Unbelievable (chase the ceiling) — adds:** the quota-window scheduler + per-account quota-denominator
prober (close the cap INCOMPLETE); the cross-provider portable-policy compiler; the jackin'-baked fleet
cache + governance pack. **Binding constraint, restated:** on dollars, still the frontier-thinking +
cache-read floor (no 10×). On quota, the binding constraint is the **unpublished denominator** — you
cannot prove "10× more tasks per cap" without measuring the cap, which only a header-reading proxy can
do per-account.

**For a subscriber specifically**, the stack re-orders: cap-headroom levers (prefix stability, smaller
window, request-volume discipline, 1h-in-allowance, headless-off-cap placement) come first; dollar
levers matter only on the overage tail.

## Cross-layer caveats [#cross-layer-caveats]

Two cache-layer facts that span Volume I and the fleet analysis:

1. **Server cache scope vs local file cache (13 tech 7 + surprising-findings) — applied.** The original Volume I text stated "your
   git state is in the cache key… worktrees never share," citing the prompt-caching docs, implying the
   *server* prompt cache is keyed by machine/directory/git-snapshot. File 44's sources show the
   **server** prompt cache is **workspace-scoped** with **no** machine/dir/worktree
   key; the git-snapshot/worktree rules describe Claude Code's **local file cache** (a different layer,
   GitHub #17531). Practical effect is favorable: hosted fleets *can* share a prefix across machines/dirs.
2. **Subagent caching default (13 tech 2 vs GitHub #29966).** Volume I measured subagents *writing* 5m
   cache (1,128/1,128 calls). #29966 (T3, one session, Claude Code 2.1.63 / SDK 0.2.63) reports
   Agent-tool subagents with `enablePromptCaching` hardcoded **false** (caching off). These may both be
   true at different versions/spawn paths (cavecrew/Task-tool subagents vs SDK Agent-tool subagents).
   **Recorded as version/path-dependent; operator should audit their own subagent JSONL** (file 44 F4)
   rather than assume either.
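One way to run that audit is to count, over a session transcript, how many subagent calls report cache writes or reads. A minimal sketch that assumes a Claude Code JSONL layout, which is version-dependent and should be checked against your own files: subagent entries carry `"isSidechain": true`, assistant entries carry `message.usage` with the Messages API fields `cache_creation_input_tokens` and `cache_read_input_tokens`, and one API response may be logged across several lines sharing a `message.id`, so calls are de-duplicated on that id. Some versions may write subagent transcripts to separate files; point the function at whichever files hold them.

```python
import json
from typing import Iterable


def audit_subagent_caching(lines: Iterable[str]) -> dict:
    """Tally subagent (sidechain) API calls by prompt-cache behavior.

    Assumes the transcript schema described above; unknown or malformed lines are skipped.
    """
    stats = {"subagent_calls": 0, "wrote_cache": 0, "read_cache": 0, "no_cache": 0}
    seen_ids = set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(entry, dict) or not entry.get("isSidechain"):
            continue
        message = entry.get("message")
        if not isinstance(message, dict) or not isinstance(message.get("usage"), dict):
            continue
        msg_id = message.get("id")
        if msg_id is not None:
            if msg_id in seen_ids:
                continue  # same API response logged on several lines
            seen_ids.add(msg_id)
        usage = message["usage"]
        wrote = (usage.get("cache_creation_input_tokens") or 0) > 0
        read = (usage.get("cache_read_input_tokens") or 0) > 0
        stats["subagent_calls"] += 1
        stats["wrote_cache"] += int(wrote)
        stats["read_cache"] += int(read)
        stats["no_cache"] += int(not (wrote or read))
    return stats


# Hypothetical transcript lines illustrating the assumed schema.
sample = [
    '{"isSidechain": true, "message": {"id": "m1", "usage": {"cache_creation_input_tokens": 1200, "cache_read_input_tokens": 0}}}',
    '{"isSidechain": true, "message": {"id": "m1", "usage": {"cache_creation_input_tokens": 1200, "cache_read_input_tokens": 0}}}',
    '{"isSidechain": true, "message": {"id": "m2", "usage": {"cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}}}',
    '{"isSidechain": false, "message": {"id": "m3", "usage": {"cache_read_input_tokens": 9000}}}',
]
print(audit_subagent_caching(sample))
```

If `wrote_cache` tracks `subagent_calls`, that spawn path matches Volume I's measurement; a large `no_cache` count points toward the #29966 behavior. For a real transcript, pass an open file handle (for example inside `with open(path) as f:`, where `path` is your own file).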

No other Volume I claim was contradicted; the min-cacheable-prefix values (13 tech 11), tokenizer-family
equality, and cache multipliers were all re-verified live and **stand unchanged**.

## Volume II graveyard (killed or downgraded this run) [#volume-ii-graveyard-killed-or-downgraded-this-run]

| Claim                                                                                  | Verdict                                                                                                                           | Where |
| -------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------- | ----- |
| "KV-cache compression (SnapKV/H2O/PyramidKV/KVQuant) will cut your hosted-Claude bill" | **KILLED** — all self-host-only; $0 on a hosted API; extends Vol I's K1/K2 block                                                  | 46    |
| "cache\_read counts at 1× against the subscription cap" / "the cache double-counts"    | **KILLED** — \~0.1× (T3); the 1× double-count was a LiteLLM/Bedrock proxy bug, fixed; direct Anthropic auth does not double-count | 41    |
| "Anthropic's workspace dollar spend limit hard-stops spend"                            | **KILLED** — alert-only; hard block is via rate limits / `/usage-credits` / gateways                                              | 47    |
| "Set `max_tokens` low to cap spend"                                                    | **RE-KILLED** — truncates `tool_use`, bills the attempt + a higher-cap retry; use `model_context_window_exceeded` + a budget      | 47    |
| "The multi-agent 'up to 90% faster' result applies to coding agents"                   | **KILLED** — research-only; Anthropic says coding is a poor fit; naive parallelism risks pooled-limit backoff                     | 43    |
| "Lossless prompt compression (LoPace) cuts your token bill"                            | **KILLED (category error)** — compresses bytes at rest, not tokens sent                                                           | 46    |
| "CompactPrompt saves \~60% for coding agents"                                          | **DOWNGRADED to T4 folklore** — single vendor guide, lossy NL; LLMLingua caveats apply                                            | 46    |
| "Keepalive pingers save the subscription quota on the Claude Code main loop"           | **MOOT (re-confirmed)** — main loop is 1h-TTL in-allowance; Cherny: "a small win"; overage reverts to 5m                          | 41    |
| "A screenshot is a cheap way to show the agent code/logs"                              | **KILLED** — a screenshot is 2–6× the text it shows and caps at one frame; text wins for textual content                          | 42    |
| "Prompt-compression proxies are merely a cost trade"                                   | **DOWNGRADED** — also a security attack surface (CompressionAttack, ≤80% ASR)                                                     | 46    |
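The replacement for `max_tokens` as a spend rail is a budget checked *between* calls, so no call is cut off mid-`tool_use`. A minimal client-side sketch of that idea combined with degrade-don't-die; `BudgetGovernor`, its thresholds, and the per-call cost estimate are hypothetical and not an Anthropic or Claude Code API, and they complement rather than replace the hard blocks (rate limits, `/usage-credits`, gateways) named in the table.

```python
from dataclasses import dataclass


@dataclass
class BudgetGovernor:
    """Hypothetical per-session spend governor (illustrative only).

    Decides before each call; never truncates a call in flight.
    """
    budget_usd: float
    degrade_at: float = 0.8  # fraction of budget after which to use a cheaper tier
    spent_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        """Add the billed cost of a completed call."""
        self.spent_usd += cost_usd

    def decide(self, est_next_cost_usd: float) -> str:
        """Return 'proceed', 'degrade', or 'stop' for the next call."""
        projected = self.spent_usd + est_next_cost_usd
        if projected > self.budget_usd:
            return "stop"  # caller winds down cleanly instead of starting the call
        if projected > self.degrade_at * self.budget_usd:
            return "degrade"  # caller switches to lower effort or a cheaper tier
        return "proceed"


gov = BudgetGovernor(budget_usd=10.0)
gov.record(7.0)
for estimate in (0.5, 1.5, 3.5):
    print(f"next call est. ${estimate:.2f}: {gov.decide(estimate)}")
```

The design choice is that the rail acts on whole calls: a blocked call is never started, so nothing is billed for a truncated attempt followed by a retry.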

## What survived the adversarial pass [#what-survived-the-adversarial-pass]

Every Volume II headline number was either documented-and-locally-reproduced, honestly tiered, or
explicitly INCOMPLETE. The two most novel/load-bearing numbers were re-attacked and held:

* **Image-cap divergence is \~3.0–3.1×:** confirmed content-independent — a max-entropy noise image
  at 2560×1440 returned the same capped family split as the gradient. The formula and routing lever
  stand; exact capped counts vary by wrapper/envelope, so use the band rather than 4,784/1,568 as
  measured absolutes.
* **The PDF tax \~2×:** confirmed across content — code-like text gave 1.90× (Opus) / 2.11× (Sonnet),
  bracketing the fox-text 1.98× / 2.30×. Holds.
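Both headline savings follow from these ratio bands: if the expensive representation costs r× the cheap one, switching saves 1 − 1/r of those tokens. A short check using the bands above; grouping the reported ratios into low/high bounds is the only thing added here.

```python
def saving_from_ratio(ratio: float) -> float:
    """Fraction of tokens saved by switching to a representation that costs 1/ratio as much."""
    if ratio < 1.0:
        raise ValueError("ratio must be >= 1")
    return 1.0 - 1.0 / ratio


# (low, high) ratio bands reported above.
bands = {
    "image-cap divergence (vision-tier routing)": (3.0, 3.1),
    "PDF tax (text-over-PDF)": (1.90, 2.30),
}

for label, (low, high) in bands.items():
    print(f"{label}: {saving_from_ratio(low):.1%} to {saving_from_ratio(high):.1%} saved")
```

The image band reproduces the −67% vision-routing figure, and the PDF band brackets the \~50% text-over-PDF figure.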

The community-sourced and unpublished items (cap weight \~0.1×, the cap denominator) are labeled T3 /
INCOMPLETE and not promoted beyond their evidence.

***

The Volume II self-audit against the definition of done is appended to [the research index](./) under
`## Volume II — Extension`, alongside the index, headline numbers, blind-spot-map summary, and the
Volume II Assumptions section.
