49 — Volume II: coverage-delta ledger, verdict delta, and corrections
The Volume II capstone: it collects the coverage-delta notes that prove novelty against files 00–32, states whether Volume II moves Volume I's headline verdict (with arithmetic), records cross-layer corrections and caveats, updates the composed stacks for the subscriber metric, and lists the Volume II graveyard of claims killed or downgraded this run.
TL;DR
- The 10x dollar verdict is unchanged: ≈2.5× defensible / ≈5–6.2× with validated routing / no true 10× at equal quality. No Volume II lever removes Volume I's two binding constraints (frontier-model thinking output; the cache-read floor of genuinely-used context). Multimodal adds a real but small dollar lever (vision-tier routing cuts 67% of image tokens, a minor share of most coding sessions); latency, governance, and portability change which choice is correct and what is measurable, not the dollar ceiling.
- But for this operator the metric is wrong. The local credential is Max (file 41); a subscriber's binding constraint is the usage cap, not dollars, so Volume II presents a second cost model alongside Volume I's: optimize tasks-per-cap, where the lever order re-sorts (prefix stability, context-window size, and request-volume discipline rise; subagent fan-out partially inverts; style compression matters even less). Below the cap, dollars are sunk; the dollar model applies only to the overage tail (API-rate credits) and to the off-cap headless/SDK lane.
- A pricing reconciliation, not a multiplier change: Volume I's dollar figures are Fable-5-priced ($10/$50); the operator's actual main model is Opus 4.8 ($5/$25, identical tokenizer; measured 465/560 calls), and Fable 5 leaves the subscription. So Volume I's absolute daily dollars are ~2× high for this operator — but ratios are price-invariant, so the multipliers and tier list are unaffected.
- 42 genuinely-new techniques (files 41–47), each with a coverage-delta note proving absence from 00–32, plus 8 new frontier ideas (48) — all with the full Volume I §10 record schema and validation protocols. The coverage-delta ledger is below.
- Cross-layer corrections/caveats: the correction for the server-cache-scope conflation has been applied to file 13; the subagent-caching-default conflict (#29966 vs Volume I's measured subagent cache writes) remains version/path-dependent and must be audited in the operator's own JSONL before acting.
Coverage-delta ledger (novelty proof vs 00–32)
Every Volume II technique was checked against the named Volume I file before claiming novelty.
| Technique | Vol I file checked | Why genuinely new |
|---|---|---|
| 41 Q1–Q7 (quota model, prefix-as-quota, window size, request-volume, 1h-in-allowance, overage decision, SDK-off-cap) | 13 (caching), 01 (econ), README | Vol I prices in $ only; "quota" named in 13, modeled 0× — the cap weighting, denominator, re-sort, and the 06-15 split are all new |
| 42 M1–M6 (vision routing, downsample, text-over-screenshot, text-over-PDF, crop, lazy vision) | 12, 16, 03 | Multimodal = 0 substantive hits in 00–32; image/PDF token costs unmeasured |
| 43 L1–L6 (fast-mode purchase, optimizer latency tax, parallel-time, cache-on-both-axes, batch-time, quota-edge decision) | 17:110, 18:132, 19:147, 01:52 | Latency mentioned in scatter; never built into a time-value decision model (v·t·s > Δ$) |
| 44 F1–F6 (workspace fleet sharing, excludeDynamicSections sized, cold-start pre-warm, subagent-caching audit, multi-tenant boundary, jackin' fleet policy) | 13 tech 7, 19:116-187, 17 | Vol I covers self-host fleet (19) + the SDK flag (13); hosted cross-container sharing, the measured dynamic-section size, and #29966 are new |
| 45 P1–P6 (portable core, keepalive borrow, 3-tier routing borrow, dedup borrow, per-tool budgets borrow, non-portable edge) | 14/15 availability labels, 18:196 | No portability matrix exists in 00–32; reverse-portability (borrow from other agents) is new |
| 46 FL1–FL5 (CAG-via-caching, /cd, no-compressor-security, KV-family verdict, Fable→Opus re-decision) | 03, 19, 20 K1/K2/K15 | KV-eviction family (SnapKV/H2O/PyramidKV/KVQuant), CAG, CompressionAttack, /cd, the Fable promo = 0 hits / post-freeze |
| 47 G1–G6 (real governors, max_tokens replacement, online canary, guard tax, degrade-don't-die, recursion cap) | 15:123-135, 31, 32:13 | Vol I has offline harness + max_tokens-as-rail only; runtime budget governance + online drift detection are new |
| 48 V1–V8 (frontier) | 20 K1–K16 | Each maps to a Volume II blind spot; dedup notes vs every K-idea in 48 |
Count: 42 new techniques in 41–47 (Q7+M6+L6+F6+P6+FL5+G6) + 8 frontier (48) = 50, against the brief's floors of ≥25 cataloged and ≥10 with the full record. All 42 carry the full §10 record (Name, Layer, Mechanism, Expected savings, Evidence tier, Quality risk, Availability, Effort, Composability, Validation protocol) plus a coverage-delta line.
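The tally above can be re-derived mechanically. The sketch below reads the per-file counts off the ID ranges in the ledger and checks them against the brief's floors; the dictionary layout is only an illustration, not part of the ledger itself.

```python
# Technique counts per Volume II file, read off the ledger's ID ranges
# (Q1-Q7, M1-M6, L1-L6, F1-F6, P1-P6, FL1-FL5, G1-G6).
techniques_per_file = {
    "41": 7, "42": 6, "43": 6, "44": 6, "45": 6, "46": 5, "47": 6,
}
frontier_ideas = 8  # file 48, V1-V8

new_techniques = sum(techniques_per_file.values())
total = new_techniques + frontier_ideas

# Floors from the brief: >=25 cataloged, >=10 with the full record.
meets_floors = total >= 25 and new_techniques >= 10

print(f"new techniques: {new_techniques}, total with frontier: {total}, "
      f"floors met: {meets_floors}")
```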
Verdict delta — does Volume II move the numbers?
On dollars: no. Carrying the arithmetic:
- Volume I's binding constraints are structural: (1) frontier-model thinking bills as output and only effort/not-being-the-frontier-model touches it; (2) a cache-read floor of context the agent genuinely needs. No Volume II lever removes either. The KV-compression family that could attack the cache floor is self-host-only on hosted Claude (file 46) — same wall as Volume I's K1/K2.
- The new dollar levers are real but small: vision-tier routing saves 67% of image tokens (file 42), but images are a minor share of a typical coding session; text-over-PDF saves ~50% on document tokens, only when PDFs are in play; CAG-via-caching (FL1) converts retrieval into 0.1× reads, a context-architecture win already in Volume I's family. None compounds into a new multiplier.
- So the headline stands after the independent correction: ≈2.5× defensible today, ≈5–6.2× if the Sonnet-main+advisor routing flip passes validation, no true 10× at provably equal quality. Multimodal/latency/governance/portability change correctness and measurability, not the ceiling.
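Why a 67% cut on image tokens does not move the headline can be sketched with a simple dilution formula: if a token class is a fraction `share` of session cost and a lever removes `reduction` of it, the whole-session multiplier is 1 / (1 − share × reduction). The image shares below are illustrative assumptions (the text only says "minor"), and multiplying onto the 2.5× headline assumes the lever is independent of Volume I's levers.

```python
def session_multiplier(share: float, reduction: float) -> float:
    """Whole-session cost multiplier from cutting `reduction` of a token
    class that makes up `share` of the session's cost."""
    if not (0.0 <= share <= 1.0 and 0.0 <= reduction < 1.0):
        raise ValueError("share must be in [0, 1] and reduction in [0, 1)")
    return 1.0 / (1.0 - share * reduction)

VISION_REDUCTION = 0.67  # file 42: vision-tier routing on image tokens
HEADLINE = 2.5           # Volume I's defensible multiplier

# ASSUMPTION: hypothetical image shares of session cost, for illustration only.
for share in (0.02, 0.05, 0.10):
    lever = session_multiplier(share, VISION_REDUCTION)
    print(f"image share {share:.0%}: lever {lever:.3f}x, "
          f"composed with headline {HEADLINE * lever:.2f}x")
```

Even at a 10% image share the composed figure stays well under 3×, which is the sense in which the lever is real but does not compound into a new multiplier.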
On the metric: yes — and this is Volume II's real contribution. For a subscriber (this operator is Max), "$ per task" is the wrong denominator. The quota model (file 41):
- Below the cap, dollars are sunk; the objective is tasks per 5-hour/weekly window.
- The cap weights re-sort the levers: prefix stability and context-window size and request-volume discipline become the top levers (a cache miss is a 1.25–2× write against the cap; subagent fan-out is ~61% of calls); style/register compression matters even less than in the dollar model (visible output is a sliver of cap-weighted tokens).
- The cap weighting itself is officially unpublished; community triangulation puts cache-read at ~0.1× (T3, file 41), and the token denominator is opaque (bounded INCOMPLETE). So "tasks per cap" is directional without the per-account prober (frontier V2).
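A minimal sketch of cap-weighted accounting, showing why a cache miss dominates under the quota model. Every weight here is an assumption: the cache-read weight is the community T3 estimate, the cache-write weight is the low end of the 1.25–2× band, and the output weight simply mirrors the $5/$25 price ratio because the real cap weighting is unpublished. The prefix size and per-call token counts are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CapWeights:
    # ASSUMPTION: none of these weights are published; treat as placeholders.
    fresh_input: float = 1.0
    cache_read: float = 0.1    # community triangulation (T3, file 41)
    cache_write: float = 1.25  # low end of the 1.25-2x write band
    output: float = 5.0        # mirrors the $5/$25 price ratio, unverified

def cap_units(w: CapWeights, *, fresh_input: int = 0, cache_read: int = 0,
              cache_write: int = 0, output: int = 0) -> float:
    """Cap-weighted token units for one call under the assumed weights."""
    return (fresh_input * w.fresh_input + cache_read * w.cache_read
            + cache_write * w.cache_write + output * w.output)

w = CapWeights()
PREFIX = 100_000  # hypothetical stable prefix, in tokens

# Same call twice: once with the prefix read from cache, once rewritten.
hit = cap_units(w, cache_read=PREFIX, fresh_input=2_000, output=500)
miss = cap_units(w, cache_write=PREFIX, fresh_input=2_000, output=500)

print(f"hit {hit:,.0f} units, miss {miss:,.0f} units, "
      f"miss/hit {miss / hit:.1f}x")
```

Under these assumed weights a single prefix miss costs several times a hit, which is why prefix stability rises to the top of the quota-axis ordering. The absolute numbers stay directional until the denominator is measured per account.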
A pricing reconciliation (not a multiplier change): Volume I modeled the day at Fable 5 prices ($17–22/day). The operator runs Opus 4.8 (half the per-token price, identical tokenizer family), and Fable 5 leaves the subscription. So Volume I's absolute dollar figures are ~2× high for this operator going forward. Ratios, multipliers, and the tier list are unchanged because they are price-invariant, and Volume I's tokenizer-arbitrage math and the file-42 image math transfer directly (Opus 4.8 ≡ Fable 5 tokenizer).
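The price-invariance claim can be checked directly: when every price halves, absolute cost halves but the baseline-to-optimized ratio does not move. The sketch below uses the per-million-token prices quoted in this document and the 0.1× read / 1.25× write cache multipliers. The two token mixes are hypothetical, chosen only so the baseline lands inside the quoted $17–22/day band.

```python
import math

# USD per million tokens (input, output), as quoted in this document.
PRICES = {"Fable 5": (10.0, 50.0), "Opus 4.8": (5.0, 25.0)}
CACHE_READ_MULT = 0.1    # cache reads bill at 0.1x the input price
CACHE_WRITE_MULT = 1.25  # 5m cache writes bill at 1.25x the input price

def day_cost(model: str, mix: dict) -> float:
    """Dollar cost of one day's token mix at the given model's prices."""
    in_price, out_price = PRICES[model]
    return (
        mix["input"] * in_price
        + mix["cache_read"] * in_price * CACHE_READ_MULT
        + mix["cache_write"] * in_price * CACHE_WRITE_MULT
        + mix["output"] * out_price
    ) / 1_000_000

# ASSUMPTION: hypothetical daily token mixes, not measured values.
baseline = {"input": 400_000, "cache_read": 8_000_000,
            "cache_write": 400_000, "output": 60_000}
optimized = {"input": 160_000, "cache_read": 3_200_000,
             "cache_write": 160_000, "output": 24_000}

for model in PRICES:
    b, o = day_cost(model, baseline), day_cost(model, optimized)
    print(f"{model}: baseline ${b:.2f}, optimized ${o:.2f}, ratio {b / o:.2f}x")

fable_ratio = day_cost("Fable 5", baseline) / day_cost("Fable 5", optimized)
opus_ratio = day_cost("Opus 4.8", baseline) / day_cost("Opus 4.8", optimized)
assert math.isclose(fable_ratio, opus_ratio)
```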
Tier-list delta
Volume I's S/A/B/C/F tiers stand. Volume II adds, scored by the same $saving × confidence ÷ effort on the dollar axis, and re-weighted for the quota axis where noted:
- New S-tier (negative-cost or large, low-effort): vision-tier routing for screenshot/PDF work (42 M1, −67% image tokens, trivial); text-over-screenshot / text-over-PDF (42 M3/M4, negative-cost); /cd cache preservation (46 FL2, zero-effort negative-cost); prefix-stability-as-quota-lever (41 Q2, promoted to top on the quota axis); the real governors (47 G1, loss-avoidance).
- New A-tier: CAG warm-repo cached prefix (46 FL1 / 48 V6); time-value fast-mode (43 L1/V5); fleet workspace cache sharing (44 F1–F3); online-canary-gated compression (48 V7); quota-window scheduling (48 V1) for capped subscribers.
- New B/C-tier: portable-policy compiler (48 V8); the borrow-ins (45 P2–P5); cross-provider effort/cache levers (45) where the operator might switch agents.
- New F-tier (killed/blocked; see graveyard): the hosted-Claude KV-compression family (46, $0); prompt-compressors in the hot path (now also a security kill, 46 FL3 / 47); max_tokens as a spend governor (re-killed, 47 G2); lossless at-rest compression as a "token" saver (46, category error).
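The scoring rule used for the tiers ($saving × confidence ÷ effort) can be sketched as a small ranking helper. The lever names come from the list above, but every number attached to them is a placeholder assumption for illustration, not a measurement from Volumes I or II, and the `Lever` type is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lever:
    name: str
    saving_usd: float  # expected dollar saving per period
    confidence: float  # 0..1, how well-evidenced the saving is
    effort: float      # > 0, e.g. hours to implement and validate

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.effort <= 0:
            raise ValueError("effort must be positive")

    @property
    def score(self) -> float:
        # $saving x confidence / effort, the dollar-axis tier score.
        return self.saving_usd * self.confidence / self.effort

# ASSUMPTION: placeholder inputs chosen only to exercise the formula.
levers = [
    Lever("CAG warm-repo prefix (46 FL1)", saving_usd=6.0, confidence=0.6, effort=4.0),
    Lever("portable-policy compiler (48 V8)", saving_usd=4.0, confidence=0.3, effort=20.0),
    Lever("vision-tier routing (42 M1)", saving_usd=3.0, confidence=0.9, effort=0.5),
]

ranked = sorted(levers, key=lambda lever: lever.score, reverse=True)
for lever in ranked:
    print(f"{lever.score:6.2f}  {lever.name}")
```

Dividing by effort is what lets a small but trivial lever outrank a larger one that needs heavy validation. The quota-axis re-weighting would need a separate saving term measured in cap units rather than dollars.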
Composed-stack updates
Conservative (do tomorrow, riskless) adds: vision-tier routing + text-over-pixels/PDF for any visual work; /cd instead of restart-in-new-dir; treat prefix stability as the #1 quota lever; set a Claude Code /usage-credits monthly cap and (for fleets) a workspace rate limit; re-decide Fable→Opus 4.8 before 06-23.
Aggressive (with validation) adds: the CAG warm-repo cached prefix (preload the stable repo core once); fleet workspace cache sharing + cold-start pre-warm (jackin'); time-value-driven fast mode for interactive sessions; an online-quality canary so aggressive register/effort compression runs behind a live safety net.
Unbelievable (chase the ceiling) adds: the quota-window scheduler + per-account quota-denominator prober (close the cap INCOMPLETE); the cross-provider portable-policy compiler; the jackin'-baked fleet cache + governance pack.
Binding constraint, restated: on dollars, still the frontier-thinking + cache-read floor (no 10×). On quota, the binding constraint is the unpublished denominator: you cannot prove "10× more tasks per cap" without measuring the cap, which only a header-reading proxy can do per-account.
For a subscriber specifically, the stack re-orders: cap-headroom levers (prefix stability, smaller window, request-volume discipline, 1h-in-allowance, headless-off-cap placement) come first; dollar levers matter only on the overage tail.
Cross-layer caveats
Two cache-layer facts that span Volume I and the fleet analysis:
- Server cache scope vs local file cache (13 tech 7 + surprising-findings) — applied. Original Volume I text stated "your git state is in the cache key… worktrees never share," citing the prompt-caching docs, implying the server prompt cache is keyed by machine/directory/git-snapshot. File 44's sources show the server prompt cache is workspace-scoped with no machine/dir/worktree key; the git-snapshot/worktree rules describe Claude Code's local file cache (a different layer, GitHub #17531). Practical effect is favorable: hosted fleets can share a prefix across machines/dirs.
- Subagent caching default (13 tech 2 vs GitHub #29966). Volume I measured subagents writing 5m cache (1,128/1,128 calls). #29966 (T3, one session, Claude Code 2.1.63 / SDK 0.2.63) reports Agent-tool subagents with enablePromptCaching hardcoded false (caching off). These may both be true at different versions/spawn paths (cavecrew/Task-tool subagents vs SDK Agent-tool subagents). Recorded as version/path-dependent; the operator should audit their own subagent JSONL (file 44 F4) rather than assume either.
No other Volume I claim was contradicted; the min-cacheable-prefix values (13 tech 11), tokenizer-family equality, and cache multipliers were all re-verified live and stand unchanged.
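The subagent-caching audit recommended above could look roughly like the following. This is a sketch under stated assumptions: it assumes each JSONL line is one JSON object, that subagent traffic is flagged by an `isSidechain` field, and that per-call usage sits under `message.usage` with the API's `cache_creation_input_tokens` and `cache_read_input_tokens` counters. The transcript layout varies by Claude Code version, so verify these field names against your own files before trusting the tally.

```python
import json
from collections import Counter

def audit_subagent_cache(lines):
    """Tally subagent calls by whether they wrote or read the prompt cache.

    `lines` is any iterable of JSONL strings (an open file works).
    """
    tally = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            tally["unparseable"] += 1
            continue
        # ASSUMPTION: subagent entries carry isSidechain == true.
        if not isinstance(entry, dict) or not entry.get("isSidechain"):
            continue
        message = entry.get("message")
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue  # not an API-call record
        tally["calls"] += 1
        wrote = (usage.get("cache_creation_input_tokens") or 0) > 0
        read = (usage.get("cache_read_input_tokens") or 0) > 0
        if wrote:
            tally["cache_write"] += 1
        if read:
            tally["cache_read"] += 1
        if not (wrote or read):
            tally["no_cache"] += 1
    return tally

# Synthetic sample records, for illustration only.
sample = [
    json.dumps({"isSidechain": True, "message": {"usage": {
        "cache_creation_input_tokens": 1200, "cache_read_input_tokens": 0}}}),
    json.dumps({"isSidechain": True, "message": {"usage": {
        "cache_creation_input_tokens": 0, "cache_read_input_tokens": 0}}}),
    json.dumps({"isSidechain": False, "message": {"usage": {
        "cache_creation_input_tokens": 500}}}),
]
print(dict(audit_subagent_cache(sample)))
```

To run it on a real transcript, pass an open file handle instead of `sample`. A result where `no_cache` equals `calls` would point toward the #29966 behavior, while `cache_write` close to `calls` would match Volume I's measurement.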
Volume II graveyard (killed or downgraded this run)
| Claim | Verdict | Where |
|---|---|---|
| "KV-cache compression (SnapKV/H2O/PyramidKV/KVQuant) will cut your hosted-Claude bill" | KILLED — all self-host-only; $0 on a hosted API; extends Vol I's K1/K2 block | 46 |
| "cache_read counts at 1× against the subscription cap" / "the cache double-counts" | KILLED — ~0.1× (T3); the 1× double-count was a LiteLLM/Bedrock proxy bug, fixed; direct Anthropic auth does not double-count | 41 |
| "Anthropic's workspace dollar spend limit hard-stops spend" | KILLED — alert-only; hard block is via rate limits / /usage-credits / gateways | 47 |
| "Set max_tokens low to cap spend" | RE-KILLED — truncates tool_use, bills the attempt + a higher-cap retry; use model_context_window_exceeded + a budget | 47 |
| "The multi-agent 'up to 90% faster' result applies to coding agents" | KILLED — research-only; Anthropic says coding is a poor fit; naive parallelism risks pooled-limit backoff | 43 |
| "Lossless prompt compression (LoPace) cuts your token bill" | KILLED (category error) — compresses bytes at rest, not tokens sent | 46 |
| "CompactPrompt saves ~60% for coding agents" | DOWNGRADED to T4 folklore — single vendor guide, lossy NL; LLMLingua caveats apply | 46 |
| "Keepalive pingers save the subscription quota on the Claude Code main loop" | MOOT (re-confirmed) — main loop is 1h-TTL in-allowance; Cherny: "a small win"; overage reverts to 5m | 41 |
| "A screenshot is a cheap way to show the agent code/logs" | KILLED — a screenshot is 2–6× the text it shows and caps at one frame; text wins for textual content | 42 |
| "Prompt-compression proxies are merely a cost trade" | DOWNGRADED — also a security attack surface (CompressionAttack, ≤80% ASR) | 46 |
What survived the adversarial pass
Every Volume II headline number was either documented-and-locally-reproduced, honestly tiered, or explicitly INCOMPLETE. The two most novel/load-bearing numbers were re-attacked and held:
- Image-cap divergence is ~3.0–3.1×: confirmed content-independent — a max-entropy noise image at 2560×1440 returned the same capped family split as the gradient. The formula and routing lever stand; exact capped counts vary by wrapper/envelope, so use the band rather than 4,784/1,568 as measured absolutes.
- The PDF tax is ~2×, confirmed across content: code-like text gave 1.90× (Opus) / 2.11× (Sonnet), bracketing the fox-text 1.98× / 2.30×. Holds.
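As a quick consistency check on the image numbers, the two capped counts quoted above (4,784 and 1,568) can be turned into both the divergence ratio and the routing saving. This only re-does the arithmetic on the quoted figures; it does not re-measure anything, and reading the 67% routing figure as one minus the inverse of the divergence is an inference here rather than something the source states.

```python
# Capped image-token counts quoted in the text (higher and lower family).
capped_high, capped_low = 4_784, 1_568

divergence = capped_high / capped_low  # how many times more tokens
saving = 1 - capped_low / capped_high  # share saved by routing to the lower family

print(f"divergence {divergence:.2f}x, routing saving {saving:.0%}")

# The text asks for the band, not the absolutes, so check against the band.
in_band = 3.0 <= divergence <= 3.1
```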
The community-sourced and unpublished items (cap weight ~0.1×, the cap denominator) are labeled T3 / INCOMPLETE and not promoted beyond their evidence.
The Volume II self-audit against the definition of done is appended to the research index under `## Volume II — Extension`, alongside the index, headline numbers, blind-spot-map summary, and the Volume II Assumptions section.