40 — Volume II extension: gap audit and blind-spot map

(Volume I froze). This is the first artifact of Volume II: an independent re-mapping of the token-optimization space, overlaid on the frozen Volume I dossier (files 00–32), to find the cells Volume I left blank or drew too thin. It is deliberately pushed before the deep dives so the gap map can be reviewed before research spends on it. Volume I is treated as finished and correct; nothing here edits files 00–32.

TL;DR

Eight seeded blind spots were audited by overlaying an independent six-axis taxonomy on all 19 Volume I files (a 14-agent coverage sweep plus a main-process grep). Verdict: five are genuinely thin or absent (quota economics, multimodal/vision/PDF, latency-as-a-cost- axis, budget governance, online quality detection), three are partially covered with specific open sub-questions (fleet/multi-tenant, cross-provider portability, fresh literature). All eight survive as real Volume II work; none turned out already-covered.
The optimization target may be the wrong unit for this operator. The local credential is a Max subscription (~/.claude/.credentials.json → subscriptionType: max). Volume I prices everything in dollars on API rates and explicitly flags but never solves the quota question; "quota" appears ≥5 times in file 13 and is modeled zero times. For a flat-rate subscriber the binding constraint is the weekly/session cap, where one measured account burned 1,310 cache-read tokens per productive I/O token (GitHub #24147) — so several Volume I "savings" are dollar-only and may vanish or invert against a cap. This is the strongest gap.
Multimodal is a near-total blank: 0 real hits in 4,303 dossier lines (the only adjacent fact is one "~125k tokens per 500 kB PDF" page-size estimate). Coding agents screenshot TUIs and browsers, read PDFs, and paste diagrams; image/PDF token cost is measurable today via count_tokens and is unmeasured anywhere in Volume I.
Live drift already found (rule #5): count_tokens now rejects claude-fable-5 ("not available — use Opus 4.8"), the model Volume I measured on directly. Volume I established Fable 5 and Opus 4.8 share one tokenizer (exact-family equality), so claude-opus-4-8 is the valid Fable-family proxy for all Volume II token counts; this substitution is noted wherever used. Pricing, plan limits, and betas are re-verified live in the area files, not recalled.
Volume II will ship area files 41–47 (one per pursued gap), a frontier file 48 (≥6 new ideas not duplicating Volume I's sixteen K-ideas), and a stacks-and-verdict file 49 (coverage-delta ledger, Corrections to Volume I, and whether any of this moves Volume I's ≈2.6x / ≈5–6.6x / no-true-10x picture). Preliminary verdict-delta below; the arithmetic lands in 49.

The pricing, modeled session profile, and instrument conventions are inherited from 01-economics-and-measurement.md; dollar arithmetic reuses Volume I's $22/day working profile (README Assumption 6) so Volume II numbers compose with Volume I's.

Gap-audit method

The brief forbids restating Volume I's own table of contents and asks instead for an independent taxonomy overlaid on it. The procedure, all:

Independent taxonomy (no web, built first). The token-optimization space was decomposed along six axes chosen without reference to Volume I's A–L area letters (below). The axes are the dimensions of the problem (what you are billed in, which token class carries it, which surface emits it, which lever acts, at what scope, via what delivery), not a list of techniques.
Overlay by coverage sweep. A 14-agent read-only workflow (vol1-coverage-map, one agent per Volume I file 01–32, model Explore, structured output) rated each file's depth on each of the eight seeded blind spots — absent / mention / partial / full — with a file:line citation and an evidence quote. Files 00, 03, 13, and 20 were read in full by the main process instead.
Independent cross-check. A main-process grep -rniE over the eight blind-spot term clusters produced hit counts per file, used to confirm or challenge each agent's depth rating. Where the two disagreed (e.g. multimodal showed 14 raw grep hits), the hits were inspected by hand; all 14 were false positives (revision/decision) or one incidental PDF page-size line — confirming absent.
Verdict. A blind spot is "confirmed thin" only if no file rated full on it and the partials leave the decision-relevant sub-question open. Each confirmed cell carries a one-line dollar-or-quota rationale for why closing it matters.

This overview is the map. The deep dives (E1 fresh web sweep, E2 per-technique records, E3 adversarial validation, E4 verdict delta, E5 self-audit) follow in later commits.

An independent taxonomy of the space

Six orthogonal axes. Any technique is a point in this space; Volume I's density is uneven across it.

Axis	Values	Where Volume I is dense	Where Volume I is thin/blank
A. Cost metric — what you are billed in	dollars · subscription quota/cap · wall-clock/latency · human attention	dollars (01, all)	quota (named, unmodeled) · latency (scattered mentions, no model) · human-time (absent)
B. Token class carrying the cost	uncached-in · cache-write · cache-read · visible-output · thinking-output · image/vision · document/PDF	all five text classes (02, 13, 15)	image and document classes (absent)
C. Surface emitting tokens	system/prefix · tools · messages/context · model output · tool-result media (screenshots, PDFs)	first four (02, 12, 15)	media tool-results (absent)
D. Lever class	style · tokenizer · context-arch · caching · retrieval · output-discipline · routing · multi-agent · provider-features · infra · governance/guardrails · cross-agent portability	the ten Volume I areas 10–19	runtime governance (max_tokens only as a rail) · portability (no matrix) · online quality guarding (offline only)
E. Scope of action	turn · session · cross-session · single-container · fleet/multi-tenant · org	turn→cross-session (12, 13, 14)	hosted fleet sharing (self-host done in 19; hosted-subscription fleet thin) · org-cache (mention)
F. Delivery mechanism	discipline · config · hooks/skills · orchestrator-baked · provider-action	all (32, 20 K16)	— (well covered)

Volume I is essentially complete on axis D rows 10–19, axis B text classes, and axis F. The blind spots are concentrated in axis A (any metric other than dollars), axis B/C media classes, axis D governance/portability/online-quality, and axis E fleet scope. That is the shape Volume II fills.

Depth is the best rating any single Volume I file earned on that topic in the coverage sweep. "Stake" is the dollar-or-quota reason the cell matters. Citations are file:line in Volume I.

#	Blind spot	Vol I best coverage	Verdict	Stake (why it moves a number or decision)	Vol II target
1	Subscription & quota economics	`13:237` Gaps#1, `13:10/204/222` (#24147 1,310:1, "no formula"); `19:7` cost-split only	THIN — named ≥5×, modeled 0×	For a Max subscriber the cap, not dollars, binds. Cache-read levers that look free in $ may dominate quota; some Vol I savings invert.	41
2	Multimodal / vision / PDF	none; `03:267`/`18:165` one "~125k tok/500 kB PDF" estimate	ABSENT — 0 real hits/4,303 lines	A screenshot can be a token bomb or a bargain vs a DOM/AST dump; unpriced. Measurable now via `count_tokens`.	42
3	Latency / wall-clock / human-time	`17:110` "dollars-for-wallclock", `18:132` batch-latency, `19:147` TTFT; `01:52` "speed not savings"	PARTIAL — mentioned widely, never a model	When finishing faster is worth more than the tokens it costs (fan-out, fast mode, proxy round-trips), Vol I gives no decision rule.	43
4	Fleet / team / multi-tenant cache	`19:116-187` self-host (full); `13` tech 7 excludeDynamicSections; `17:110` spawn waves; `30:115` U5	PARTIAL — self-host done; hosted-fleet sub-questions open	Dynamic-section size is unmeasured (`13` Gaps#6); hosted N-container prefix sharing and fleet×quota are unpriced.	44
5	Cross-agent / cross-provider portability	`14`/`15:45` scattered availability labels; `18:196` OpenAI/Gemini caching baselines	THIN — no portability matrix	A stack that dies on an agent switch is fragile; which levers survive Cursor/Codex/Gemini/Copilot/Aider/OpenCode is unstated.	45
6	Fresh literature & market delta	strong scan (`10` SoT/TALE, `12` SWE-Pruner, `19` HiCache/LMCache/RadixAttention, `16` RouteLLM)	PARTIAL — specific holes	Missing entirely: KV-eviction/quant family (SnapKV/H2O/PyramidKV/KVQuant = 0 hits), CAG (0 hits), "context engineering" (0 hits), and any provider changelog drift since 06-12.	46
7	Vol I's own open questions, worked	`15:196-201`, `18:49`, `16:251`, `11:192`, `13` Gaps, `02:207`	OPEN — enumerated, unanswered	Effort→thinking %, prior-turn-thinking billing, count_tokens-vs-billed drift, dynamic-section size — each now locally answerable; some change stack math.	distributed → 49 ledger
8	Meta: optimization cost, online quality, governance	`15:123` max_tokens-as-rail, `32:76` CI linter, `31` offline harness, `32:13` canary re-runs	THIN — no runtime governance, no online drift detection	Measurement machinery has its own token cost (break-even of optimizing); production drift needs live canaries, not an offline suite; hard spend caps/circuit breakers are unbuilt.	47

Volume II index

File	Title	Maps to blind spot(s)	Status
`40-extension-overview.md`	This gap audit and blind-spot map	method	landed
`41-subscription-and-quota-economics.md`	Quota-weighted cost model for a capped subscriber	1 (+ fleet×quota of 4)	pending
`42-multimodal-token-economics.md`	Image / screenshot / PDF token costs, measured	2	pending
`43-latency-and-time-economics.md`	Wall-clock and human-time as a second cost axis	3	pending
`44-fleet-and-multitenant-cache.md`	Hosted cross-container cache sharing and dedup	4	pending
`45-cross-agent-portability.md`	Portability matrix across coding agents	5	pending
`46-fresh-literature-and-market-delta.md`	Clean-room re-sweep; KV-eviction family, CAG, changelog drift	6 (+7)	pending
`47-meta-cost-governance-and-online-quality.md`	Cost of optimizing; budget governance; live quality guards	8 (+7)	pending
`48-extension-frontier.md`	≥6 new frontier ideas (not duplicating K1–K16)	—	pending
`49-extension-stacks-and-verdict.md`	Coverage-delta ledger, verdict delta, Corrections to Volume I	7	pending

If any pursued gap collapses on contact with research (turns out adequately covered, or its arithmetic proves it cannot move a number), its file ships with an INCOMPLETE banner saying so and the count of full area files stays at the brief's floor of five.

Preliminary verdict delta (hypotheses to be settled in 49)

Volume I's headline stands until 49's arithmetic says otherwise. The candidate movers, in order of how much they could change the picture:

Metric replacement, not multiplier change (largest). If the operator is quota-bound (Max), the right denominator is "tasks per weekly cap," not "dollars per task." Under that metric the tier list re-sorts: cache-read-heavy levers and fleet fan-out can lose even where they win on dollars, because reads dominate quota ~1,310:1. Volume II's likeliest headline is a second cost model presented alongside Volume I's dollar model, not a new multiple on the same axis.
No new dollar multiplier is expected to break the 10x wall. The binding constraints Volume I named (frontier thinking output; the cache-read floor) are structural; nothing in the eight gaps obviously removes them. Multimodal, latency, and governance change which choice is correct and what you risk, not the ceiling on dollar reduction at equal quality.
A few gaps may add modest, real dollar levers (e.g. screenshot-vs-text substitution where a screenshot is genuinely cheaper; vision-token discipline). These will be costed honestly on the profile and slotted into the tier list in 49.

Self-audit mirror (Volume II definition-of-done — live)

Instruments and conventions (Volume II)

count_tokens via OAuth (free, non-billable), rebuilt this run at /tmp/ct.py (the Volume I path; the prior container's copy did not persist). Reads ~/.claude/.credentials.json → claudeAiOauth.accessToken, posts to /v1/messages/count_tokens with the oauth-2025-04-20 beta header. Sanity check, "The quick brown fox jumps over the lazy dog.": Opus 4.8 = 24, Sonnet 4.6 = 18, Haiku 4.5 = 18 (+33% Fable-family premium — consistent with Volume I's ~30%).
Fable-family tokenizer = claude-opus-4-8 for Volume II (Fable 5 no longer accepts count_tokens; the two share a tokenizer per Volume I 00 §10). All "Fable 5" token counts in Volume II are Opus 4.8 counts and labeled as such.
Transcripts: ~/.claude/projects/**/*.jsonl (19 files this run) carry per-call message.usage; same source Volume I used for decomposition.
No image/PDF tooling on this box (no PIL/imagemagick/qpdf). Volume II generates test PNGs and PDFs from the Python standard library (zlib) so the image/document token curves can be measured at controlled dimensions; method shown in 42.
Dollar profile: Volume I's $22/day working figure (6 sessions, 55% thinking) for any $-arithmetic; the $17/day floor where a file explicitly uses it. Ratios are profile-invariant.

40 — Volume II extension: gap audit and blind-spot map

40 — Volume II extension: gap audit and blind-spot map

Gap-audit method

An independent taxonomy of the space

The blind-spot map

Volume II index

Preliminary verdict delta (hypotheses to be settled in 49)

Self-audit mirror (Volume II definition-of-done — live)

Instruments and conventions (Volume II)

On this page