01 — Token Economics and Measurement

Pricing and feature claims are verified against live Anthropic documentation (URLs in the Verification ledger).

TL;DR

Verified live pricing: Fable 5 $10/$50 per MTok in/out, cache read $1 (0.1×), 5-min cache write $12.50 (1.25×), 1-hour write $20 (2×); Sonnet 4.6 $3/$15; Haiku 4.5 $1/$5; Batch API 50% off everything. 1M-token context is now standard-priced on Fable 5/Opus 4.8/Sonnet 4.6, so the brief's assumption of a long-context premium is outdated.
The exchange rates that decide everything: 1 output token = 5 input tokens = 50 cache-read tokens. Thinking bills as output. A token avoided in output is worth 50× a token avoided in cached input.
Tokenizer divergence is official, but content-specific: Anthropic documents that Opus 4.7+/Fable 5 use a new tokenizer producing "roughly 30%" more tokens for the same text. Local follow-up shows the premium is strong on English/ASCII prose, but code/CJK can be near-neutral. Cross-model dollar math must count the target corpus, not just prices.
Ground truth lives in the API usage object (input_tokens, cache_creation_input_tokens, cache_read_input_tokens, output_tokens); count_tokens is free (rate-limited 100–8,000 RPM by tier) and is the experiment instrument; Claude Code exposes /cost, /context, and full OTel metrics (claude_code.token.usage by type/model/agent).
Modeled heavy-day profile (basis for stack math): the $17 floor variant is ~25k uncached input, ~400k cache-write, ~5.5M cache-read, ~125k output tokens on Fable 5; the $22 working variant in 30-composed-stacks.md scales this to six sessions. The measured split is profile-specific; the stable invariant is that cache reads dominate token volume while output + cache writes dominate dollars.

1. Verified price table (live)

From the live pricing page (platform.claude.com/docs/en/about-claude/pricing):

Model	Input	5m cache write	1h cache write	Cache read	Output	Batch in/out
Claude Fable 5	$10	$12.50	$20	$1.00	$50	$5 / $25
Claude Opus 4.8	$5	$6.25	$10	$0.50	$25	$2.50 / $12.50
Claude Sonnet 4.6	$3	$3.75	$6	$0.30	$15	$1.50 / $7.50
Claude Haiku 4.5	$1	$1.25	$2	$0.10	$5	$0.50 / $2.50

All $/MTok. Multipliers are uniform across the lineup: cache write 1.25× (5-min TTL) or 2× (1-hour TTL), cache read 0.1×, batch 0.5× — and they stack (batch + caching compose).
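Because the multipliers are uniform, the whole table can be regenerated from each model's base input and output price. The following is a minimal sketch under that assumption; the dictionary keys and the `price_row` helper are illustrative names, not part of any SDK, and the prices are copied from the table above.

```python
# Multipliers as stated in the text; base prices copied from the table ($/MTok).
MULTIPLIERS = {
    "input": 1.0,
    "cache_write_5m": 1.25,
    "cache_write_1h": 2.0,
    "cache_read": 0.1,
}
BATCH_DISCOUNT = 0.5

BASE_PRICES = {  # model label -> (input, output) in $/MTok
    "Claude Fable 5": (10.0, 50.0),
    "Claude Opus 4.8": (5.0, 25.0),
    "Claude Sonnet 4.6": (3.0, 15.0),
    "Claude Haiku 4.5": (1.0, 5.0),
}


def price_row(model: str, batch: bool = False) -> dict:
    """Derive every token-class price for one model from its base prices."""
    base_in, base_out = BASE_PRICES[model]
    scale = BATCH_DISCOUNT if batch else 1.0
    # Input-side classes are multiples of the base input price.
    row = {cls: round(base_in * mult * scale, 4) for cls, mult in MULTIPLIERS.items()}
    row["output"] = round(base_out * scale, 4)
    return row


if __name__ == "__main__":
    for name in BASE_PRICES:
        print(name, price_row(name), price_row(name, batch=True))
```

Passing `batch=True` applies the 0.5× discount on top of the cache multipliers, which is how the text describes the two stacking.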

Verified side facts that matter for optimization math:

Long context: Fable 5, Opus 4.8/4.7/4.6, Sonnet 4.6 include the full 1M-token window at standard pricing — "a 900k-token request is billed at the same per-token rate as a 9k-token request" (pricing page). No premium tier to engineer around anymore; the cost of a bloated context is linear, not super-linear — but quality degradation with context length is not (see 12-context-architecture.md).
Fast mode (research preview): Opus 4.8 at $10/$50 — i.e., Opus-fast costs exactly Fable 5 list. Speed, not savings.
US data residency (inference_geo: "us"): 1.1× on every token class. Don't set it idly.
Tool-use system prompt is billed and model-specific: 290 tokens on Opus 4.8, 497 on Sonnet 4.6/Opus 4.6, 496 on Haiku 4.5 (auto choice; pricing page table). Local measurement on Fable 5: ~318 tokens with one minimal tool. Any non-empty tools array pays this once per request.
Server tools: web search $10/1,000 searches; web fetch free beyond tokens; bash tool +245 input tokens; text editor tool +700 tokens (Claude 4.x).

2. The exchange rates — why token classes are different currencies

On Fable 5, per token:

Class	$/MTok	Relative to cache read
Cache read	$1.00	1×
Uncached input	$10.00	10×
5m cache write	$12.50	12.5×
1h cache write	$20.00	20×
Output (visible and thinking)	$50.00	50×
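The ratios in this table follow directly from the prices. A small sketch that recomputes them and converts a saving in one token class into its equivalent in another; the helper names are hypothetical and the prices are the Fable 5 figures above.

```python
# Fable 5 prices in $/MTok, copied from the table above.
FABLE5_PRICE = {
    "cache_read": 1.00,
    "input": 10.00,
    "cache_write_5m": 12.50,
    "cache_write_1h": 20.00,
    "output": 50.00,
}


def exchange_rate(from_class: str, to_class: str, prices: dict = FABLE5_PRICE) -> float:
    """How many `to_class` tokens cost the same as one `from_class` token."""
    return prices[from_class] / prices[to_class]


def equivalent_tokens(tokens: float, from_class: str, to_class: str,
                      prices: dict = FABLE5_PRICE) -> float:
    """Convert a token count in one class into the same-dollar count in another."""
    return tokens * exchange_rate(from_class, to_class, prices)


if __name__ == "__main__":
    for cls in FABLE5_PRICE:
        print(f"{cls:>15}: {exchange_rate(cls, 'cache_read'):g}x cache read")
    print(equivalent_tokens(1_000, "output", "cache_read"))
```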

Consequences, mechanical but decisive:

Output discipline is worth 50× cache-read discipline per token. A technique that trims 1,000 output tokens equals one that trims 50,000 cache-read tokens. This single ratio reorders most folklore tier lists, which obsess over input-side prompt slimming.
Thinking is output. Locally measured at 54.8% of output tokens in a max-effort session (02-baseline-audit.md). Any output-side technique that doesn't touch thinking (all style layers) caps out at the visible share.
Cache reads are cheap, not free, and they're the volume king. 92.8% of prompt-side tokens in the measured session were cache reads; at 0.1× they were still a major dollar line (32% in that session; 21% in an independent output-heavy session), because the entire conversation prefix is re-read on every API call. Cache-read volume from context mass ≈ prefix_tokens × calls_per_session, billed at 0.1× the input price, so a 2,738-token always-on CLAUDE.md chain costs ~52k cache-read tokens (about $0.05 at Fable 5 prices) over a 19-call session, plus its share of cache writes whenever the prefix re-forms.
Break-even arithmetic for cache writes: 5-min write (1.25×) pays for itself after one read within TTL (1.25 + 0.1 < 2 × 1.0). 1-hour write (2×) needs ≥2 reads (2 + 0.1 > 2 after one read, 2 + 0.2 < 3 after two; confirmed in live docs: "caching pays off after just one cache read for the 5-minute duration… after two cache reads for the 1-hour duration"). The idle-gap economics are re-derived from these multipliers in 13-caching-exploitation.md.
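The last two consequences are plain arithmetic and can be checked mechanically. The sketch below assumes the comparison used in the text: caching costs one write plus n reads, while not caching costs the prefix at 1× on all n + 1 requests. Function names are illustrative, and write costs from prefix re-formation are deliberately left out.

```python
def breakeven_reads(write_mult: float, read_mult: float = 0.1) -> int:
    """Smallest number of cache reads n at which caching beats resending uncached.

    Cached:   write_mult + n * read_mult   (one write, then n reads)
    Uncached: 1 + n                        (prefix sent at 1x on every request)
    """
    if read_mult >= 1:
        raise ValueError("a read must be cheaper than uncached input to ever break even")
    n = 1
    while write_mult + n * read_mult >= 1 + n:
        n += 1
    return n


def prefix_carry(prefix_tokens: int, calls: int,
                 input_price_per_mtok: float = 10.0, read_mult: float = 0.1):
    """Cache-read volume and dollars for an always-on prefix re-read on every call."""
    read_tokens = prefix_tokens * calls
    dollars = read_tokens * input_price_per_mtok * read_mult / 1_000_000
    return read_tokens, dollars


if __name__ == "__main__":
    print(breakeven_reads(1.25), breakeven_reads(2.0))
    print(prefix_carry(2_738, 19))  # the CLAUDE.md chain from the text, Fable 5 prices
```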

3. Tokenizer divergence — tokens are not a stable unit across models

Official, from the live docs:

"Opus 4.7 and later use a new tokenizer… This new tokenizer may use up to 35% more tokens for the same fixed text." (pricing page)

"Claude Fable 5 … uses the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text." (token-counting page)

Local confirmation (02-baseline-audit.md): identical text counts +15% (Python code) to +38% (English prose) on Fable 5 vs Sonnet 4.6; CJK diverges least.

Implications:

Cross-tier routing saves more than list prices imply. Moving prose-heavy work Fable 5 → Sonnet 4.6 cuts price per token 3.3× and tokens per text ~1.2–1.4×: effective ~4–4.6× on input classes. Quantified per task class in 16-model-routing-and-delegation.md.
Never reuse token counts measured on one tokenizer to budget another (docs say this explicitly for migration). All measurements in this dossier name the model they were counted on.
Characters per token measured locally on Fable 5: ~2.3–3.4 for English/markdown, ~1.4–1.6 for CJK; more granular tables are in 11-tokenizer-arbitrage.md.
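The routing implication multiplies two ratios: the per-token price ratio and the token-count ratio for the same text. A minimal sketch, assuming the token inflation has already been measured on the target corpus with both tokenizers; the function name is illustrative.

```python
def effective_saving(price_from: float, price_to: float, token_inflation: float) -> float:
    """Cost ratio for the same text when moving between models.

    token_inflation = tokens on the source model / tokens on the target model,
    measured on the same corpus (for example 1.38 for English prose in the text above).
    """
    return (price_from / price_to) * token_inflation


if __name__ == "__main__":
    # Fable 5 ($10 input) -> Sonnet 4.6 ($3 input), for the inflation range quoted above.
    for inflation in (1.15, 1.2, 1.38):
        print(f"inflation {inflation}: {effective_saving(10.0, 3.0, inflation):.2f}x")
```

With the +15% figure measured for Python code the same move is worth closer to 3.8×, which is why the corpus has to be counted rather than assumed.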

4. Measurement instruments (how to see spend at all)

API usage object — the ground truth. Every Messages response carries usage.input_tokens (uncached), usage.cache_creation_input_tokens (further broken down in usage.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens), usage.cache_read_input_tokens, usage.output_tokens (visible + thinking + tool_use blocks). Total prompt size = input + cache_creation + cache_read. All dossier arithmetic uses these fields.
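As an illustration of that arithmetic, the sketch below prices a usage dict (or the per-field sum over a session) at Fable 5 list prices. The field names are the ones listed above; the `usage_cost` helper and the fallback that treats writes without a TTL breakdown as 5-minute writes are assumptions of this example.

```python
# Fable 5 list prices in $/MTok, copied from the price table in this document.
FABLE5 = {"input": 10.0, "write_5m": 12.5, "write_1h": 20.0, "read": 1.0, "output": 50.0}


def usage_cost(usage: dict, prices: dict = FABLE5):
    """Return (prompt_tokens, dollars_by_class, total_dollars) for one usage dict."""
    uncached = usage.get("input_tokens") or 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    output = usage.get("output_tokens") or 0

    # Optional TTL breakdown. ASSUMPTION: if absent, all writes are 5-minute writes.
    detail = usage.get("cache_creation") or {}
    write_1h = detail.get("ephemeral_1h_input_tokens") or 0
    write_5m = written - write_1h

    dollars = {
        "input": uncached * prices["input"] / 1e6,
        "write_5m": write_5m * prices["write_5m"] / 1e6,
        "write_1h": write_1h * prices["write_1h"] / 1e6,
        "read": read * prices["read"] / 1e6,
        "output": output * prices["output"] / 1e6,
    }
    prompt_tokens = uncached + written + read  # total prompt size, as defined above
    return prompt_tokens, dollars, sum(dollars.values())


if __name__ == "__main__":
    # Session totals reported in section 5 of this document.
    session = {
        "input_tokens": 5_475,
        "cache_creation_input_tokens": 84_693,
        "cache_read_input_tokens": 1_167_417,
        "output_tokens": 26_977,
    }
    prompt, by_class, total = usage_cost(session)
    print(prompt, round(total, 2), {k: round(v / total, 3) for k, v in by_class.items()})
```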

count_tokens — the free experiment instrument. POST /v1/messages/count_tokens accepts messages/system/tools/thinking exactly like a real request. Verified live (token-counting page): free, separate rate limit (100 RPM tier 1 → 8,000 RPM tier 4), counts are an "estimate" that "may differ by a small amount" (system-added tokens are not billed), it never touches the cache, and thinking blocks from previous assistant turns are ignored — matching the production rule that prior-turn thinking is stripped from billed context (a built-in, automatic saving documented in 18-provider-features.md).

Claude Code surfaces.

/cost — per-session totals; /context — live context-window decomposition (system prompt, tools, MCP, memory files, messages). Quick, but session-scoped and manual.
OpenTelemetry (CLAUDE_CODE_ENABLE_TELEMETRY=1, OTLP exporters): metrics claude_code.token.usage (attribute type ∈ input / output / cacheRead / cacheCreation, plus model, and skill.name / plugin.name / agent.name for attributing spend to skills, plugins, and subagents), claude_code.cost.usage (USD), claude_code.active_time.total, plus events: claude_code.api_request (per-call tokens+cost), claude_code.compaction, claude_code.tool_decision. OTEL_LOG_RAW_API_BODIES=1 captures full request/response bodies (60 KB truncation; thinking content is always redacted). This is the right backbone for a personal token dashboard; the per-agent.name attribution is exactly what's needed to measure subagent economics (17-multi-agent-protocols.md). (code.claude.com/docs/en/monitoring-usage.)
Session JSONL transcripts (~/.claude/projects/<project>/<session>.jsonl) — per-call usage including cache fields. Two traps found locally (02-baseline-audit.md): the same message.usage repeats on every content-block line (dedup by message.id or overcount ~3×), and thinking text is redacted (thinking: ""), so thinking must be inferred as output_tokens − count_tokens(visible blocks).
Console usage pages / ccusage-style analyzers — covered with the market scan in 03-prior-art-and-market-scan.md; JSONL-based analyzers inherit both transcript traps above.
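The first transcript trap (repeated usage per content-block line) can be avoided by keeping one usage record per message.id. A minimal sketch, assuming each JSONL line is an object with an optional `message` holding `id` and `usage` as described above; the rest of the record layout is not relied on, and the function name is illustrative.

```python
import json

USAGE_FIELDS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def sum_session_usage(jsonl_lines):
    """Sum usage over a transcript, counting each message.id exactly once.

    Returns (totals_by_field, number_of_distinct_api_messages).
    """
    seen = set()
    totals = dict.fromkeys(USAGE_FIELDS, 0)
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or partial lines rather than fail
        message = record.get("message") if isinstance(record, dict) else None
        if not isinstance(message, dict):
            continue
        usage, msg_id = message.get("usage"), message.get("id")
        if not usage or msg_id is None or msg_id in seen:
            continue  # no usage on this line, or this message was already counted
        seen.add(msg_id)
        for field in USAGE_FIELDS:
            totals[field] += usage.get(field) or 0
    return totals, len(seen)


# Usage: pass an open transcript file, e.g.
#   with open(path_to_session_jsonl, encoding="utf-8") as fh:
#       totals, calls = sum_session_usage(fh)
```

The sketch keeps the first usage seen for each id, which matches the observation that the same usage repeats on every line; if a transcript ever carried differing values per line, that choice would need revisiting.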

5. The modeled session profile (basis for all stack arithmetic)

Measured base (this environment, 02-baseline-audit.md): one heavy orchestration session = 19 API calls; 5,475 uncached input / 84,693 cache-write / 1,167,417 cache-read / 26,977 output tokens (54.8% thinking at max effort).

HEAVY-DAY profile. Two internally consistent variants are used in this dossier: a $17 floor (5 sessions, 45% thinking; the table below) and a $22 working figure (6 sessions, 55% thinking) that the area files (10–19) and the composed stacks (30) adopt. Both scale the same measured session, and every stack multiplier is unchanged across them. For the $22 variant, scale to six sessions and shift the thinking split: multiplying the rounded rows below by 1.2 gives only about $20.40, while six unrounded measured sessions (about $3.63 each at Fable 5 prices) give about $21.78. Table for the $17 floor (assumption-explicit scaling):

Class	Tokens/day	× Fable 5 price	$/day	Share
Uncached input	25,000	$10/MTok	$0.25	1.5%
Cache write (5m)	400,000	$12.50/MTok	$5.00	29.4%
Cache read	5,500,000	$1/MTok	$5.50	32.4%
Output — thinking (45%)	56,250	$50/MTok	$2.81	16.5%
Output — visible (55%)	68,750	$50/MTok	$3.44	20.2%
Total			$17.00	100%

Assumptions recorded: (a) main-loop only; subagent/workflow fan-out multiplies this profile and is modeled separately in 17-multi-agent-protocols.md; (b) cache behavior healthy (92.8% read share measured); a session with idle gaps >5 min turns cache writes into the dominant line; (c) thinking share 45% (measured 54.8% at max effort; lower effort settings reduce it; sweep in 15-output-discipline.md). Sensitivity: at 55% thinking the two output rows swap (thinking $3.44, visible $2.81) and the total is unchanged. At a 70% cache-read share (sloppier sessions), day cost rises ~$3–4 from extra writes.
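The floor table and its thinking-share sensitivity can be reproduced from the four token totals. A minimal sketch using the rounded profile from the table and Fable 5 list prices; the names `FLOOR_PROFILE` and `day_cost` are illustrative.

```python
# Fable 5 list prices in $/MTok and the rounded $17-floor profile from the table above.
PRICE = {"input": 10.0, "cache_write_5m": 12.5, "cache_read": 1.0, "output": 50.0}
FLOOR_PROFILE = {
    "input": 25_000,
    "cache_write_5m": 400_000,
    "cache_read": 5_500_000,
    "output": 125_000,
}


def day_cost(profile: dict, thinking_share: float, prices: dict = PRICE):
    """Return (dollars_by_row, total_dollars, share_by_row) for a daily profile."""
    out_dollars = profile["output"] * prices["output"] / 1e6
    rows = {
        "uncached_input": profile["input"] * prices["input"] / 1e6,
        "cache_write_5m": profile["cache_write_5m"] * prices["cache_write_5m"] / 1e6,
        "cache_read": profile["cache_read"] * prices["cache_read"] / 1e6,
        # Thinking bills as output, so the split only moves dollars between rows.
        "output_thinking": out_dollars * thinking_share,
        "output_visible": out_dollars * (1.0 - thinking_share),
    }
    total = sum(rows.values())
    shares = {name: value / total for name, value in rows.items()}
    return rows, total, shares


if __name__ == "__main__":
    for share in (0.45, 0.55):
        rows, total, shares = day_cost(FLOOR_PROFILE, share)
        print(f"thinking {share:.0%}: total ${total:.2f}")
        for name, value in rows.items():
            print(f"  {name:<16} ${value:5.2f}  {shares[name]:.1%}")
```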

The $/task metric (brief §4). All stack claims in 30-composed-stacks.md use: Nx = (baseline $/completed task) ÷ (optimized $/completed task) at statistically indistinguishable success on the validation suite (31-validation-harness.md). With ~10 completed tasks/heavy day, baseline ≈ $1.70/task on this profile.
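For concreteness, the metric reduces to two divisions. In the sketch below the baseline figures are the ones stated above, while the optimized cost is a purely hypothetical input chosen to show the calculation; the success-rate check on the validation suite is outside the scope of this example.

```python
def dollars_per_task(day_dollars: float, completed_tasks: int) -> float:
    """Cost per completed task; only completed tasks count in the denominator."""
    if completed_tasks <= 0:
        raise ValueError("need at least one completed task")
    return day_dollars / completed_tasks


def nx(baseline_per_task: float, optimized_per_task: float) -> float:
    """Nx = (baseline $/completed task) / (optimized $/completed task)."""
    if optimized_per_task <= 0:
        raise ValueError("optimized cost per task must be positive")
    return baseline_per_task / optimized_per_task


if __name__ == "__main__":
    baseline = dollars_per_task(17.00, 10)  # heavy-day floor profile from this section
    hypothetical_optimized = 0.85           # illustrative value only, not a measured result
    print(f"baseline ${baseline:.2f}/task, Nx = {nx(baseline, hypothetical_optimized):.1f}")
```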

Verification ledger

Claim / number	Basis
Full price table, multipliers, batch 50%, 1M standard pricing, fast-mode prices, inference_geo 1.1×, tool-use system prompt sizes, web search $10/1k	https://platform.claude.com/docs/en/about-claude/pricing
count_tokens free, RPM tiers 100/2,000/4,000/8,000, estimate caveat, prior-turn thinking ignored, no caching	https://platform.claude.com/docs/en/build-with-claude/token-counting
Tokenizer "roughly 30% more tokens" (Opus 4.7+ tokenizer)	token-counting page, same access date; "up to 35%" on pricing page
OTel metric names/attributes, OTEL_LOG_RAW_API_BODIES, thinking always redacted	https://code.claude.com/docs/en/monitoring-usage
Session decomposition, thinking 54.8%, prompt mix 0.44/6.73/92.83, tokenizer +15–38% local (prose-specific; code/CJK near-neutral); session split is profile-specific	Local measurements, methods in 02-baseline-audit.md
Heavy-day profile	ESTIMATE — measured session × 5, assumptions stated in §5
