01 — Token Economics and Measurement
01 — Token Economics and Measurement
Pricing and feature claims are verified against live Anthropic documentation (URLs in the Verification ledger).
TL;DR
- Verified live pricing : Fable 5 $10/$50 per MTok in/out, cache read $1 (0.1×), 5-min cache write $12.50 (1.25×), 1-hour write $20 (2×); Sonnet 4.6 $3/$15; Haiku 4.5 $1/$5; Batch API 50% off everything. 1M-token context is now standard-priced on Fable 5/Opus 4.8/Sonnet 4.6 — the brief's assumption of a long-context premium is outdated.
- The exchange rates that decide everything: 1 output token = 5 input tokens = 50 cache-read tokens. Thinking bills as output. A token avoided in output is worth 50× a token avoided in cached input.
- Tokenizer divergence is official, but content-specific: Anthropic documents that Opus 4.7+/Fable 5 use a new tokenizer producing "roughly 30%" more tokens for the same text. Local follow-up shows the premium is strong on English/ASCII prose, but code/CJK can be near-neutral. Cross-model dollar math must count the target corpus, not just prices.
- Ground truth lives in the API usage object (
input_tokens,cache_creation_input_tokens,cache_read_input_tokens,output_tokens);count_tokensis free (rate-limited 100–8,000 RPM by tier) and is the experiment instrument; Claude Code exposes/cost,/context, and full OTel metrics (claude_code.token.usageby type/model/agent). - Modeled heavy-day profile (basis for stack math): the $17 floor variant is ~25k uncached
input, ~400k cache-write, ~5.5M cache-read, ~125k output tokens on Fable 5; the $22 working
variant in
30scales this to six sessions. The measured split is profile-specific; the stable invariant is that cache reads dominate token volume while output + cache writes dominate dollars.
1. Verified price table (live)
From the live pricing page (platform.claude.com/docs/en/about-claude/pricing):
| Model | Input | 5m cache write | 1h cache write | Cache read | Output | Batch in/out |
|---|---|---|---|---|---|---|
| Claude Fable 5 | $10 | $12.50 | $20 | $1.00 | $50 | $5 / $25 |
| Claude Opus 4.8 | $5 | $6.25 | $10 | $0.50 | $25 | $2.50 / $12.50 |
| Claude Sonnet 4.6 | $3 | $3.75 | $6 | $0.30 | $15 | $1.50 / $7.50 |
| Claude Haiku 4.5 | $1 | $1.25 | $2 | $0.10 | $5 | $0.50 / $2.50 |
All $/MTok. Multipliers are uniform across the lineup: cache write 1.25× (5-min TTL) or 2× (1-hour TTL), cache read 0.1×, batch 0.5× — and they stack (batch + caching compose).
Verified side facts that matter for optimization math:
- Long context: Fable 5, Opus 4.8/4.7/4.6, Sonnet 4.6 include the full 1M-token window at standard pricing — "a 900k-token request is billed at the same per-token rate as a 9k-token request" (pricing page). No premium tier to engineer around anymore; the cost of a bloated context is linear, not super-linear — but quality degradation with context length is not (see 12-context-architecture.md).
- Fast mode (research preview): Opus 4.8 at $10/$50 — i.e., Opus-fast costs exactly Fable 5 list. Speed, not savings.
- US data residency (
inference_geo: "us"): 1.1× on every token class. Don't set it idly. - Tool-use system prompt is billed and model-specific: 290 tokens on Opus 4.8, 497 on Sonnet
4.6/Opus 4.6, 496 on Haiku 4.5 (
autochoice; pricing page table). Local measurement on Fable 5: ~318 tokens with one minimal tool. Any non-emptytoolsarray pays this once per request. - Server tools: web search $10/1,000 searches; web fetch free beyond tokens; bash tool +245 input tokens; text editor tool +700 tokens (Claude 4.x).
2. The exchange rates — why token classes are different currencies
On Fable 5, per token:
| Class | $/MTok | Relative to cache read |
|---|---|---|
| Cache read | $1.00 | 1× |
| Uncached input | $10.00 | 10× |
| 5m cache write | $12.50 | 12.5× |
| 1h cache write | $20.00 | 20× |
| Output (visible and thinking) | $50.00 | 50× |
Consequences, mechanical but decisive:
- Output discipline is worth 50× cache-read discipline per token. A technique that trims 1,000 output tokens equals one that trims 50,000 cache-read tokens. This single ratio reorders most folklore tier lists, which obsess over input-side prompt slimming.
- Thinking is output. Locally measured at 54.8% of output tokens in a max-effort session (02-baseline-audit.md). Any output-side technique that doesn't touch thinking (all style layers) caps out at the visible share.
- Cache reads are cheap, not free — and they're the volume king. 92.8% of prompt-side
tokens in the measured session were cache reads; at 0.1× they were still a major dollar line
(32% in that session; 21% in an independent output-heavy session), because the entire
conversation prefix is re-read on every API call.
Context mass costs ≈
prefix_tokens × 0.1× × calls_per_session, so a 2,738-token always-on CLAUDE.md chain costs ~52k cache-read tokens over a 19-call session — plus its share of cache writes whenever the prefix re-forms. - Break-even arithmetic for cache writes: 5-min write (1.25×) pays for itself after one read within TTL (1.25 + 0.1 < 2 × 1.0). 1-hour write (2×) needs ≥2 reads (confirmed in live docs: "caching pays off after just one cache read for the 5-minute duration… after two cache reads for the 1-hour duration"). Re-deriving the idle-gap economics from these multipliers is done in 13-caching-exploitation.md.
3. Tokenizer divergence — tokens are not a stable unit across models
Official, from the live docs :
"Opus 4.7 and later use a new tokenizer… This new tokenizer may use up to 35% more tokens for the same fixed text." (pricing page) "Claude Fable 5 … uses the tokenizer introduced with Claude Opus 4.7, which produces roughly 30% more tokens than models before Claude Opus 4.7 for the same text." (token-counting page)
Local confirmation (02-baseline-audit.md): identical text counts +15% (Python code) to +38% (English prose) on Fable 5 vs Sonnet 4.6; CJK diverges least.
Implications:
- Cross-tier routing saves more than list prices imply. Moving prose-heavy work Fable 5 → Sonnet 4.6 cuts price per token 3.3× and tokens per text ~1.2–1.4×: effective ~4–4.6× on input classes. Quantified per task class in 16-model-routing-and-delegation.md.
- Never reuse token counts measured on one tokenizer to budget another (docs say this explicitly for migration). All measurements in this dossier name the model they were counted on.
- Tokens/char measured locally on Fable 5: ~2.3–3.4 for English/markdown, ~1.4–1.6 for CJK, more granular tables in 11-tokenizer-arbitrage.md.
4. Measurement instruments (how to see spend at all)
API usage object — the ground truth. Every Messages response carries
usage.input_tokens (uncached), usage.cache_creation_input_tokens (further broken down in
usage.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens),
usage.cache_read_input_tokens, usage.output_tokens (visible + thinking + tool_use blocks).
Total prompt size = input + cache_creation + cache_read. All dossier arithmetic uses these fields.
count_tokens — the free experiment instrument. POST /v1/messages/count_tokens accepts
messages/system/tools/thinking exactly like a real request. Verified live (token-counting page): free, separate rate limit (100 RPM tier 1 → 8,000 RPM tier 4), counts are an
"estimate" that "may differ by a small amount" (system-added tokens are not billed), it never
touches the cache, and thinking blocks from previous assistant turns are ignored — matching
the production rule that prior-turn thinking is stripped from billed context (a built-in,
automatic saving documented in 18-provider-features.md).
Claude Code surfaces.
/cost— per-session totals;/context— live context-window decomposition (system prompt, tools, MCP, memory files, messages). Quick, but session-scoped and manual.- OpenTelemetry (
CLAUDE_CODE_ENABLE_TELEMETRY=1, OTLP exporters): metricsclaude_code.token.usage(attributetype∈input/output/cacheRead/cacheCreation, plusmodel, andskill.name/plugin.name/agent.namefor attributing spend to skills, plugins, and subagents),claude_code.cost.usage(USD),claude_code.active_time.total, plus events:claude_code.api_request(per-call tokens+cost),claude_code.compaction,claude_code.tool_decision.OTEL_LOG_RAW_API_BODIES=1captures full request/response bodies (60 KB truncation; thinking content is always redacted). This is the right backbone for a personal token dashboard; the per-agent.nameattribution is exactly what's needed to measure subagent economics (17-multi-agent-protocols.md). (code.claude.com/docs/en/monitoring-usage.) - Session JSONL transcripts (
~/.claude/projects/<project>/<session>.jsonl) — per-call usage including cache fields. Two traps found locally (02-baseline-audit.md): the samemessage.usagerepeats on every content-block line (dedup bymessage.idor overcount ~3×), and thinking text is redacted (thinking: ""), so thinking must be inferred asoutput_tokens − count_tokens(visible blocks). - Console usage pages / ccusage-style analyzers — covered with the market scan in 03-prior-art-and-market-scan.md; JSONL-based analyzers inherit both transcript traps above.
5. The modeled session profile (basis for all stack arithmetic)
Measured base (this environment, 02-baseline-audit.md): one heavy orchestration session = 19 API calls; 5,475 uncached input / 84,693 cache-write / 1,167,417 cache-read / 26,977 output tokens (54.8% thinking at max effort).
HEAVY-DAY profile. Two internally-consistent variants are used in this dossier — a $17 floor (5 sessions, 45% thinking — the table below) and a $22 working figure (6 sessions, 55% thinking) that the area files (10–19) and the composed stacks (30) adopt; both scale the same measured session, and every stack multiplier is unchanged across them. Table for the $17 floor (assumption-explicit scaling; multiply rows by 1.2 and shift the thinking split for the $22 variant):
| Class | Tokens/day | × Fable 5 price | $/day | Share |
|---|---|---|---|---|
| Uncached input | 25,000 | $10/MTok | $0.25 | 1.5% |
| Cache write (5m) | 400,000 | $12.50/MTok | $5.00 | 29.4% |
| Cache read | 5,500,000 | $1/MTok | $5.50 | 32.4% |
| Output — thinking (45%) | 56,250 | $50/MTok | $2.81 | 16.5% |
| Output — visible (55%) | 68,750 | $50/MTok | $3.44 | 20.2% |
| Total | $17.00 | 100% |
Assumptions recorded: (a) main-loop only — subagent/workflow fan-out multiplies this profile and is modeled separately in 17-multi-agent-protocols.md; (b) cache behavior healthy (92.8% read share measured) — a session with idle gaps >5 min degrades writes into the dominant line; (c) thinking share 45% (measured 54.8% at max effort; lower effort settings reduce it — sweep in 15-output-discipline.md). Sensitivity: at 55% thinking the output rows become $3.44/$2.81 reversed; total unchanged. At a 70% cache-read share (sloppier sessions), day cost rises ~$3–4 from extra writes.
The $/task metric (brief §4). All stack claims in 30-composed-stacks.md use:
Nx = (baseline $/completed task) ÷ (optimized $/completed task) at statistically
indistinguishable success on the validation suite (31-validation-harness.md). With ~10 completed
tasks/heavy day, baseline ≈ $1.70/task on this profile.
Verification ledger
| Claim / number | Basis |
|---|---|
| Full price table, multipliers, batch 50%, 1M standard pricing, fast-mode prices, inference_geo 1.1×, tool-use system prompt sizes, web search $10/1k | https://platform.claude.com/docs/en/about-claude/pricing |
| count_tokens free, RPM tiers 100/2,000/4,000/8,000, estimate caveat, prior-turn thinking ignored, no caching | https://platform.claude.com/docs/en/build-with-claude/token-counting |
| Tokenizer "roughly 30% more tokens" (Opus 4.7+ tokenizer) | token-counting page, same access date; "up to 35%" on pricing page |
| OTel metric names/attributes, OTEL_LOG_RAW_API_BODIES, thinking always redacted | https://code.claude.com/docs/en/monitoring-usage |
| Session decomposition, thinking 54.8%, prompt mix 0.44/6.73/92.83, tokenizer +15–38% local (prose-specific; code/CJK near-neutral); session split is profile-specific | Local measurements, methods in 02-baseline-audit.md |
| Heavy-day profile | ESTIMATE — measured session × 5, assumptions stated in §5 |