Sequenced for one operator; the governing split is
automatic beats disciplined — anything that depends on remembering loses to anything that is
a default. Each item names its automation vehicle (hook / skill / plugin / settings / jackin')
or is flagged DISCIPLINE with its decay risk.
Day 1 is configuration, not behavior: fix the double-registered hooks, pin subagent models,
add two CLAUDE.md guard-lines, stop mid-session model switches. ~1 hour, no quality risk.
Week 1 instruments before it optimizes: OTel token dashboard + the harness screening run,
then effort tiering and register choice with canaries, because those two carry the
quality risk.
Month 1 moves the savings into infrastructure: context editing via SDK paths, routing flips
validated per task class, and the jackin' token-pack so every launched container is born
optimized — discipline converted to defaults at launch.rs / RoleManifest / CapsuleConfig.
Standing rule: re-audit after every Claude Code release — the fixed prefix once grew ~70k
tokens in 5 days (GH #45188); platform drift dwarfs micro-optimizations.
Add two CLAUDE.md lines: "Prefer Edit over Write on existing files; never rewrite a file to change a few lines" and "Reference file:line instead of restating code; no post-edit summaries beyond one line"
CLAUDE.md (cached at 0.1x)
−20% visible output (15 §2–3)
Adopt session habits: model/effort chosen before the session, not mid-way; /clear between unrelated tasks; let auto-compact work
DISCIPLINE (decays — move to jackin' pack in month 1)
Explicitly not Day 1: register compression beyond what you already run, effort below high,
any context-clearing automation — those need Week 1's instrumentation first.
Stand up measurement (01 §4): CLAUDE_CODE_ENABLE_TELEMETRY=1, OTLP →ny collector,
panel on claude_code.token.usage by type and agent.name, plus claude_code.cost.usage.
Without this, every later claim about your own savings is folklore. (Hook/infra; hours.)
Run the harness screening arm (31): n=12 tasks + 6 canaries on your current setup —
this is the baseline every later change is judged against. (Script; ~an evening of wall-clock,
mostly unattended.)
Effort tiering with canaries (15 §1): /effort medium for routine work, high for hard
tasks; re-run canaries; record the thinking-share delta from your dashboard (closes the
biggest open measurement in the dossier). DISCIPLINE until month 1 bakes per-task defaults.
Register decision (10/02): caveman-ultra ≈58.5% of visible prose at real caveat risk —
keep, or swap to plain terse-instruction (15 §3) which captures most of the win with less
misread risk. Decide on canary results, not vibes. Skip wenyan: no token advantage over ultra
(02 §5).
Structured-output sidecars (15 §5): any scripted extraction/classification moves to
schema-constrained SDK calls — one stable schema per thread (cache pitfall).
Context editing on long-running SDK/headless flows (18/14): clear_tool_uses_20250919
with clear_at_least sized above the re-write cost; assert via applied_edits and cache
fields. Claude Code interactive already auto-compacts; don't double-manage.
Routing flip, validated: advisor pattern first (T1-backed, NEGATIVE-COST), then
Sonnet-main + escalation per task class that passes n=30 confirmation (30 §3 — the 5–6.6x
lives or dies here).
Batch lane for nightly sweeps (audits, doc regen, triage): Batch API + caching stack
(reads at 0.05x effective; 13/18).
The jackin' token-pack — discipline becomes infrastructure (20 §jackin; Explore-mapped
insertion points, this repo):
env defaults into the container env assembly at crates/jackin-runtime/src/runtime/launch.rs:590-717
(effort tier, telemetry on, attribution header policy);
[token_policy] block in jackin.role.toml (RoleManifest, crates/jackin-core/src/manifest.rs)
→ per-role model/effort/register defaults, hooks installed via the existing
runtime_setup.rs symlink pattern;
CapsuleConfig (crates/jackin-protocol/src/lib.rs:25-45) → /jackin/run/agent.toml →
build_agent_command so every spawned agent gets model/effort flags without operator
action;
OTLP plumbing already propagates (launch.rs:694-723) — extend the existing traces with
the token dashboard so each launched capsule reports its class-split spend.
Add the compression-hygiene linter (20): CI fails when always-loaded context (root
AGENTS.md chain, role files) grows past a token budget — measured with the free
count_tokens endpoint in CI.
Re-audit cadence: pin a monthly (and post-release) re-run of 02's baseline audit script —
prefix size, MCP schema mass, hook payloads. Platform drift is the biggest unmanaged variable
(12 §TL;DR, GH #45188).