
48 — Volume II frontier: new "unrealistic but maybe real" ideas

Eight frontier ideas that arise from Volume II's blind spots and do not duplicate Volume I's sixteen (file 20, K1–K16). Each is worked mechanism → savings math → feasibility verdict (REAL-NOW / BUILDABLE / RESEARCH-STAGE / BLOCKED-BY-<x> / PHYSICS-SAYS-NO) with an evidence tier and a coverage-delta note. Arithmetic uses Volume I's modeled profile (note: Fable-priced; ~half for an Opus-4.8 subscriber once Fable 5 is removed, file 46); quota figures use file 41's model.

TL;DR

  • Volume II adds 8 deployable-but-unbuilt frontier ideas; none reopens the blocked hosted-KV or soft-prompt ceiling from Volume I.
  • The biggest quota unlock is a per-account cap prober that fits the unpublished denominator from response headers, making tasks-per-cap optimization measurable.
  • The biggest automatic-dollar levers are vision routing/transcoding and a warm-repo CAG prefix choreographer, both buildable in an orchestrator.
  • The frontier verdict stays pragmatic: these are integration projects, not physics breakthroughs.

The board

| # | Idea | Blind spot | Verdict | Honest effect | Tier |
|---|------|------------|---------|---------------|------|
| V1 | Quota-window scheduler | 1 quota | BUILDABLE | frees cap headroom (no $ saving) | T1/T3 |
| V2 | Per-account quota-denominator prober | 1 quota | BUILDABLE | closes the unpublished-cap-weight gap empirically | T1/T3 |
| V3 | Vision-tier auto-router | 2 multimodal | BUILDABLE | −67% image tokens on routed frames | T1 |
| V4 | Screenshot/PDF → text transcoder at ingestion | 2 multimodal | BUILDABLE | −50% to −85% on textual media | T1 |
| V5 | Time-value auto fast-mode | 3 latency | BUILDABLE | buys wall-clock only when a human is blocked | T1 |
| V6 | Hosted "warm repo" CAG prefix + fleet choreographer | 4 fleet / 6 CAG | BUILDABLE | repo at 0.1× across a fleet; ~10× cold-start cut | T1/T2 |
| V7 | Online-canary-gated adaptive compression | 8 online-quality | BUILDABLE | unlocks aggressive compression at a live safety net | T1 |
| V8 | Cross-provider portable token-policy compiler | 5 portability | BUILDABLE | the stack survives an agent switch | T1 |

None is BLOCKED-BY-hosted-API or PHYSICS-SAYS-NO — Volume I already mapped that ceiling (K1/K2 soft-prompts/KV-export are the blocked megaleverage). Volume II's frontier is deployable-but-unbuilt: the gaps are blind spots, not physics.


V1. Quota-window scheduler — shape work to the cap's reset clock

Coverage-delta: New. No Volume I frontier idea touches the subscription cap (quota is blind spot 1); K13 (keepalive) is about TTL, not the usage window.

Mechanism: the subscription cap is a rolling 5-hour window plus a fixed weekly anchor (file 41). Cache reads weigh ~0.1× against it, but the binding event is the window boundary, not per-token price. A scheduler defers discretionary/batchable work (sweeps, nightly review, large refactors) to just after a 5-hour reset and away from the days approaching the weekly anchor; on Max it routes Sonnet-heavy work against the Sonnet-only weekly limit to preserve the all-model budget for Opus work. It is the quota-axis analogue of batch scheduling (file 43 L5).

Savings math: no dollar saving (subscription is flat); it raises tasks-per-cap by smoothing burn across windows so the operator hits the wall less often. With two weekly limits on Max, steering an estimated 30–50% of routine work onto the Sonnet band preserves the all-model budget for the hardest tasks (ESTIMATE; magnitude is per-workload and unmeasurable without the unpublished denominator — see V2).

Feasibility verdict: BUILDABLE — a cron/queue that reads /usage cap-% (or the unified-* headers) and releases queued work when headroom exists. The blocker is the opaque denominator (V2); with it, this becomes a closed-loop scheduler.
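The release decision described above can be sketched as a small gate over a work queue. This is a minimal illustration, assuming the two utilization fractions have already been read from /usage or the unified-* headers; the `Job` type, the function names, and the 0.80 / 0.60 thresholds are hypothetical placeholders, not values from file 41.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    discretionary: bool  # True for sweeps, nightly review, large refactors


def should_release(job: Job, util_5h: float, util_7d: float,
                   max_5h: float = 0.80, max_7d: float = 0.60) -> bool:
    """Return True when the job may run now.

    util_5h and util_7d are cap fractions in [0, 1]. Thresholds are
    illustrative assumptions. Non-discretionary work always runs;
    discretionary work waits until both windows show headroom.
    """
    if not job.discretionary:
        return True
    return util_5h < max_5h and util_7d < max_7d


def drain(queue: List[Job], util_5h: float, util_7d: float) -> Tuple[List[Job], List[Job]]:
    """Split the queue into (release_now, keep_waiting), preserving order."""
    release, wait = [], []
    for job in queue:
        (release if should_release(job, util_5h, util_7d) else wait).append(job)
    return release, wait
```

A cron job could call `drain` every few minutes; until V2 supplies a fitted denominator, the thresholds can only be tuned by trial.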

Tier: T1 (cap structure) + T3 (the ~0.1× weight it schedules around). Quality risk: NEUTRAL (same work, different time). Effort: medium.

V2. Per-account quota-denominator prober — fit the cap weight from your own headers

Coverage-delta: New. Directly attacks file 41's bounded INCOMPLETE (the unpublished cap denominator + cache-read weight); no Volume I idea reads the unified-* headers.

Mechanism: Anthropic does not publish the token denominator of a window or the exact cache-read cap weight, but the anthropic-ratelimit-unified-* response headers (5h-utilization, 7d-utilization, reset) expose cap-% per call. A transparent pass-through proxy (cc-relay-style) logs (tokens-by-class, cap-%) per request; a regression fits the per-class cap weights and the 100%-denominator for this account — the empirical method three community datasets already used to triangulate cache_read ≈ 0.1× (file 41).
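The regression step can be sketched in plain Python. This is a hedged illustration, not the method the community datasets used: it assumes each logged row carries token counts per class plus the change in 5h-utilization (as a fraction) attributed to that request, and it fits an ordinary least-squares model with no intercept. The function names and the class labels are assumptions; real header values may be coarsely quantized, so many rows would be needed.

```python
def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_cap_weights(rows, classes=("input", "output", "cache_read")):
    """Fit delta_util ~= sum(beta_c * tokens_c) over the logged rows.

    rows: iterable of (tokens_by_class: dict, delta_utilization: float).
    Returns the implied 100% denominator in input-token equivalents and
    each class's weight relative to the first class.
    """
    k = len(classes)
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for tokens, du in rows:
        x = [float(tokens.get(c, 0)) for c in classes]
        for i in range(k):
            xty[i] += x[i] * du
            for j in range(k):
                xtx[i][j] += x[i] * x[j]
    beta = solve(xtx, xty)
    base = beta[0]
    return {
        "denominator": 1.0 / base,
        "weights": {c: b / base for c, b in zip(classes, beta)},
    }
```

Normalizing by the first class means the denominator is expressed in fresh-input tokens, so a fitted `cache_read` weight near 0.1 would agree with the triangulated figure.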

Savings math: no direct saving; it converts file 41's "tasks-per-cap is unquantifiable" into a measured per-account model, which is the precondition for V1 and for honestly costing every quota lever. Closes the dossier's largest INCOMPLETE.

Feasibility verdict: BUILDABLE today (the community tools exist); the caveat is that the cap denominator shifted ~2× and resets periodically, so the fit must be re-run after limit changes.

Tier: T1 (headers exist, observed by multiple proxies) + T3 (the fit). Quality risk: NEUTRAL, if the proxy preserves cache_control (a careless proxy busts the cache — file 41 Q1). Effort: medium.

V3. Vision-tier auto-router — every screenshot to the cheap tokenizer family

Coverage-delta: New. Volume I's routing (K11) routes by text tokenizer; this routes images by the 3.05× per-image cap divergence (file 42), which Volume I never measured.

Mechanism: a hook intercepts image/screenshot content and dispatches it to a Sonnet/Haiku subagent (per-image cap 1,568 tokens) instead of the Opus/Fable main loop (cap 4,784), returning a text summary to the main thread. The pixels never touch the expensive family's context.

Savings math: per full-frame screenshot, 4,784 → 1,568 image tokens = −67% (file 42 measured). A 20-frame debugging session: 20 × (4,784 − 1,568) = 64,320 tokens shifted off the expensive family — modest in dollars (image tokens at input price) but real in quota (file 41) and window pressure, and larger on the operator's current Opus-4.8 main loop where every main-thread screenshot pays the 4,784 cap.
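The arithmetic above can be reproduced with a few lines. The two caps are the file 42 figures quoted in the text; the function name and the optional `summary_tokens` parameter (the text summary handed back to the main thread) are assumptions added for illustration.

```python
OPUS_CAP = 4_784   # per-image token cap, Opus/Fable family (file 42)
CHEAP_CAP = 1_568  # per-image token cap, Sonnet/Haiku family (file 42)


def routed_image_savings(frames: int, summary_tokens: int = 0) -> dict:
    """Token accounting for routing `frames` screenshots to the cheap family.

    summary_tokens is a hypothetical per-frame size of the text summary
    returned to the main thread; 0 reproduces the document's figure.
    """
    return {
        # image tokens moved off the expensive tokenizer family
        "tokens_shifted": frames * (OPUS_CAP - CHEAP_CAP),
        # relative change in image tokens per routed frame
        "pct_change": (CHEAP_CAP - OPUS_CAP) / OPUS_CAP * 100,
        # what the main thread still carries (summaries only)
        "main_thread_tokens": frames * summary_tokens,
        # what the subagent pays instead
        "subagent_image_tokens": frames * CHEAP_CAP,
    }
```

Calling `routed_image_savings(20)` gives 64,320 tokens shifted and roughly −67% per frame, matching the figures in the paragraph.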

Feasibility verdict: BUILDABLE — a PreToolUse hook + a vision subagent pinned model: haiku. The only friction is summarization fidelity (the main thread sees text, not pixels).

Tier: T1 (measured caps, file 42). Quality risk: QUALITY-TRADE if the summary drops a visual detail the main task needs; NEUTRAL for UI-state/log screenshots. Effort: hours.

V4. Screenshot/PDF → text transcoder at ingestion — pay text, not the media tax

Coverage-delta: New. Volume I has zero multimodal; this operationalizes file 42's "text beats pixels for textual content" and "avoid the PDF tax" as an automatic ingestion step.

Mechanism: before any screenshot or PDF enters context, a local step extracts its text — OCR / accessibility-tree for screenshots, pdftotext for born-digital PDFs — and feeds the text, falling back to the image only when layout is load-bearing (a rendered chart, a visual bug). This pays text tokens (exact, scrollable) instead of the 1,568–4,784 image cap or the 1.98–2.30× PDF tax (file 42).

Savings math: a dense code screenful as text is 593–765 tokens vs a 1,568–4,784 screenshot = −50% to −85%; a 25-page text-extractable PDF is ~40,000 tokens as text vs 78,806 as a PDF = ~−50% (file 42 measured). Plus exact characters and downstream grep-ability.

Feasibility verdict: BUILDABLE — needs a local OCR/extraction tool in the container (jackin' can bake it in, file 44 F6). For born-digital PDFs pdftotext is trivial; OCR for screenshots is heavier.
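A minimal sketch of the ingestion step for born-digital PDFs, assuming the poppler `pdftotext` binary is on the container's PATH. The function names, the `layout_load_bearing` flag, and the 200-character threshold are hypothetical; screenshot OCR is left out because it depends on which OCR tool is baked in.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Optional, Tuple


def pdf_to_text(path: Path) -> Optional[str]:
    """Extract text with pdftotext; return None if the tool is missing,
    extraction fails, or the result is empty (for example a scanned PDF)."""
    if shutil.which("pdftotext") is None:
        return None
    # "-" as the output file makes pdftotext write to stdout
    proc = subprocess.run(
        ["pdftotext", "-layout", str(path), "-"],
        capture_output=True, text=True,
    )
    text = proc.stdout.strip()
    return text if proc.returncode == 0 and text else None


def choose_payload(text: Optional[str], layout_load_bearing: bool,
                   min_chars: int = 200) -> Tuple[str, Optional[str]]:
    """Decide what enters context: ("text", extracted) or ("image", None).

    Falls back to the image when layout matters (a rendered chart, a
    visual bug) or when extraction produced too little text to trust.
    """
    if layout_load_bearing or text is None or len(text) < min_chars:
        return ("image", None)
    return ("text", text)
```

Keeping the fallback decision separate from the extractor means the same gate could sit in front of an OCR or accessibility-tree extractor for screenshots.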

Tier: T1 (measured token deltas). Quality risk: NEGATIVE-COST for textual media (cheaper + exact); RISKY only if OCR errs or layout mattered — keep the image-fallback path. Effort: hours (PDF) to days (robust screenshot OCR).

V5. Time-value auto fast-mode — flip fast mode by who is waiting

Coverage-delta: New. Volume I never models latency; this automates file 43's v·t·s > Δ$ inequality.

Mechanism: an orchestrator classifies each turn as interactive (a human is blocked) or autonomous (batch/CI/overnight) and toggles fast mode accordingly — fast mode on Opus 4.8 buys up to 2.5× speed for 2× price (file 43), worth it when a developer-minute (~$0.83–1.25) times the minutes saved exceeds the token premium, i.e. exactly when a human waits. Autonomous turns stay standard or go to batch (50% off). On a subscription, fast mode also bypasses the cap (draws credits) — a lever to finish without burning cap headroom at a dollar price.

Savings math: on a 5-minute interactive task costing ~$0.50 in tokens, fast mode adds ~$0.50 and returns ~3 minutes ≈ $3.75 of developer time (≈7:1, file 43 ESTIMATE); on autonomous work it saves the premium entirely (t≈0 → never buy speed). Net: the same total-cost optimum file 43 derives, applied automatically.
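The worked example above can be expressed as a small decision function. This is a sketch under the paragraph's own numbers (2.5× speed, 2× price, $1.25 per developer-minute); file 43's exact symbol definitions are not reproduced here, so the parameter names are descriptive assumptions.

```python
def minutes_saved(baseline_minutes: float, speedup: float) -> float:
    """Wall-clock minutes returned when a task runs `speedup` times faster."""
    return baseline_minutes * (1 - 1 / speedup)


def should_buy_fast(human_blocked: bool, baseline_minutes: float,
                    token_cost: float, dev_minute_value: float = 1.25,
                    speedup: float = 2.5, price_multiplier: float = 2.0) -> bool:
    """Buy speed only when the developer time returned exceeds the premium.

    Autonomous turns (no human waiting) never buy speed, because the
    value of the saved time is treated as zero.
    """
    if not human_blocked:
        return False
    premium = token_cost * (price_multiplier - 1)
    return dev_minute_value * minutes_saved(baseline_minutes, speedup) > premium
```

For the 5-minute, $0.50 task, `minutes_saved(5, 2.5)` is about 3 minutes, worth about $3.75 against a $0.50 premium, so the function returns True when a human is blocked and False otherwise.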

Feasibility verdict: BUILDABLE — detect interactive-vs-autonomous from the launch context (jackin' knows whether a human is attached) and set speed: "fast" at session start (never mid-turn — it re-bills the prefix, file 43).

Tier: T1 (fast-mode pricing/speed) + ESTIMATE (developer-minute value). Quality risk: NEUTRAL (identical model/quality). Effort: hours.

V6. Hosted "warm repo" — the CAG pattern as a fleet-shared, always-warm cached prefix

Coverage-delta: New synthesis of file 46 FL1 (CAG-via-caching) + file 44 (fleet workspace cache) + the /cd and 1h-TTL levers; distinct from K6 (codebooks, small recurring strings) and K16 (the general pack) by being the whole stable repo core as a persistent shared artifact.

Mechanism: designate the repo's stable core (key source files, the spec, the API surface) as a cache_control prefix; pin the fleet to one workspace (file 44 F1) with excludeDynamicSections (F2) so every container shares one cached copy; keep it warm with 1h TTL + a pre-warm/keepalive ping (Vol I K13 / Aider's pattern, file 45 P2). Every container then reads the repo at 0.1× instead of re-exploring — the CAG "preload-and-reuse" pattern realized across a hosted fleet, composing with caching rather than against it (unlike LLMLingua).

Savings math: the shared-prefix fleet math (file 44 F1/F3): N containers → 1 write + (N−1) 0.1× reads of the repo core; cold-start ~10× cut (F3). Per turn, the repo core costs 0.1× instead of fresh exploration tokens. Bounded by the 200K subscription context (file 41) — the core, not the whole repo, fits.
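The "1 write + (N−1) reads" arithmetic can be sketched as follows. The 0.1× read multiplier comes from the text; the 2.0× write multiplier is an assumption for a 1h-TTL cache write and should be checked against the current price sheet. The unshared baseline is simplified to each container paying the core once at fresh-input price.

```python
def fleet_core_cost(n_containers: int, core_tokens: int,
                    write_mult: float = 2.0, read_mult: float = 0.1) -> dict:
    """Input-token-equivalent cost of loading the repo core across a fleet.

    write_mult is an assumed cache-write multiplier; read_mult is the
    0.1x cache-read weight used throughout the document.
    """
    shared = core_tokens * (write_mult + (n_containers - 1) * read_mult)
    unshared = core_tokens * n_containers
    return {
        "shared": shared,
        "unshared": unshared,
        "reduction_factor": unshared / shared,
    }
```

Under these assumptions the reduction factor approaches 10× only as the fleet grows; a small fleet sees less because the single write is amortized over fewer reads.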

Feasibility verdict: BUILDABLE — jackin's launcher is the natural home (it already owns the insertion points, Vol I K16 / file 44 F6). The hard part is curating "the stable core" and keeping it byte-stable (any edit busts it).

Tier: T1 (caching/fleet mechanics) + T2 (CAG quality-vs-RAG). Quality risk: NEUTRAL-to-NEGATIVE-COST when the core fits and is current; RISKY if it goes stale in the cached prefix (re-warm on change). Effort: high (curation + fleet wiring), amortized across launches.

V7. Online-canary-gated adaptive compression — compress hard only while a live judge says it's safe

Coverage-delta: New. Connects file 47's online judge (blind spot 8) to compression; Volume I's compression (10) and harness (31) are offline — nothing self-regulates compression on live quality.

Mechanism: run aggressive output compression (caveman-ultra, terse registers, tight effort) by default, with a sampled async LLM-as-judge (file 47 G3) watching production traces for caveat-drop / negation loss / missed warnings. On a drift alarm, the orchestrator auto-reverts the affected lane to a safer register until the canary clears. Compression becomes a closed loop with a live floor instead of a static gamble.

Savings math: lets the operator run at the aggressive end of Volume I's register/effort curve (the 58.5% caveman-ultra, the high→medium effort) without the standing caveat-drop risk Volume I flagged as unmeasured — turning a RISKY lever into a guarded one. The net is the aggressive lever's saving minus the guard tax (file 47 G4: sampling 1–10%); positive when the compressed lane is large and the judge is cheap.
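The closed loop and its net-saving test can be sketched together. This is an illustration only: the `Lane` type, the register names, the "three clean samples before re-arming" rule, and the per-trace judge cost are all hypothetical, and the judge itself (file 47 G3) is represented by a boolean alarm.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    name: str
    register: str = "aggressive"  # "aggressive" or "safe"
    clean_streak: int = 0


def update_lane(lane: Lane, drift_alarm: bool, clear_after: int = 3) -> str:
    """Apply one judged sample to a lane and return its register.

    An alarm reverts the lane to the safe register at once; the lane
    re-arms only after `clear_after` consecutive clean samples.
    """
    if drift_alarm:
        lane.register = "safe"
        lane.clean_streak = 0
    elif lane.register == "safe":
        lane.clean_streak += 1
        if lane.clean_streak >= clear_after:
            lane.register = "aggressive"
            lane.clean_streak = 0
    return lane.register


def net_saving(gross_saving: float, traces: int, sample_rate: float,
               judge_cost_per_trace: float) -> float:
    """Aggressive lever's saving minus the guard tax, in the same currency."""
    guard_tax = traces * sample_rate * judge_cost_per_trace
    return gross_saving - guard_tax
```

The loop pays off when `net_saving` stays positive, which in this model means a large compressed lane, a low sampling rate within the 1–10% band, and a cheap judge.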

Feasibility verdict: BUILDABLE — wire a validated reference-free judge (LangSmith/Braintrust/Arize AX) over the compressed lane's traces with a revert webhook. The blocker is judge calibration (file 47: validate the judge first).

Tier: T1 (online-eval tooling). Quality risk: the point is to bound quality risk; mis-calibration (false clears) is the residual risk. Effort: days.

V8. Cross-provider portable token-policy compiler — one policy, every agent's config

Coverage-delta: New. Operationalizes file 45's portability matrix; Volume I is single-agent.

Mechanism: a declarative token-policy (effort tier, model-routing rules, context-rules files, output caps, cache discipline) compiles to each agent's native config: Cursor .cursor/rules + model variants, Codex config.toml profiles, Gemini settings.json aliases + contextManagement, Aider flags (--cache-prompts, --map-tokens, architect/editor/weak), Claude Code env + role TOML (jackin' K16). The stack survives an agent switch as a recompile, not a rewrite.
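One compile target can be sketched to show the shape of the idea. The policy field names below are hypothetical; only the Aider flags named in the text (`--cache-prompts`, `--map-tokens`) plus Aider's `--architect` and `--weak-model` options are emitted. Other targets (Cursor, Codex, Gemini, Claude Code) are omitted because their config keys would have to be checked against each tool's current documentation.

```python
# Policy keys that have no portable equivalent and must be flagged
# as agent-specific rather than silently dropped.
NON_PORTABLE = ("cache_control", "fast_mode", "register_compression")


def compile_aider(policy: dict):
    """Translate a declarative token policy into Aider CLI flags.

    Returns (flags, flagged), where `flagged` lists the non-portable
    policy keys that were present and need a per-agent decision.
    """
    flags = []
    if policy.get("cache_discipline"):
        flags.append("--cache-prompts")
    if "repo_map_tokens" in policy:
        flags += ["--map-tokens", str(policy["repo_map_tokens"])]
    if policy.get("architect_mode"):
        flags.append("--architect")
    if "weak_model" in policy:
        flags += ["--weak-model", policy["weak_model"]]
    flagged = [key for key in NON_PORTABLE if key in policy]
    return flags, flagged
```

A full compiler would hold one such function per agent over the file-45 matrix and fail loudly when a policy key has no mapping for the chosen target.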

Savings math: no new per-lever saving; it preserves the whole stack's savings across agents and prevents the silent loss when a team moves tools (file 45: ~80% of the stack ports as discipline, ~60% as feature). Value = avoided re-derivation + avoided drift on the non-portable edges (cache_control, fast mode, register compression) which the compiler flags as agent-specific.

Feasibility verdict: BUILDABLE — a config generator over the file-45 matrix; the friction is tracking each agent's config drift (Copilot's billing flip, Cursor's .cursorrules deprecation, etc.).

Tier: T1 (each target's config surface, file 45). Quality risk: NEUTRAL (config translation). Effort: days (and ongoing maintenance as agents churn).


Honest ceiling

These eight are deployable-but-unbuilt, not megaleverage. The biggest dollar swings remain where Volume I left them — blocked behind the hosted API (soft-prompts, KV export: K1/K2/file 46) — and the biggest quota swing (V1/V2) cannot be sized until the denominator is probed. Volume II's frontier changes which choice is correct (route vision cheap, prefer text over pixels, buy speed only when a human waits, guard compression live) and what is measurable (the cap weight, the guard tax) more than it raises the dollar-reduction ceiling. The composed effect on the tier list and the 10x verdict is settled in 49.
