
11 — Extended comparison axes


Pages 01–07 compared the four tools on what they do, how they work, and what they save. This page adds the six axes the earlier passes never set side by side — the ones surfaced by the gap analysis: security/privacy, project health, interaction with Claude Code's own context management, build-vs-buy, subscriber economics, and non-coding behavior. Repo stats are a fresh gh api pull (2026-06-20); external claims carry sources; reasoned (not measured) calls are labeled.

(a) Security, privacy, and supply-chain

The tools sit on a surface-area gradient that is roughly the inverse of their reach gradient — except lean-ctx, which is broad-reach yet local-first-by-default, so its surface depends heavily on which opt-in features you enable.

| Concern | Caveman | Headroom | RTK | lean-ctx |
| --- | --- | --- | --- | --- |
| Data egress | None: only changes what the model writes | Highest: proxy mode sees every byte of every request; library/MCP modes local-only | None by default (filters locally); opt-in anonymized telemetry | None by default (local-first, no telemetry by default); opt-in cloud sync + opt-in proxy add egress paths |
| Supply-chain | npm package + deps; writes hooks into ~/.claude | Auto-downloads kompress-base ML model over TLS (unpinned = integrity boundary); pip/npm/docker | Single ~4.1 MB binary, zero runtime deps (smallest) + a hook write | 64.7 MB binary (largest); opt-in embeddings model download + opt-in qdrant; writes hooks/skills across up to 34 agents + daemon autostart |
| Code-execution / injection | None (it is a prompt) | Runs a local model + proxy process | Runs subprocesses: quoting/shlex matters | Runs the most: shell-hook subprocesses + LSP subprocesses + daemon + optional proxy; SSRF-guarded URL reads (http/https only) |
| Known security issue | caveman-shrink broken MCP registration (availability, not security) | CompressionAttack (arXiv 2510.22963, ≤80% ASR) targets an ML compressor in the request path | Issue #886 reportedly bypasses Claude Code permission prompts (not proven; single report) | Self-disclosed "40+ hardening fixes" (v3.5.16): path traversal, injection, CSPRNG, CSP, resource limits; path-jail + redaction shipped (remediation history, not an open CVE) |
| Model-integrity attack surface | None | Yes: the ML stage is the attack surface | None (deterministic rules) | None by default (deterministic core); only if embeddings/proxy enabled |
| License | MIT | Apache-2.0 | Apache-2.0 | Apache-2.0 |

Verdict. Caveman has the smallest surface (a prompt). RTK is next (deterministic, local, tiny binary — but a hook write and the #886 permission concern). lean-ctx and headroom are the two large surfaces, for different reasons: headroom by egress (an ML proxy that sees everything) and a mandatory model download; lean-ctx by footprint (the biggest binary, the most subprocesses, host writes across dozens of agents) — but lean-ctx is local-first by default with no telemetry and a deterministic core, so its egress/ML risks are opt-in, whereas headroom's are on by default in proxy mode. For a security-conscious container: caveman freely; RTK with #886 checked and telemetry off; lean-ctx only in MCP + shell-hook mode (no proxy, no cloud sync), version-pinned, with all host writes scoped into the container; headroom only in MCP/library mode with the model pinned.
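The "model pinned" and "version-pinned" advice above can be sketched as a digest check run before an artifact is loaded. This is a minimal illustration, not any of the four tools' own mechanism: the function names, the file path, and the idea of storing an expected SHA-256 alongside the container image are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model weights never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_pinned(path: Path, expected_sha256: str) -> bool:
    """Return True only if the artifact matches the pinned digest.

    `expected_sha256` is a hypothetical value you recorded when you
    first vetted the download (for example, baked into the image).
    """
    return sha256_of(path) == expected_sha256.lower()
```

A container entrypoint could call `verify_pinned` on the downloaded model or binary and refuse to start on a mismatch, which turns an unpinned download into an explicit, checkable boundary.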

(b) Project health and sustainability

Stars are PR-inflated for three of the four and ignored here. The real signals (gh api, 2026-06-20):

| Signal | Caveman | Headroom | RTK | lean-ctx |
| --- | --- | --- | --- | --- |
| Stars (noise) | 74,495 | 33,871 | 63,643 | 2,800 |
| Watchers (truer) | 166 | 115 | 146 | 19 |
| Forks | 4,190 | 2,287 | 3,918 | 278 |
| Open issues | 293 | 303 | 1,260 | 13 |
| License | MIT | Apache-2.0 | Apache-2.0 | Apache-2.0 |
| Latest release | v1.9.0 | v0.26.0 | v0.42.4 | v3.8.9 |
| Created | 2025 | 2026-05 | 2026-01-22 | 2026-03-23 (youngest) |
| Cadence | steady | ~190 PyPI releases | 212 tags in ~5 months | "200+ releases" / fast |
| Maintainer model | solo (JuliusBrussee) | solo (chopratejas) | small team + commercial (RTK Cloud) | small + commercial (LeanCTX Cloud) |
| Stated sustainability | none | none stated | RTK Cloud ($15/dev/mo, waitlist) | LeanCTX Cloud shipped (Pro $9/mo, Team $18/seat); "local free is a CI-enforced invariant" |

Verdict. None is a mature, multi-maintainer project; all four are fast-moving solo/small efforts riding a 2026 hype spike. RTK's 1,260 open issues signal adoption outrunning triage (or thin maintenance); lean-ctx's 13 open issues signal the opposite — either tight triage or simply far less adoption (2,800★ vs 30–74k). Two have funded paths (RTK Cloud, LeanCTX Cloud); lean-ctx's open-core is the cleaner of the two — the local engine is Apache-2.0 and its "free forever" is stated as a CI-enforced invariant, whereas RTK Cloud has the classic risk of features migrating behind the paywall. But lean-ctx is also the youngest (created 2026-03) with the broadest surface to maintain, so its bus-factor risk is real. For a container that pins versions, pin a known-good version of any of the four and re-verify on upgrade — most acutely for lean-ctx, whose fast cadence and large surface make doc drift and regressions likeliest. (Reasoned from repo metrics; not a measured reliability study.)
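For readers who want to re-pull these signals, the sketch below extracts them from a GitHub REST `GET /repos/{owner}/{repo}` payload (the JSON that `gh api repos/OWNER/REPO` prints). It assumes the "Watchers" row corresponds to `subscribers_count`; the page does not say which field was used. The function name and the sample payload are illustrative, with the sample values copied from the lean-ctx column and a placeholder time of day.

```python
from typing import Any


def health_signals(repo: dict[str, Any]) -> dict[str, Any]:
    """Pick health fields out of a GitHub REST repository payload."""
    license_info = repo.get("license") or {}
    return {
        "stars": repo.get("stargazers_count"),
        # In the REST payload `watchers_count` mirrors the star count;
        # `subscribers_count` is the number of accounts actually watching.
        "watchers": repo.get("subscribers_count"),
        "forks": repo.get("forks_count"),
        # GitHub counts open pull requests in this field as well.
        "open_issues": repo.get("open_issues_count"),
        "license": license_info.get("spdx_id"),
        "created": (repo.get("created_at") or "")[:10],
    }


# Sample payload: numbers from the table above, timestamp suffix is a placeholder.
sample = {
    "stargazers_count": 2800,
    "watchers_count": 2800,
    "subscribers_count": 19,
    "forks_count": 278,
    "open_issues_count": 13,
    "license": {"spdx_id": "Apache-2.0"},
    "created_at": "2026-03-23T00:00:00Z",
}
signals = health_signals(sample)
```

Because `open_issues_count` includes pull requests, a large value can reflect an unmerged PR queue as much as unanswered bug reports, which is worth checking before reading it as a triage signal.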

(c) Interaction with Claude Code's native context management

Claude Code already ships context-management features that touch the same tokens these tools target. Do the tools conflict or compose?

| Native feature | What it does | Caveman | Headroom | RTK | lean-ctx |
| --- | --- | --- | --- | --- | --- |
| microcompact | no-LLM trimming of redundant tool output every turn | Compose (different bucket) | Overlap (diminishing returns on same bytes) | Overlap (RTK trims upstream, microcompact in-context) | Overlap: its hook + bounce tracker + pressure auto-downgrade do similar trimming; compose with diminishing returns |
| /compact + auto-compaction | LLM history rewrite (cache-write spike) | Compose | Conflict risk: proxy IntelligentContext can double-compact | Compose (tool boundary, upstream) | Compose + complement in MCP/hook (its session-survival snapshot aids recovery after compaction); conflict risk only in proxy mode |
| context editing (server-side stale-result clearing) | clears old tool results near the limit | Compose | Overlap with IntelligentContext | Compose | Overlap (its own eviction/CFT ledger does similar); compose in MCP/hook |
| automatic prompt caching | the 0.1× read floor | Compose (cache-neutral) | Conflict risk in proxy mode; safe in MCP/library | Compose (cache-safe by construction) | Compose in MCP/hook (write-time + prefix-friendly ordering cooperates with caching); proxy is cache-safe-by-design but still a second stabilizer |

Verdict. Caveman composes cleanly with everything (output-side). RTK composes (upstream of all of them). lean-ctx composes in MCP + shell-hook mode, and uniquely complements compaction with its session-survival snapshot, but its proxy, like headroom's, is a second stabilizer that can fight Claude Code's own caching/compaction. Headroom's proxy is the most conflict-prone. The repeated lesson for both headroom and lean-ctx: use MCP/hook mode and avoid putting a whole-prompt proxy in front of Claude Code. In addition, microcompact already does, for free, a slice of what the three input tools (headroom, RTK, lean-ctx) provide at a cost in reach, ML, or footprint, so measure the incremental win over native microcompact, not over a naive baseline.
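The difference between the two baselines can be made concrete with a small calculation. This is a sketch only: the function names are invented for illustration, and the token counts in the usage note are hypothetical, not measurements from this hub.

```python
def savings(before: int, after: int) -> float:
    """Fraction of tokens removed when going from `before` to `after`."""
    if before <= 0:
        raise ValueError("`before` must be a positive token count")
    return (before - after) / before


def incremental_win(naive: int, native_only: int, native_plus_tool: int) -> dict[str, float]:
    """Compare a tool against both baselines.

    naive            -- tokens with no trimming at all
    native_only      -- tokens with Claude Code's native trimming only
    native_plus_tool -- tokens with native trimming plus the tool
    """
    return {
        "vs_naive": savings(naive, native_plus_tool),
        "vs_native": savings(native_only, native_plus_tool),
    }
```

With hypothetical counts such as `incremental_win(100_000, 70_000, 60_000)`, the tool looks like a 40% cut against the naive baseline but only about 14% against what the native feature already delivers; the second number is the one that reflects the tool's own contribution.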

(d) Build-vs-buy: the spectrum from a 5-line style to a daemon runtime

Mechanically, caveman is a Claude Code output-style plus two hooks — so hand-rolling its core is a ~5-line file. lean-ctx is the opposite extreme: you cannot hand-roll a property graph + RRF search + LSP + CCP memory. The four span the whole build-vs-buy spectrum.

|  | Hand-written output-style | Caveman plugin | RTK | lean-ctx |
| --- | --- | --- | --- | --- |
| Can you hand-roll the core? | it is the hand-roll | yes (it is a style + hooks) | partially (a log/grep filter hook) | no: code graph, RRF, LSP, CFT are not a weekend script |
| Footprint | none | ~940-tok rent + 2 hooks | ~4 MB binary + 1 hook | 64.7 MB binary + daemon + DBs + 77-tool schema |
| What "buy" gets you |  | the family + UX | 100+ command patterns turnkey | the entire context runtime + code intelligence + verification |

Verdict. For pure output compression, a hand-written output-style is leaner and lower-risk than the caveman plugin (no hooks, no ~940-tok rent) — buy the plugin only for the ecosystem/UX. For shell compression, RTK's "buy" is justified by its 100+-command coverage over a hand-written filter. For the code graph + memory + verification, lean-ctx is the only "buy" available — those capabilities are genuinely not hand-rollable, which is the strongest single argument for adopting it if you need them. The spectrum: build the output style, buy RTK's patterns, buy lean-ctx's runtime only when you need what cannot be built cheaply. (Reasoned from the teardowns; token deltas in the harness backlog.)

(e) Subscriber economics — tasks-per-cap ranking

For a Max-plan subscriber, dollars below the cap are sunk; the objective is tasks-per-cap — how far each tool stretches the 5-hour window (dossier chapter 41). This re-orders the $-per-task recommendation.

  • The window fills with input volume — and page 10 measured 94% of token volume here is cache-read. Reducing what gets written (and thus later re-read) extends the window; reducing output (caveman's target, 0.9% of volume) barely moves it.
  • So for window extension: headroom ≈ lean-ctx (broad input) ≳ RTK (Bash input) ≫ caveman (output) — the inverse of the $-per-task lean stack. lean-ctx's ~13-token cache-handle re-reads directly cut re-read volume, the dominant occupancy cost, so it is as strong a window-extender as headroom (and stronger on code-read-heavy work). The community "30 min → 3 hr session" headline is a tasks-per-cap/occupancy win, not a dollar cut, and it is driven by the input tools, not caveman.
  • Caveat (not proven): the cap's token denominator and exact cache-read weighting are unpublished (dossier chapter 41, bounded INCOMPLETE); cap cache-read weight is community-triangulated at ≈0.1× (T3). The direction (input tools extend the window most) is solid; the magnitude is not.

| Objective | Best | Middle | Least |
| --- | --- | --- | --- |
| $ per task (API pricing) | caveman + RTK (cache-safe, no ML, tiny) |  | headroom / lean-ctx (MCP) |
| tasks per cap (Max subscriber) | headroom ≈ lean-ctx (broad input) | RTK (Bash input) | caveman (output) |

The metric you optimize flips the ranking. State which one you are on before picking a tool.
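The occupancy argument can be worked through with a toy model. The 94% cache-read and 0.9% output shares are the page-10 figures quoted above, and the 0.1× cache-read weight is the community-triangulated value. Everything else is an assumption made for this sketch: the weight of 1.0 for the other buckets, the "other" bucket as the remainder, and the 50% reduction are placeholders, because the real cap formula is unpublished.

```python
def window_stretch(
    shares: dict[str, float],
    weights: dict[str, float],
    bucket: str,
    reduction: float,
) -> float:
    """How much longer the window lasts if `bucket` shrinks by `reduction`.

    Occupancy is modelled as sum(share * weight); this linear model is an
    assumption, not the published cap formula.
    """
    base = sum(shares[k] * weights[k] for k in shares)
    reduced = base - shares[bucket] * weights[bucket] * reduction
    return base / reduced


shares = {"cache_read": 0.94, "output": 0.009, "other": 0.051}
# Assumed weights: 0.1 for cache reads (community estimate), 1.0 placeholders elsewhere.
weights = {"cache_read": 0.1, "output": 1.0, "other": 1.0}

stretch_input = window_stretch(shares, weights, "cache_read", 0.5)
stretch_output = window_stretch(shares, weights, "output", 0.5)
```

Under these assumed weights, halving cache-read volume stretches the window far more than halving output does, which matches the direction claimed above. The size of the gap moves with the weights, so treat the numbers as illustrative only.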

(f) Non-coding / multi-domain behavior

All four are benchmarked on coding; their behavior off the code path is uneven, and it inverts the usual ranking.

| Domain | Caveman | Headroom | RTK | lean-ctx |
| --- | --- | --- | --- | --- |
| Prose / docs / chat output | Best fit: register compression is prose compression | n/a (input side) | n/a | n/a (input side) |
| Data / JSON / API / RAG / HTML | n/a | Strong: typed compressors are general, not code-specific | Weak: keyed on dev commands | Weak: measured JSON 30.6%, Markdown 7.5%, HTML 6.8% (its outline modes are code-specific) |
| Logs / sysadmin / data-pipeline shell output | n/a | Strong (LogCompressor) | Good but dev-keyed | Good but dev-keyed (56 patterns are dev tools; novel commands pass through) |
| Pure code | Passes verbatim (no help) | CodeAware outline | Aggressive code filter (per-read) | Best: tree-sitter outline 96–99% (per-read) + code graph |

Verdict. Caveman generalizes best (its lever is prose, useful anywhere output is verbose). Headroom is second (typed compressors cover logs/JSON/HTML/RAG, not just code). RTK and lean-ctx are the most code/dev-bound — RTK because it keys on dev commands, lean-ctx because its compression strength is code (it barely touches prose/config, as measured on page 10). So for a non-coding agent (research, data, ops) the ranking is caveman ≳ headroom ≫ RTK ≈ lean-ctx; lean-ctx's edge (the code graph) is precisely a code feature, which is no help off the code path. (Reasoned + the page-10 measurement; no broader non-code benchmark run.)

Summary: how the rankings move by axis

| Axis | Winner | Notable |
| --- | --- | --- |
| Smallest security/privacy surface | caveman | RTK next; headroom (ML+proxy egress) and lean-ctx (footprint) largest, but lean-ctx is local-first by default |
| Smallest / largest footprint | RTK smallest / lean-ctx largest | RTK ~4 MB one binary; lean-ctx 64.7 MB + daemon + DBs |
| Sustainability path | lean-ctx / RTK (both funded) | lean-ctx's open-core is cleaner (CI-enforced free local); both young |
| Composes with native Claude Code features | caveman / RTK / lean-ctx (MCP+hook) | both proxies (headroom, lean-ctx) conflict |
| Build-vs-buy | build the output style; buy lean-ctx's code graph (unbuildable) | RTK's patterns are a justified middle "buy" |
| tasks-per-cap (subscriber) | headroom ≈ lean-ctx ≳ RTK ≫ caveman | inverts the $-per-task order |
| non-coding generality | caveman ≳ headroom ≫ RTK ≈ lean-ctx | lean-ctx is the most code-bound (its strength is code) |

No tool wins every axis — the hub's thesis restated from six new angles: they specialize (and lean-ctx consolidates, at a footprint cost), and the "best" one is the one matched to your axis (workload, metric, threat model). The overview and combining pages turn that into a stack-or-runtime decision.


