# 54 — Context-compression literature and market delta (2024–2026) (https://jackin.tailrocks.com/research/token-optimization/54-context-compression-literature-and-market/)



# 54 — Context-compression literature and market delta (2024–2026) [#54--context-compression-literature-and-market-delta-20242026]

Companion to the headroom deep-dive (53): the internet re-sweep for context-compression projects and academic work the dossier's 03/46/51/52 do not already cover, run to answer "what else is out there, and did we miss anything?" Two parallel research efforts fed this file — a market sweep of shipping tools and a clean-room academic literature pass across eight threads — each independently cross-checked against primary sources. Research conducted 2026-06-15; every external claim carries a source in the ledger; tiers per the dossier scheme (T1 shipped+reproduced / T2 peer-reviewed / T3 community-replicated / T4 speculative). This is the compression-layer analogue of file 46's fresh-literature role.

<Aside type="note">
  For the focused tool comparison (equal-depth design teardowns of caveman, headroom, RTK, and **lean-ctx**, a feature has/lacks matrix, combinability), see the dedicated folder [token-optimization tools](/research/token-optimization-tools/). This chapter remains the broad compression-market and literature sweep that comparison draws on.
</Aside>

## TL;DR [#tldr]

* **The single most useful synthesis is the cache-safety axis (§C): output brevity is cache-neutral; input compression is cache-breaking unless done at write-time on new content.*&#x2A; Everything else sorts under this. On a hosted, already-caching Claude, a compressor that recompresses the prompt prefix must beat &#x2A;*\~10× (pure prefix) / \~5.5× (mixed)** just to break even against the 0.1× cache-read price — confirmed by a 358-run Claude Sonnet 4.5 RCT where aggressive compression *raised* cost 1.8% (arXiv 2603.23525) and by "Don't Break the Cache" (arXiv 2601.06007).
* **The frontier moved to code-domain, hosted-viable, write-time compression that *improves* SWE-bench — refuting file 46 D's "no new compressor safe for code."** Squeez (arXiv 2604.04979: 92% tool-output removal at 0.86 recall, write-time Unix pipe, cache-safe), AgentDiet (arXiv 2509.23586: Claude 4 Sonnet &#x2A;*64.5%→66.5%**, input −40–60%, and the only paper that nets out its own +5–15% compressor cost), SWEzze/OCD (arXiv 2603.28119: +5–9.2% resolution at \~6×), SWE-Pruner (Claude Sonnet 4.5 &#x2A;*70.6%→72.0%**), LongCodeZip (ASE 2025: training-free, 5.6×, no loss).
* **Stars are noise in this niche.** A GitHub search found 63 repos created since Dec 2025 with >25k stars; the flagship "compression" repos show abnormal star:watcher ratios (RTK 62,461★/146 watchers = 428:1; CodeGraph 49,439/115; headroom 28k/95) on 3–5-month-old repos. Rank by **evidence, forks, and open-issue activity**, never stars.
* **Most "compression" is selective inclusion, not payload compression.** The flashiest multipliers (82×–528×, 120×, 99.1%) come from querying an index instead of reading whole files against a worst-case grep baseline — real, but the file-51/52 lever, not LLMLingua-style ratios; and several headline "98%" figures are **bytes, not tokens**.
* **The peer-reviewed standouts are contrarian and code-relevant.** the-complexity-trap (NeurIPS'25, JetBrains): cheap **observation masking matches LLM summarization at half the cost** on SWE-bench — and implicitly debunks the summarize-history compaction most agents ship. ACON (arXiv 2510.00615): 26–54% peak-token at >95% accuracy, gradient-free for hosted APIs. OpenHands batched condensation: the only widely-shipped **cache-aware** compaction.
* **Independent headroom measurements land at \~47–50%** (Miya-Gadget 47.5%, HN \~50%), below the 60–95% headline (full treatment in 53).
* **Soft-prompt / embedding compression (gist/ICAE/500x/xRAG/PISCO/Cartridges) is confirmed a dead end on hosted Claude** — every method needs embedding-input or KV-write access no hosted API exposes. Only Selective Context (a hard-prompt method) reaches a hosted API, at \~2×.

## A. Market re-sweep — context-compression tools the dossier had not cataloged [#a-market-re-sweep--context-compression-tools-the-dossier-had-not-cataloged]

Excludes everything already in 03/46/51/52 (caveman family, fff, claude-mem, aider map, LiteLLM, ccusage, TOON, LLMLingua, codedb, CodeGraff, Serena, the named context engines, vector DBs, KV-eviction families, CAG, Mem0). All numbers vendor self-reported unless tagged otherwise; none reproduced locally.

### Five skeptical filters (apply to every row) [#five-skeptical-filters-apply-to-every-row]

1. **Stars are a PR artifact** (see TL;DR) — use forks/issues/watchers and evidence instead.
2. **Selective inclusion ≠ payload compression.** "Query an index" multipliers are the file-51 lever against a worst-case baseline; do not compare them to fixed-prompt compression ratios.
3. **Evidence is almost all vendor self-report.** Peer-reviewed exceptions are rare and mostly NL-domain.
4. **NL→code transfer is unproven** for most compressors (benchmarked on NQ/TriviaQA/HotpotQA, not SWE-bench).
5. **Cache breakage is the silent cost-killer.** History-rewriting compaction/proxies can destroy the 10× cache-read discount they nominally beat (Anthropic docs: "Tools that drop or summarize older messages… invalidate cache hits"). Only a handful engineer around it.

### Condensed market table (load-bearing rows) [#condensed-market-table-load-bearing-rows]

| Tool                                         | Category               | Mechanism                                                                     | Claimed saving (verbatim)                                          | Tier                                                                                      | Cache-safe?                                         | License                   |
| -------------------------------------------- | ---------------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- | --------------------------------------------------- | ------------------------- |
| **llmtrim** (fkiene/llmtrim)                 | Proxy                  | Rust proxy; signature-skeletonize, dedupe, JSON re-encode                     | "−31% input / −74% output"; "round-trip −66%"; quality 78.9%→82.2% | T3 (live A/B, input only)                                                                 | **Yes** ("without ever touching the cached prefix") | AGPL-3.0                  |
| **entroly** (juyterman1000/entroly)          | Repo-context selection | BM25 + dep-graph + knapsack under budget; "Cache Aligner" byte-stable prefix  | "70–95%"; "8K budget: 99.1%"                                       | T3                                                                                        | Claimed prefix-stable                               | Apache-2.0                |
| **TokenTamer** (borhen68/TokenTamer)         | Proxy                  | Drop-in; injects `cache_control` *before* compressing stale tool reads        | "50–80%"; "\~73% off input"                                        | T4 (self-admits "synthetic payloads")                                                     | Cache-first design                                  | MIT                       |
| **claw-compactor** (open-compress)           | Library                | 14-stage, **zero-LLM**, reversible RewindStore, AST-aware, KV-cache alignment | weighted avg &#x2A;*36.3%**; JSON 81.9%                            | T3 (ROUGE-L vs LLMLingua-2: 0.653 vs 0.346)                                               | Claimed aligned                                     | MIT                       |
| **leanctx** (jia-gao/leanctx)                | Library                | Wraps LLMLingua-2 locally + summarizer; routes code/errors verbatim           | "removes 57%… doubles baseline accuracy" (15-item NL subset)       | T3                                                                                        | Composes with `cache_control`                       | MIT                       |
| **RTK** (rtk-ai/rtk)                         | Tool-output CLI/hook   | Deterministic Rust proxy + PreToolUse hook, no LLM                            | "60–90%"; session −80% (labeled "Estimates")                       | T4 — &#x2A;*own Issue #582: hook *raises* CC cost 18%**; #886 bypasses permission prompts | Hook rewrites output → **risk**                     | Apache-2.0                |
| **context-mode** (mksglu/context-mode)       | Tool-output sandbox    | `ctx_execute` → raw output to SQLite FTS5, only stdout enters context         | "98%" (315 KB→5.4 KB — **bytes**)                                  | T3 (21-scenario BENCHMARK.md)                                                             | Sandbox sidesteps cache                             | **Elastic 2.0 (not OSI)** |
| **chop** (AgusRdz/chop)                      | Tool-output hook       | PreToolUse deterministic pattern-match, 60+ cmds                              | "git status 95%; npm test 99%"                                     | T4 (per-cmd)                                                                              | Hook on new output                                  | MIT                       |
| **mcp-compressor** (atlassian-labs)          | MCP schema             | Proxy compresses tool **definitions** (not results)                           | (no headline)                                                      | T3 (Atlassian Labs)                                                                       | Schema-side                                         | Apache-2.0                |
| **flightlesstux/prompt-caching**             | Cache-maximizer        | Auto-inserts Anthropic `cache_control` on stable prefixes                     | "up to 90% on repeated reads"                                      | T3 (matches 0.1× cache price)                                                             | **Helps caching**                                   | MIT                       |
| **OpenHands Condenser**                      | Compaction harness     | LLM summary of old events, **batched** to preserve cache                      | "up to 2× per-turn cost"; SWE-bench 54% vs 53%                     | **T1 (SWE-bench)**                                                                        | **Yes — batched**                                   | MIT                       |
| **the-complexity-trap** (JetBrains-Research) | Compaction study       | **Observation masking** vs LLM summarization, SWE agents                      | masking "halves cost… matching solve rate"; hybrid −7–11%          | **T2 (NeurIPS'25, arXiv 2508.21433)**                                                     | Model-agnostic                                      | MIT                       |
| **ACON** (microsoft/acon)                    | Observation+history    | Gradient-free; optimizes the compression prompt in NL space                   | "26–54% peak token"; ">95% accuracy when distilled"                | **T2 (arXiv 2510.00615)**                                                                 | Side-call (preservable)                             | MIT                       |
| **Zep / Graphiti** (getzep)                  | Memory graph           | Temporal knowledge graph                                                      | **"17% smaller" (5,760 vs 6,956); "35% smaller"**                  | T3 + arXiv 2501.13956                                                                     | Append-style                                        | Apache-2.0                |
| **GPTCache** (zilliztech)                    | Semantic cache         | Embedding-similarity response cache                                           | (hit-ratio, no headline)                                           | **T2 (ACL'23)** but &#x2A;*stale (2025-07)**                                              | Provider-agnostic                                   | MIT                       |

(The fuller per-category catalog — proxies, libraries, tool-output hooks, code-index "selective inclusion" tools, semantic caches, compaction harnesses, memory systems, and RAG-chunk compressors — and every star/fork figure with its watcher-ratio flag is preserved in the source ledger below; the rows above are the ones that move a decision.)

### Credible challengers ranked by evidence (not stars) [#credible-challengers-ranked-by-evidence-not-stars]

1. **the-complexity-trap** (JetBrains-Research, NeurIPS'25, MIT) — the most credible artifact in the sweep: peer-reviewed, *code-specific* (SWE-bench Verified), and contrarian — cheap **observation masking matches expensive LLM summarization at \~half the cost**. A method, not a drop-in, but it is the only rigorous code evidence here and it debunks the summarize-history compaction that Cline/Goose/Hermes ship.
2. **OpenHands Condenser** (MIT) — best-evidenced *shipping* compaction with a real SWE-bench number and the only harness **cache-aware by design** (batched condensation).
3. **ACON** (Microsoft, peer-reviewed) — the most conservative, therefore most believable, agent-context number: 26–54% peak-token at >95% accuracy; gradient-free so it runs against hosted APIs.
4. **llmtrim** (AGPL) — the most caching-honest proxy: explicitly compresses "without touching the cached prefix" and discloses its measurement limits. A more transparent, narrower headroom competitor.
5. **claw-compactor** (MIT) — strongest library: zero-LLM, deterministic, &#x2A;*reversible (RewindStore = an open, inspectable CCR)**, AST-aware that never renames identifiers, honest 36.3% with an apples-to-apples ROUGE-L baseline vs LLMLingua-2.
6. **flightlesstux/prompt-caching** (MIT) — orthogonal and the one tool whose savings are *provably real* because it only uses Anthropic's documented `cache_control`; maximizes caching instead of risking it.

**Cautionary tales (high hype, weak substance):** RTK (62k stars, but its own issue tracker says the hook raises cost 18% and bypasses Claude Code's permission prompts — output-rewriting hooks are cache-unsafe); the "82×–528× / 120× / 99.1%" code-index tools (selective inclusion vs worst-case grep, not compression — the file-51 lever); and any soft-token RAG method (xRAG/REFRAG/PISCO/OSCAR), which cannot run against the Anthropic API at all (§B thread 1).

## B. Fresh literature (2024–2026), by thread [#b-fresh-literature-20242026-by-thread]

The meta-finding: the field moved from NL-QA soft-prompt compression (a dead end for hosted APIs) to code-domain, agent-trajectory, hosted-compatible compression evaluated on SWE-bench — and several of these *improve* accuracy while cutting tokens.

* **Thread 1 — Soft-prompt / embedding compression: dead for hosted Claude.** Gist (26×), ICAE (4×), 500xCompressor (6–480×, 62–73% retention), xRAG (1 token/doc), AutoCompressors, COCOM, PISCO (16× @ 0–3% loss), Cartridges (38.6×), LLoCO (30×) — all compress into embeddings/KV the model must be trained to read; surveys say plainly they are "impractical for API-based proprietary LM services" (arXiv 2410.12388). &#x2A;*Correction to the dossier's K1:** 500xCompressor writes per-layer KV — the "works without fine-tuning" framing is misleading for hosted use. Only **Selective Context** (a hard-prompt method, \~2×) reaches a hosted API.
* **Thread 2 — AST/code-aware compression: the new, hosted-viable, accuracy-positive frontier.** **The Perplexity Paradox** (arXiv 2602.15843): LLMLingua on code causes "Function Identity Collapse" — 86.1% of failures are NameError — fixed by deterministic **signature injection** (+34pp; NameError 86.1%→6.1%), which is exactly headroom's CodeAwareCompressor design. **SWEzze/OCD** (arXiv 2603.28119): AST-aware, \~6×, &#x2A;*SWE-bench resolution +5.0–9.2%**. **LongCodeZip** (ASE 2025): training-free, model-agnostic, **5.6× without loss** on code. **CodeMEM** (arXiv 2601.02868): AST memory, no fine-tuning, +12.2% instruction accuracy. **REPOFUSE**: signature-only "rationale context." All run as preprocessing on hosted Claude but are query-conditional → cache-breaking per instance (so: compress at ingestion, not in the prefix).
* **Thread 3 — Tool-output / observation compression: the richest vein, several measured on Claude.** **AgentDiet** (arXiv 2509.23586): the canonical "separate cheap summarizer" pattern and the **only** one that accounts for its own cost (+5.2–14.8%) — net Claude 4 Sonnet &#x2A;*64.5%→66.5%**, input −39.9–59.7%. **ACON** (arXiv 2510.00615): gradient-free, hosted-API-compatible, peak-token −26–54.5%. **M2** (arXiv 2603.00503): training-free, &#x2A;*Claude-3.7 +12.5%/−30.3% tok, Claude-4 +5.5%/−39.1%**. **CaT/SWE-Compressor** (arXiv 2512.22087): SWE-bench 49.8%→57.6%. First-party: Claude API automatic compaction (beta, default 150k-token trigger; **invalidates the cached prefix** unless you breakpoint the system-prompt end + the compaction block; iteration tokens billed but excluded from top-level usage) and Claude Code's **microcompact** (no-LLM redundant-tool-output trimming every turn — the cheap, always-cache-safe layer).
* **Thread 4 — Reversible / retrievable compression: paging composes with caching; context editing breaks it.** MemGPT/Letta (DMR 93.4% vs 35.3% summary baseline — but a full-context baseline hit 94.4%, so paging buys little when everything fits), Anthropic memory tool + just-in-time context (+39%/84% on a 100-turn web eval, append-style, cache-friendly). The exception: **Anthropic context editing explicitly invalidates the cached prefix on each clear** (`clear_at_least_input_tokens` exists precisely to make the re-cache worth it). Headroom CCR and claw-compactor's RewindStore are the buildable reversible+cache-aware tools.
* **Thread 5 — Learned-importance selection.** **Squeez** (arXiv 2604.04979): the standout — task-conditioned tool-output pruning, **92% removal at 0.86 recall**, runs as a write-time Unix pipe (`pytest | squeez 'find the failure'`), **cache-safe**, code-domain (BM25 only 0.22 recall on tool output — retrieval fails on logs). **SWE-Pruner** (GitHub Ayanami1314/swe-pruner; arXiv ID tentative): 0.6B skimmer, &#x2A;*Claude Sonnet 4.5 70.6%→72.0%**, tokens −23–38%, but mid-stream → cache-breaking. **Provence** (ICLR'25): strong but NL-QA only, never code-evaluated.
* **Thread 6 — Context rot: the accuracy argument for compression.** Chroma's "Context Rot" (2025-07, **18 models incl. Claude Opus 4 / Sonnet 4 / 3.7 / 3.5 / Haiku 3.5**): reliability falls as input grows *even on trivial retrieval*, with Claude Opus 4 / Sonnet 4 showing the most pronounced gap (they abstain under ambiguity); all 18 did better on a shuffled than a logically-ordered haystack. Plus "Lost in the Middle" (U-shaped, >30% drop) and RULER (17/17 long-context models degrade). **ACE** (ICLR'26): evolving playbooks with incremental updates avoid "context collapse," +10.6% on agents, token −83.6% — but base model is DeepSeek, not Claude. Cognition's "Don't Build Multi-Agents" is the counterpoint for coding: prefer a single linear agent + a dedicated compression LLM.
* **Thread 7 — THE CRUX (input compression vs caching).** No hosted-API scheme makes per-turn input compression cache-compatible. Break-even ≈ **10× pure-prefix / 5.5× mixed**. Hard evidence: "Don't Break the Cache" (arXiv 2601.06007: caching saves 41–80%, put dynamic content last) and the 358-run Claude 4.5 RCT (arXiv 2603.23525: aggressive compression *raised* cost 1.8%). Every cache-*aware* compressor (CacheBlend EuroSys'25 Best Paper; KVzip NeurIPS'25) is **self-host-only**. The only hosted-Claude-safe input moves are (a) offline compress-once → cache (the CAG pattern, file 46 FL1) and (b) write-time observation compression before content enters the cached prefix.
* **Thread 8 — Output brevity vs input compression.** Output is billed \~5× input and is **cache-neutral**, so brevity is strictly the safer lever. **Chain of Draft*&#x2A; (arXiv 2502.18600, tested on Claude 3.5 Sonnet): sports 92.4% output cut **+4.1pp**, GSM8K \~79% cut −4.4pp, latency −48%. **TALE** (ACL'25): 68.9% at \<5% loss. **Token Complexity** (arXiv 2503.01141): the guardrail — every task has a minimal-token floor below which accuracy collapses. **Brevity-hierarchy** (arXiv 2604.00025): verified to exist (file 03 flagged it unverified) but a trap — solo/unreviewed, +26.3pp is a cherry-picked 7.7% subset, no Claude, no code; do not propagate "+26 points."

## C. The unifying synthesis — a cache-safety classification of every compression move [#c-the-unifying-synthesis--a-cache-safety-classification-of-every-compression-move]

This is the table the dossier was missing: sort compression by *where* it acts, and cache-safety falls out deterministically. It is the practical decision rule for a hosted-Claude operator.

| Class                                           | Examples                                                                                       | Acts on                               | Cache interaction                                     | Hosted-Claude verdict                                                                                          |
| ----------------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- |
| **Output brevity**                              | caveman, Chain-of-Draft, TALE, effort tiering                                                  | Generated tokens                      | **Neutral** (never touches the prefix)                | **Adopt first** — unconditionally cache-safe, hits the 5×-priced class. Bounded by the token-complexity floor. |
| **Write-time observation compression**          | Squeez, AgentDiet, microcompact, headroom MCP mode, chop, the-complexity-trap masking          | New tool outputs *before* first cache | **Safe** (shrinks the write that was going to happen) | **Adopt second** — the only cache-safe input lever; evidence now shows it can *raise* SWE-bench accuracy.      |
| **Offline compress-once → cache**               | CAG pattern (FL1), a curated repo/spec prefix                                                  | A stable corpus, once                 | **Composes with caching**                             | **Adopt where the corpus fits context** and is queried many times.                                             |
| **Cache-maximization**                          | flightlesstux/prompt-caching, CacheAligner (prefix-hygiene part), `/cd`                        | The prefix itself                     | **Helps caching**                                     | **Adopt** — provably real (documented API feature).                                                            |
| **Reversible paging / memory**                  | MemGPT/Letta, Anthropic memory tool, CCR, RewindStore                                          | Out-of-context store + retrieve       | **Append-style: safe**                                | **Adopt** for cross-session state; net-accounting still unpublished.                                           |
| **Whole-prompt / mid-stream input compression** | LLMLingua proxies, headroom proxy mode, SWE-Pruner, history-rewriting compaction (Cline/Goose) | The cached prefix, per request        | **Breaks the cache** (must beat \~10×)                | **Avoid on hosted Claude** unless it provably beats break-even and preserves `cache_read`.                     |
| **Soft-prompt / embedding / KV compression**    | gist, ICAE, 500x, xRAG, PISCO, Cartridges, CacheBlend, KVzip, SnapKV family                    | Model internals                       | n/a (no API channel)                                  | **$0 — self-host only** (confirms file 46's wall).                                                             |

## D. Corrections and refinements to the dossier [#d-corrections-and-refinements-to-the-dossier]

* **New caveat — stars are not adoption in the compression niche.** The dossier's market files (03/51/52) cite star counts as an adoption signal; in this sub-field that is now invalid (PR-spike inflation, abnormal watcher ratios). Rank by evidence/forks/issues. Applied retroactively to file 53's headroom/caveman star figures.
* **Refute file 46 D ("no new lossy compressor both user-reachable on hosted Claude and safe for code").** Superseded by Squeez, AgentDiet, SWEzze, SWE-Pruner, LongCodeZip — code-domain, hosted-viable, accuracy-neutral-or-positive — provided they act at write-time/ingestion, not on the cached prefix. The Perplexity Paradox supplies the mechanism (signature injection beats perplexity pruning on code).
* **Sharpen the break-even (record 19 / file 46 FL3).*&#x2A; State it as **\~10× on a fully-cacheable prefix, \~5.5× on a mixed prompt**, with the cacheable-fraction assumption explicit — the single most attackable number in the dossier's chain, now backed by a Claude-specific RCT.
* **Correct the caveman brevity citation (file 03 record 01 / K1).** arXiv 2604.00025 is real but NL-only, unreviewed, cherry-picked; the defensible "brevity helps accuracy" evidence is Chain-of-Draft on Claude 3.5 Sonnet (+4.1pp at −92% output), a modest effect consistent with the dossier's existing read.
* **Correct the soft-prompt framing (K1).** 500xCompressor/ICAE/xRAG are not hosted-usable "without fine-tuning" — they need embedding-input or per-layer-KV channels; the family is self-host-only.
* **Promote output brevity above input compression** in the recommendation hierarchy on cache-neutrality grounds (§C).

## E. jackin' implications [#e-jackin-implications]

The compression layer sorts cleanly onto the cache-safety classification (§C), and the jackin' recommendation follows from it:

1. **Bake the cache-safe set, in order:** output brevity (already shipped via the caveman family) → write-time observation compression (pilot headroom MCP mode or a hook equivalent like the-complexity-trap masking / chop / Squeez-style filtering) → native cache-maximization (`/cd`, stable prefixes, `cache_control` discipline) → offline compress-once for stable corpora (the CAG pattern).
2. **Do not put a whole-prompt compressor in a jackin' container's hot path** — it is a cache-bust risk, a CompressionAttack surface (file 46 FL3), and on already-caching Claude Code it usually loses (the RTK issue tracker is the worked cautionary tale: a hook that *raised* cost 18%).
3. **Prefer peer-reviewed, code-domain, cache-aware tools** when piloting: the-complexity-trap (observation masking), OpenHands-style batched condensation, ACON. Rank candidates by evidence, never by stars.
4. **Reuse the file-51 validation harness** with cache-read continuity as a first-class metric (already added in 53's harness): accept a compressor only if it beats the existing hooks+code-intel arm on tokens-per-solved-task at equal quality *and* preserves `cache_read`.
5. **Memory layer: pick one** reversible, cross-agent store if needed (headroom memory, Letta, or Zep/Graphiti — the only one with paired before/after token counts), and demand net-accounting before trusting it.

## Verification ledger [#verification-ledger]

All accessed 2026-06-15. Independent measurements and peer-reviewed work flagged inline; everything else is vendor self-report.

**Independent headroom measurements:** Miya-Gadget benchmark (47.5%, RAG 0%) — miyagadget.page/en/blog/2026/06/03/headroom-ai-context-compression-benchmark-en (via reader proxy; site 403s direct) · HN "\~50%" — news.ycombinator.com/item?id=46663757 · negative evidence (HN story index has zero hits for the tool) — hn.algolia.com/api/v1/search?query=headroom\&tags=story · press churn origin — theregister.com/ai-ml/2026/05/31/netflix-wiz-creates-app-to-slash-ai-bills-then-open-sources-it

**Market tools:** github.com/fkiene/llmtrim · github.com/juyterman1000/entroly · github.com/borhen68/TokenTamer · github.com/open-compress/claw-compactor · github.com/jia-gao/leanctx · github.com/rtk-ai/rtk (Issue #582, #886) · github.com/mksglu/context-mode · github.com/AgusRdz/chop · github.com/atlassian-labs/mcp-compressor · github.com/flightlesstux/prompt-caching · github.com/OpenHands/OpenHands · github.com/microsoft/acon · github.com/JetBrains-Research/the-complexity-trap · github.com/getzep/graphiti · github.com/zilliztech/GPTCache · github.com/Compresr-ai/Context-Gateway · github.com/colbymchenry/codegraph · github.com/tirth8205/code-review-graph · github.com/DeusData/codebase-memory-mcp · github.com/letta-ai/letta · github.com/firecrawl/firecrawl-mcp-server

**Literature:** Squeez arXiv 2604.04979 · AgentDiet arXiv 2509.23586 · ACON arXiv 2510.00615 · the-complexity-trap arXiv 2508.21433 (NeurIPS'25 DL4Code) · Perplexity Paradox arXiv 2602.15843 · SWEzze/OCD arXiv 2603.28119 · LongCodeZip ASE 2025 (arXiv 2510.00446) · M2 arXiv 2603.00503 · CaT arXiv 2512.22087 · CodeMEM arXiv 2601.02868 · "Don't Break the Cache" arXiv 2601.06007 · Claude 4.5 compression RCT arXiv 2603.23525 · Chain-of-Draft arXiv 2502.18600 · TALE arXiv 2412.18547 (ACL'25 Findings) · Token Complexity arXiv 2503.01141 · brevity-hierarchy arXiv 2604.00025 · prompt-compression survey arXiv 2410.12388 · 500xCompressor arXiv 2408.03094 · xRAG arXiv 2405.13792 · PISCO arXiv 2501.16075 · Cartridges arXiv 2506.06266 · Chroma "Context Rot" trychroma.com/research/context-rot · ACE arXiv 2510.04618 · CacheBlend (EuroSys'25) arXiv 2405.16444 · KVzip arXiv 2505.23416 · Anthropic compaction platform.claude.com/docs/en/build-with-claude/compaction · Anthropic context engineering anthropic.com/engineering/effective-context-engineering-for-ai-agents

* Companion deep-dive: [`53-headroom-and-context-compression.md`](/research/token-optimization/53-headroom-and-context-compression/). Prior market/lit files: [`03`](/research/token-optimization/03-prior-art-and-market-scan/), [`46`](/research/token-optimization/46-fresh-literature-and-market-delta/), [`51`](/research/token-optimization/51-code-intelligence-tools/), [`52`](/research/token-optimization/52-qdrant-and-vector-databases/).

### Caveats on this file's own data [#caveats-on-this-files-own-data]

* **Star counts are point-in-time (2026-06-15) and unreliable as quality signals in this niche** (§A filter 1); reported only because adoption was asked about, with the watcher-ratio flag.
* **Two arXiv IDs could not be independently confirmed and may be mis-transcribed or pre-registration placeholders:** SWE-Pruner's (cited as 2601.16746) and a structural-graph paper (2603.27277, also referenced in file 51's ledger). The GitHub repos behind them are real and star-verified; treat those two specific identifiers as tentative pending a direct arXiv check.
* This file synthesizes two parallel research sweeps plus direct primary-source verification of every load-bearing claim; per the dossier's standing rule, no headline percentage here is a banked saving until reproduced on jackin' tasks via the file-51/53 validation harness at equal quality.
