# 54 — Context-compression literature and market delta (2024–2026) (https://jackin.tailrocks.com/research/token-optimization/54-context-compression-literature-and-market/) # 54 — Context-compression literature and market delta (2024–2026) [#54--context-compression-literature-and-market-delta-20242026] Companion to the headroom deep-dive (53): the internet re-sweep for context-compression projects and academic work the dossier's 03/46/51/52 do not already cover, run to answer "what else is out there, and did we miss anything?" Two parallel research efforts fed this file — a market sweep of shipping tools and a clean-room academic literature pass across eight threads — each independently cross-checked against primary sources. Research conducted 2026-06-15; every external claim carries a source in the ledger; tiers per the dossier scheme (T1 shipped+reproduced / T2 peer-reviewed / T3 community-replicated / T4 speculative). This is the compression-layer analogue of file 46's fresh-literature role. ## TL;DR [#tldr] * **The single most useful synthesis is the cache-safety axis (§C): output brevity is cache-neutral; input compression is cache-breaking unless done at write-time on new content.** Everything else sorts under this. On a hosted, already-caching Claude, a compressor that recompresses the prompt prefix must beat **\~10× (pure prefix) / \~5.5× (mixed)** just to break even against the 0.1× cache-read price — confirmed by a 358-run Claude Sonnet 4.5 RCT where aggressive compression *raised* cost 1.8% (arXiv 2603.23525) and by "Don't Break the Cache" (arXiv 2601.06007). * **The frontier moved to code-domain, hosted-viable, write-time compression that *improves* SWE-bench — refuting file 46 D's "no new compressor safe for code."** Squeez (arXiv 2604.04979: 92% tool-output removal at 0.86 recall, write-time Unix pipe, cache-safe), AgentDiet (arXiv 2509.23586: Claude 4 Sonnet **64.5%→66.5%**, input −40–60%, and the only paper that nets out its own +5–15% compressor cost), SWEzze/OCD (arXiv 2603.28119: +5–9.2% resolution at \~6×), SWE-Pruner (Claude Sonnet 4.5 **70.6%→72.0%**), LongCodeZip (ASE 2025: training-free, 5.6×, no loss). * **Stars are noise in this niche.** A GitHub search found 63 repos created since Dec 2025 with >25k stars; the flagship "compression" repos show abnormal star:watcher ratios (RTK 62,461★/146 watchers = 428:1; CodeGraph 49,439/115; headroom 28k/95) on 3–5-month-old repos. Rank by **evidence, forks, and open-issue activity**, never stars. * **Most "compression" is selective inclusion, not payload compression.** The flashiest multipliers (82×–528×, 120×, 99.1%) come from querying an index instead of reading whole files against a worst-case grep baseline — real, but the file-51/52 lever, not LLMLingua-style ratios; and several headline "98%" figures are **bytes, not tokens**. * **The peer-reviewed standouts are contrarian and code-relevant.** the-complexity-trap (NeurIPS'25, JetBrains): cheap **observation masking matches LLM summarization at half the cost** on SWE-bench — and implicitly debunks the summarize-history compaction most agents ship. ACON (arXiv 2510.00615): 26–54% peak-token at >95% accuracy, gradient-free for hosted APIs. OpenHands batched condensation: the only widely-shipped **cache-aware** compaction. * **Independent headroom measurements land at \~47–50%** (Miya-Gadget 47.5%, HN \~50%), below the 60–95% headline (full treatment in 53). * **Soft-prompt / embedding compression (gist/ICAE/500x/xRAG/PISCO/Cartridges) is confirmed a dead end on hosted Claude** — every method needs embedding-input or KV-write access no hosted API exposes. Only Selective Context (a hard-prompt method) reaches a hosted API, at \~2×. ## A. Market re-sweep — context-compression tools the dossier had not cataloged [#a-market-re-sweep--context-compression-tools-the-dossier-had-not-cataloged] Excludes everything already in 03/46/51/52 (caveman family, fff, claude-mem, aider map, LiteLLM, ccusage, TOON, LLMLingua, codedb, CodeGraff, Serena, the named context engines, vector DBs, KV-eviction families, CAG, Mem0). All numbers vendor self-reported unless tagged otherwise; none reproduced locally. ### Five skeptical filters (apply to every row) [#five-skeptical-filters-apply-to-every-row] 1. **Stars are a PR artifact** (see TL;DR) — use forks/issues/watchers and evidence instead. 2. **Selective inclusion ≠ payload compression.** "Query an index" multipliers are the file-51 lever against a worst-case baseline; do not compare them to fixed-prompt compression ratios. 3. **Evidence is almost all vendor self-report.** Peer-reviewed exceptions are rare and mostly NL-domain. 4. **NL→code transfer is unproven** for most compressors (benchmarked on NQ/TriviaQA/HotpotQA, not SWE-bench). 5. **Cache breakage is the silent cost-killer.** History-rewriting compaction/proxies can destroy the 10× cache-read discount they nominally beat (Anthropic docs: "Tools that drop or summarize older messages… invalidate cache hits"). Only a handful engineer around it. ### Condensed market table (load-bearing rows) [#condensed-market-table-load-bearing-rows] | Tool | Category | Mechanism | Claimed saving (verbatim) | Tier | Cache-safe? | License | | -------------------------------------------- | ---------------------- | ----------------------------------------------------------------------------- | ------------------------------------------------------------------ | ----------------------------------------------------------------------------------------- | --------------------------------------------------- | ------------------------- | | **llmtrim** (fkiene/llmtrim) | Proxy | Rust proxy; signature-skeletonize, dedupe, JSON re-encode | "−31% input / −74% output"; "round-trip −66%"; quality 78.9%→82.2% | T3 (live A/B, input only) | **Yes** ("without ever touching the cached prefix") | AGPL-3.0 | | **entroly** (juyterman1000/entroly) | Repo-context selection | BM25 + dep-graph + knapsack under budget; "Cache Aligner" byte-stable prefix | "70–95%"; "8K budget: 99.1%" | T3 | Claimed prefix-stable | Apache-2.0 | | **TokenTamer** (borhen68/TokenTamer) | Proxy | Drop-in; injects `cache_control` *before* compressing stale tool reads | "50–80%"; "\~73% off input" | T4 (self-admits "synthetic payloads") | Cache-first design | MIT | | **claw-compactor** (open-compress) | Library | 14-stage, **zero-LLM**, reversible RewindStore, AST-aware, KV-cache alignment | weighted avg **36.3%**; JSON 81.9% | T3 (ROUGE-L vs LLMLingua-2: 0.653 vs 0.346) | Claimed aligned | MIT | | **leanctx** (jia-gao/leanctx) | Library | Wraps LLMLingua-2 locally + summarizer; routes code/errors verbatim | "removes 57%… doubles baseline accuracy" (15-item NL subset) | T3 | Composes with `cache_control` | MIT | | **RTK** (rtk-ai/rtk) | Tool-output CLI/hook | Deterministic Rust proxy + PreToolUse hook, no LLM | "60–90%"; session −80% (labeled "Estimates") | T4 — **own Issue #582: hook *raises* CC cost 18%**; #886 bypasses permission prompts | Hook rewrites output → **risk** | Apache-2.0 | | **context-mode** (mksglu/context-mode) | Tool-output sandbox | `ctx_execute` → raw output to SQLite FTS5, only stdout enters context | "98%" (315 KB→5.4 KB — **bytes**) | T3 (21-scenario BENCHMARK.md) | Sandbox sidesteps cache | **Elastic 2.0 (not OSI)** | | **chop** (AgusRdz/chop) | Tool-output hook | PreToolUse deterministic pattern-match, 60+ cmds | "git status 95%; npm test 99%" | T4 (per-cmd) | Hook on new output | MIT | | **mcp-compressor** (atlassian-labs) | MCP schema | Proxy compresses tool **definitions** (not results) | (no headline) | T3 (Atlassian Labs) | Schema-side | Apache-2.0 | | **flightlesstux/prompt-caching** | Cache-maximizer | Auto-inserts Anthropic `cache_control` on stable prefixes | "up to 90% on repeated reads" | T3 (matches 0.1× cache price) | **Helps caching** | MIT | | **OpenHands Condenser** | Compaction harness | LLM summary of old events, **batched** to preserve cache | "up to 2× per-turn cost"; SWE-bench 54% vs 53% | **T1 (SWE-bench)** | **Yes — batched** | MIT | | **the-complexity-trap** (JetBrains-Research) | Compaction study | **Observation masking** vs LLM summarization, SWE agents | masking "halves cost… matching solve rate"; hybrid −7–11% | **T2 (NeurIPS'25, arXiv 2508.21433)** | Model-agnostic | MIT | | **ACON** (microsoft/acon) | Observation+history | Gradient-free; optimizes the compression prompt in NL space | "26–54% peak token"; ">95% accuracy when distilled" | **T2 (arXiv 2510.00615)** | Side-call (preservable) | MIT | | **Zep / Graphiti** (getzep) | Memory graph | Temporal knowledge graph | **"17% smaller" (5,760 vs 6,956); "35% smaller"** | T3 + arXiv 2501.13956 | Append-style | Apache-2.0 | | **GPTCache** (zilliztech) | Semantic cache | Embedding-similarity response cache | (hit-ratio, no headline) | **T2 (ACL'23)** but **stale (2025-07)** | Provider-agnostic | MIT | (The fuller per-category catalog — proxies, libraries, tool-output hooks, code-index "selective inclusion" tools, semantic caches, compaction harnesses, memory systems, and RAG-chunk compressors — and every star/fork figure with its watcher-ratio flag is preserved in the source ledger below; the rows above are the ones that move a decision.) ### Credible challengers ranked by evidence (not stars) [#credible-challengers-ranked-by-evidence-not-stars] 1. **the-complexity-trap** (JetBrains-Research, NeurIPS'25, MIT) — the most credible artifact in the sweep: peer-reviewed, *code-specific* (SWE-bench Verified), and contrarian — cheap **observation masking matches expensive LLM summarization at \~half the cost**. A method, not a drop-in, but it is the only rigorous code evidence here and it debunks the summarize-history compaction that Cline/Goose/Hermes ship. 2. **OpenHands Condenser** (MIT) — best-evidenced *shipping* compaction with a real SWE-bench number and the only harness **cache-aware by design** (batched condensation). 3. **ACON** (Microsoft, peer-reviewed) — the most conservative, therefore most believable, agent-context number: 26–54% peak-token at >95% accuracy; gradient-free so it runs against hosted APIs. 4. **llmtrim** (AGPL) — the most caching-honest proxy: explicitly compresses "without touching the cached prefix" and discloses its measurement limits. A more transparent, narrower headroom competitor. 5. **claw-compactor** (MIT) — strongest library: zero-LLM, deterministic, **reversible (RewindStore = an open, inspectable CCR)**, AST-aware that never renames identifiers, honest 36.3% with an apples-to-apples ROUGE-L baseline vs LLMLingua-2. 6. **flightlesstux/prompt-caching** (MIT) — orthogonal and the one tool whose savings are *provably real* because it only uses Anthropic's documented `cache_control`; maximizes caching instead of risking it. **Cautionary tales (high hype, weak substance):** RTK (62k stars, but its own issue tracker says the hook raises cost 18% and bypasses Claude Code's permission prompts — output-rewriting hooks are cache-unsafe); the "82×–528× / 120× / 99.1%" code-index tools (selective inclusion vs worst-case grep, not compression — the file-51 lever); and any soft-token RAG method (xRAG/REFRAG/PISCO/OSCAR), which cannot run against the Anthropic API at all (§B thread 1). ## B. Fresh literature (2024–2026), by thread [#b-fresh-literature-20242026-by-thread] The meta-finding: the field moved from NL-QA soft-prompt compression (a dead end for hosted APIs) to code-domain, agent-trajectory, hosted-compatible compression evaluated on SWE-bench — and several of these *improve* accuracy while cutting tokens. * **Thread 1 — Soft-prompt / embedding compression: dead for hosted Claude.** Gist (26×), ICAE (4×), 500xCompressor (6–480×, 62–73% retention), xRAG (1 token/doc), AutoCompressors, COCOM, PISCO (16× @ 0–3% loss), Cartridges (38.6×), LLoCO (30×) — all compress into embeddings/KV the model must be trained to read; surveys say plainly they are "impractical for API-based proprietary LM services" (arXiv 2410.12388). **Correction to the dossier's K1:** 500xCompressor writes per-layer KV — the "works without fine-tuning" framing is misleading for hosted use. Only **Selective Context** (a hard-prompt method, \~2×) reaches a hosted API. * **Thread 2 — AST/code-aware compression: the new, hosted-viable, accuracy-positive frontier.** **The Perplexity Paradox** (arXiv 2602.15843): LLMLingua on code causes "Function Identity Collapse" — 86.1% of failures are NameError — fixed by deterministic **signature injection** (+34pp; NameError 86.1%→6.1%), which is exactly headroom's CodeAwareCompressor design. **SWEzze/OCD** (arXiv 2603.28119): AST-aware, \~6×, **SWE-bench resolution +5.0–9.2%**. **LongCodeZip** (ASE 2025): training-free, model-agnostic, **5.6× without loss** on code. **CodeMEM** (arXiv 2601.02868): AST memory, no fine-tuning, +12.2% instruction accuracy. **REPOFUSE**: signature-only "rationale context." All run as preprocessing on hosted Claude but are query-conditional → cache-breaking per instance (so: compress at ingestion, not in the prefix). * **Thread 3 — Tool-output / observation compression: the richest vein, several measured on Claude.** **AgentDiet** (arXiv 2509.23586): the canonical "separate cheap summarizer" pattern and the **only** one that accounts for its own cost (+5.2–14.8%) — net Claude 4 Sonnet **64.5%→66.5%**, input −39.9–59.7%. **ACON** (arXiv 2510.00615): gradient-free, hosted-API-compatible, peak-token −26–54.5%. **M2** (arXiv 2603.00503): training-free, **Claude-3.7 +12.5%/−30.3% tok, Claude-4 +5.5%/−39.1%**. **CaT/SWE-Compressor** (arXiv 2512.22087): SWE-bench 49.8%→57.6%. First-party: Claude API automatic compaction (beta, default 150k-token trigger; **invalidates the cached prefix** unless you breakpoint the system-prompt end + the compaction block; iteration tokens billed but excluded from top-level usage) and Claude Code's **microcompact** (no-LLM redundant-tool-output trimming every turn — the cheap, always-cache-safe layer). * **Thread 4 — Reversible / retrievable compression: paging composes with caching; context editing breaks it.** MemGPT/Letta (DMR 93.4% vs 35.3% summary baseline — but a full-context baseline hit 94.4%, so paging buys little when everything fits), Anthropic memory tool + just-in-time context (+39%/84% on a 100-turn web eval, append-style, cache-friendly). The exception: **Anthropic context editing explicitly invalidates the cached prefix on each clear** (`clear_at_least_input_tokens` exists precisely to make the re-cache worth it). Headroom CCR and claw-compactor's RewindStore are the buildable reversible+cache-aware tools. * **Thread 5 — Learned-importance selection.** **Squeez** (arXiv 2604.04979): the standout — task-conditioned tool-output pruning, **92% removal at 0.86 recall**, runs as a write-time Unix pipe (`pytest | squeez 'find the failure'`), **cache-safe**, code-domain (BM25 only 0.22 recall on tool output — retrieval fails on logs). **SWE-Pruner** (GitHub Ayanami1314/swe-pruner; arXiv ID tentative): 0.6B skimmer, **Claude Sonnet 4.5 70.6%→72.0%**, tokens −23–38%, but mid-stream → cache-breaking. **Provence** (ICLR'25): strong but NL-QA only, never code-evaluated. * **Thread 6 — Context rot: the accuracy argument for compression.** Chroma's "Context Rot" (2025-07, **18 models incl. Claude Opus 4 / Sonnet 4 / 3.7 / 3.5 / Haiku 3.5**): reliability falls as input grows *even on trivial retrieval*, with Claude Opus 4 / Sonnet 4 showing the most pronounced gap (they abstain under ambiguity); all 18 did better on a shuffled than a logically-ordered haystack. Plus "Lost in the Middle" (U-shaped, >30% drop) and RULER (17/17 long-context models degrade). **ACE** (ICLR'26): evolving playbooks with incremental updates avoid "context collapse," +10.6% on agents, token −83.6% — but base model is DeepSeek, not Claude. Cognition's "Don't Build Multi-Agents" is the counterpoint for coding: prefer a single linear agent + a dedicated compression LLM. * **Thread 7 — THE CRUX (input compression vs caching).** No hosted-API scheme makes per-turn input compression cache-compatible. Break-even ≈ **10× pure-prefix / 5.5× mixed**. Hard evidence: "Don't Break the Cache" (arXiv 2601.06007: caching saves 41–80%, put dynamic content last) and the 358-run Claude 4.5 RCT (arXiv 2603.23525: aggressive compression *raised* cost 1.8%). Every cache-*aware* compressor (CacheBlend EuroSys'25 Best Paper; KVzip NeurIPS'25) is **self-host-only**. The only hosted-Claude-safe input moves are (a) offline compress-once → cache (the CAG pattern, file 46 FL1) and (b) write-time observation compression before content enters the cached prefix. * **Thread 8 — Output brevity vs input compression.** Output is billed \~5× input and is **cache-neutral**, so brevity is strictly the safer lever. **Chain of Draft** (arXiv 2502.18600, tested on Claude 3.5 Sonnet): sports 92.4% output cut **+4.1pp**, GSM8K \~79% cut −4.4pp, latency −48%. **TALE** (ACL'25): 68.9% at \<5% loss. **Token Complexity** (arXiv 2503.01141): the guardrail — every task has a minimal-token floor below which accuracy collapses. **Brevity-hierarchy** (arXiv 2604.00025): verified to exist (file 03 flagged it unverified) but a trap — solo/unreviewed, +26.3pp is a cherry-picked 7.7% subset, no Claude, no code; do not propagate "+26 points." ## C. The unifying synthesis — a cache-safety classification of every compression move [#c-the-unifying-synthesis--a-cache-safety-classification-of-every-compression-move] This is the table the dossier was missing: sort compression by *where* it acts, and cache-safety falls out deterministically. It is the practical decision rule for a hosted-Claude operator. | Class | Examples | Acts on | Cache interaction | Hosted-Claude verdict | | ----------------------------------------------- | ---------------------------------------------------------------------------------------------- | ------------------------------------- | ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------- | | **Output brevity** | caveman, Chain-of-Draft, TALE, effort tiering | Generated tokens | **Neutral** (never touches the prefix) | **Adopt first** — unconditionally cache-safe, hits the 5×-priced class. Bounded by the token-complexity floor. | | **Write-time observation compression** | Squeez, AgentDiet, microcompact, headroom MCP mode, chop, the-complexity-trap masking | New tool outputs *before* first cache | **Safe** (shrinks the write that was going to happen) | **Adopt second** — the only cache-safe input lever; evidence now shows it can *raise* SWE-bench accuracy. | | **Offline compress-once → cache** | CAG pattern (FL1), a curated repo/spec prefix | A stable corpus, once | **Composes with caching** | **Adopt where the corpus fits context** and is queried many times. | | **Cache-maximization** | flightlesstux/prompt-caching, CacheAligner (prefix-hygiene part), `/cd` | The prefix itself | **Helps caching** | **Adopt** — provably real (documented API feature). | | **Reversible paging / memory** | MemGPT/Letta, Anthropic memory tool, CCR, RewindStore | Out-of-context store + retrieve | **Append-style: safe** | **Adopt** for cross-session state; net-accounting still unpublished. | | **Whole-prompt / mid-stream input compression** | LLMLingua proxies, headroom proxy mode, SWE-Pruner, history-rewriting compaction (Cline/Goose) | The cached prefix, per request | **Breaks the cache** (must beat \~10×) | **Avoid on hosted Claude** unless it provably beats break-even and preserves `cache_read`. | | **Soft-prompt / embedding / KV compression** | gist, ICAE, 500x, xRAG, PISCO, Cartridges, CacheBlend, KVzip, SnapKV family | Model internals | n/a (no API channel) | **$0 — self-host only** (confirms file 46's wall). | ## D. Corrections and refinements to the dossier [#d-corrections-and-refinements-to-the-dossier] * **New caveat — stars are not adoption in the compression niche.** The dossier's market files (03/51/52) cite star counts as an adoption signal; in this sub-field that is now invalid (PR-spike inflation, abnormal watcher ratios). Rank by evidence/forks/issues. Applied retroactively to file 53's headroom/caveman star figures. * **Refute file 46 D ("no new lossy compressor both user-reachable on hosted Claude and safe for code").** Superseded by Squeez, AgentDiet, SWEzze, SWE-Pruner, LongCodeZip — code-domain, hosted-viable, accuracy-neutral-or-positive — provided they act at write-time/ingestion, not on the cached prefix. The Perplexity Paradox supplies the mechanism (signature injection beats perplexity pruning on code). * **Sharpen the break-even (record 19 / file 46 FL3).** State it as **\~10× on a fully-cacheable prefix, \~5.5× on a mixed prompt**, with the cacheable-fraction assumption explicit — the single most attackable number in the dossier's chain, now backed by a Claude-specific RCT. * **Correct the caveman brevity citation (file 03 record 01 / K1).** arXiv 2604.00025 is real but NL-only, unreviewed, cherry-picked; the defensible "brevity helps accuracy" evidence is Chain-of-Draft on Claude 3.5 Sonnet (+4.1pp at −92% output), a modest effect consistent with the dossier's existing read. * **Correct the soft-prompt framing (K1).** 500xCompressor/ICAE/xRAG are not hosted-usable "without fine-tuning" — they need embedding-input or per-layer-KV channels; the family is self-host-only. * **Promote output brevity above input compression** in the recommendation hierarchy on cache-neutrality grounds (§C). ## E. jackin' implications [#e-jackin-implications] The compression layer sorts cleanly onto the cache-safety classification (§C), and the jackin' recommendation follows from it: 1. **Bake the cache-safe set, in order:** output brevity (already shipped via the caveman family) → write-time observation compression (pilot headroom MCP mode or a hook equivalent like the-complexity-trap masking / chop / Squeez-style filtering) → native cache-maximization (`/cd`, stable prefixes, `cache_control` discipline) → offline compress-once for stable corpora (the CAG pattern). 2. **Do not put a whole-prompt compressor in a jackin' container's hot path** — it is a cache-bust risk, a CompressionAttack surface (file 46 FL3), and on already-caching Claude Code it usually loses (the RTK issue tracker is the worked cautionary tale: a hook that *raised* cost 18%). 3. **Prefer peer-reviewed, code-domain, cache-aware tools** when piloting: the-complexity-trap (observation masking), OpenHands-style batched condensation, ACON. Rank candidates by evidence, never by stars. 4. **Reuse the file-51 validation harness** with cache-read continuity as a first-class metric (already added in 53's harness): accept a compressor only if it beats the existing hooks+code-intel arm on tokens-per-solved-task at equal quality *and* preserves `cache_read`. 5. **Memory layer: pick one** reversible, cross-agent store if needed (headroom memory, Letta, or Zep/Graphiti — the only one with paired before/after token counts), and demand net-accounting before trusting it. ## Verification ledger [#verification-ledger] All accessed 2026-06-15. Independent measurements and peer-reviewed work flagged inline; everything else is vendor self-report. **Independent headroom measurements:** Miya-Gadget benchmark (47.5%, RAG 0%) — miyagadget.page/en/blog/2026/06/03/headroom-ai-context-compression-benchmark-en (via reader proxy; site 403s direct) · HN "\~50%" — news.ycombinator.com/item?id=46663757 · negative evidence (HN story index has zero hits for the tool) — hn.algolia.com/api/v1/search?query=headroom\&tags=story · press churn origin — theregister.com/ai-ml/2026/05/31/netflix-wiz-creates-app-to-slash-ai-bills-then-open-sources-it **Market tools:** github.com/fkiene/llmtrim · github.com/juyterman1000/entroly · github.com/borhen68/TokenTamer · github.com/open-compress/claw-compactor · github.com/jia-gao/leanctx · github.com/rtk-ai/rtk (Issue #582, #886) · github.com/mksglu/context-mode · github.com/AgusRdz/chop · github.com/atlassian-labs/mcp-compressor · github.com/flightlesstux/prompt-caching · github.com/OpenHands/OpenHands · github.com/microsoft/acon · github.com/JetBrains-Research/the-complexity-trap · github.com/getzep/graphiti · github.com/zilliztech/GPTCache · github.com/Compresr-ai/Context-Gateway · github.com/colbymchenry/codegraph · github.com/tirth8205/code-review-graph · github.com/DeusData/codebase-memory-mcp · github.com/letta-ai/letta · github.com/firecrawl/firecrawl-mcp-server **Literature:** Squeez arXiv 2604.04979 · AgentDiet arXiv 2509.23586 · ACON arXiv 2510.00615 · the-complexity-trap arXiv 2508.21433 (NeurIPS'25 DL4Code) · Perplexity Paradox arXiv 2602.15843 · SWEzze/OCD arXiv 2603.28119 · LongCodeZip ASE 2025 (arXiv 2510.00446) · M2 arXiv 2603.00503 · CaT arXiv 2512.22087 · CodeMEM arXiv 2601.02868 · "Don't Break the Cache" arXiv 2601.06007 · Claude 4.5 compression RCT arXiv 2603.23525 · Chain-of-Draft arXiv 2502.18600 · TALE arXiv 2412.18547 (ACL'25 Findings) · Token Complexity arXiv 2503.01141 · brevity-hierarchy arXiv 2604.00025 · prompt-compression survey arXiv 2410.12388 · 500xCompressor arXiv 2408.03094 · xRAG arXiv 2405.13792 · PISCO arXiv 2501.16075 · Cartridges arXiv 2506.06266 · Chroma "Context Rot" trychroma.com/research/context-rot · ACE arXiv 2510.04618 · CacheBlend (EuroSys'25) arXiv 2405.16444 · KVzip arXiv 2505.23416 · Anthropic compaction platform.claude.com/docs/en/build-with-claude/compaction · Anthropic context engineering anthropic.com/engineering/effective-context-engineering-for-ai-agents * Companion deep-dive: [`53-headroom-and-context-compression.md`](/research/token-optimization/53-headroom-and-context-compression/). Prior market/lit files: [`03`](/research/token-optimization/03-prior-art-and-market-scan/), [`46`](/research/token-optimization/46-fresh-literature-and-market-delta/), [`51`](/research/token-optimization/51-code-intelligence-tools/), [`52`](/research/token-optimization/52-qdrant-and-vector-databases/). ### Caveats on this file's own data [#caveats-on-this-files-own-data] * **Star counts are point-in-time (2026-06-15) and unreliable as quality signals in this niche** (§A filter 1); reported only because adoption was asked about, with the watcher-ratio flag. * **Two arXiv IDs could not be independently confirmed and may be mis-transcribed or pre-registration placeholders:** SWE-Pruner's (cited as 2601.16746) and a structural-graph paper (2603.27277, also referenced in file 51's ledger). The GitHub repos behind them are real and star-verified; treat those two specific identifiers as tentative pending a direct arXiv check. * This file synthesizes two parallel research sweeps plus direct primary-source verification of every load-bearing claim; per the dossier's standing rule, no headline percentage here is a banked saving until reproduced on jackin' tasks via the file-51/53 validation harness at equal quality.