51 - Code-intelligence tools: codedb, fff, CodeGraff, and alternatives
alternatives re-sweep added the same day. This
is a targeted addendum requested after the main dossier: compare codedb, fff,
the CodeGraff codedb article, and the CodeGraff product site; then check whether
there are stronger alternatives for AI-agent code intelligence and token savings.
TL;DR
- Yes, this tool class can save tokens, but only when it replaces blind grep/read loops with precise path, symbol, caller, dependency, or function-scope retrieval. It does not magically shrink model reasoning, output, or cache rent.
- codedb is the strongest candidate for token savings because it returns
structured code context: symbols, outlines, callers, dependency graph, compact
reads, and a codedb_context composer that combines several lookup steps. Its published token numbers are vendor-side and response-size based, not yet independently reproduced here.
- fff is primarily a resident file/content search accelerator. It likely saves tokens by reducing dead-end searches and wrong-file reads, but the upstream text publishes latency and qualitative token claims, not a numeric token percentage.
- CodeGraff is a larger agent/toolchain bet, not just a retrieval tool. The useful token-saving ideas are scope reads, symbol-safe patching, batching, and codedb-backed retrieval. The commercial/agent stack should be evaluated as a role-level opt-in, not silently installed on the host.
- The strongest alternatives are workload-dependent. For local open-source semantic navigation, Serena is the most serious codedb alternative. For commercial context quality/cost claims, Augment Context Engine is stronger. For enterprise multi-repo code intelligence, Sourcegraph MCP is stronger. For open-source semantic RAG with a published token-reduction claim, Code Context Engine and Claude Context are the clearest candidates.
- For jackin': pilot inside a role container, measure locally, and keep host
effects explicit. The existing the-architect roadmap already proposes
fff; codedb deserves an adjacent A/B pilot, and Serena/Claude Context deserve competitor arms if MCP schema overhead is deferred or bounded.
What is being compared
| URL | Entity | Category | Honest scope |
|---|---|---|---|
| https://github.com/justrach/codedb | codedb | Open source code-intelligence server and MCP toolset | Local structural index, MCP/HTTP/CLI, remote public-repo queries |
| https://github.com/dmtrKovalenko/fff | fff / fff-mcp | Open source resident file and content search toolkit | Fast path search and grep with frecency/git metadata |
| https://codegraff.com/blog/codedb-code-intelligence | CodeGraff article on codedb | Evidence and design explainer | Vendor write-up with latency, byte/token, and workflow examples |
| https://codegraff.com/ | CodeGraff / Graff / Pro tools | Terminal coding agent plus optional local file-tool suite and model gateway | Bigger workflow replacement; not a drop-in search primitive |
CodeGraff and codedb are related: CodeGraff says Graff is powered by codedb,
while codedb's repository links back to the CodeGraff article. Treat codedb as the
retrieval engine and CodeGraff as the broader agent/product surface around it.
Comparison matrix
| Axis | codedb | fff | CodeGraff |
|---|---|---|---|
| Primary job | Code intelligence for agents: tree, outline, symbols, callers, deps, search, compact reads, snapshots, remote repo queries | Fast resident file-name and content search | Full terminal coding agent plus optional Pro local file primitives (muonry, zigrep, zigread, zigpatch) |
| Agent interface | 21 MCP tools, HTTP server, CLI, npx launcher | MCP tools (ffgrep, fffind, fff-multi-grep), SDKs, Neovim plugin | graff CLI/TUI/SDK/gateway, MCP management, Pro MCP tools |
| Data model | In-memory structural indexes: outlines, word index, trigram search, dependency graph, content cache, change log | Warm file tree/content index, frecency DB, git status annotations, typo/fuzzy matching | Agent loop plus codedb retrieval and local daemon file tools |
| Best use | "Where is this symbol?", "what calls this?", "what depends on this?", "give context for this task", "read only this range/compact view" | "Find likely files/matches fast; avoid repeated rg process startup and bad rankings" | Replace whole-file reads and line-fragile edits with function/symbol-scope reads and patches |
| Token-savings evidence | Explicit vendor numbers: e.g. README claims structured search results around tens of tokens vs large raw grep dumps; benchmarks page claims 4x fewer bytes in a full edit workflow | Text says fewer grep roundtrips and token-efficient; no numeric token percentage in README text. Latency and memory claims are specific | Site claims "40x leaner", aggregate "tokens saved", structural read 47 vs 2,103 tokens, and 9x byte reduction in article workflow |
| Evidence tier for tokens | T3/T4: specific public numbers, vendor-interested, not locally replicated here | T4: plausible, but numeric token effect unpublished in text | T4: product counters and examples, vendor-interested, broad workflow confounders |
| Quality risk | Stale/wrong project root, unbounded tree/snapshot dumps, over-trusting fuzzy or structural approximation, telemetry/cache host writes | Fuzzy false positives, overusing it for semantic questions better answered by LSP/ast-grep/codedb | Larger adoption surface, paid/proprietary pieces, possible agent displacement, gateway/provider changes |
| jackin' fit | Good candidate for a role-scoped MCP/CLI pilot, with host-write guardrails | Already mapped in the-architect roadmap as resident file-search MCP | Evaluate only as an explicit alternative agent role/toolchain, not as default core behavior |
Alternatives found in the internet re-sweep
This section answers the follow-up question directly: are codedb and fff the best tools for this use case? Not universally.
| Alternative | Category | Better than codedb/fff when... | Worse or riskier when... | Token verdict |
|---|---|---|---|---|
| Serena (oraios/serena) | Open-source semantic coding agent toolkit built around language servers and MCP | You need local, language-aware symbol navigation/editing through existing LSPs. It is more semantic than fff and overlaps codedb's strongest local use case. | You need codedb's remote public-repo queries, repo snapshots, or one bundled codebase context composer; quality depends on language-server coverage and project setup. | Likely token-positive for symbol tasks; no local measurement yet. Best local open-source competitor to codedb. |
| Code Context Engine (elara-labs/code-context-engine) | Local open-source MCP with AST-aware chunks, hybrid vector/BM25 retrieval, graph expansion, compression, and session memory | You want a local tool whose primary pitch is measurable input-token reduction and one index shared across Claude Code, Codex, Cursor, Gemini CLI, and others. | The headline 94% benchmark is against full-file reads; the docs explicitly say real savings against normal Claude Code should be lower. Newer/smaller project than Serena or Sourcegraph. | Strongest open-source numeric token-saving claim found, but must be treated as T4/T3 until reproduced against the native harness baseline. |
| Augment Context Engine | Commercial/hosted codebase context engine and agent integration | You want the strongest published commercial claims for codebase-context quality and cost reduction, and cloud/vendor dependency is acceptable. | You need open, local-only, reproducible mechanics inside a jackin' role; model gateway/context engine effects are hard to separate. | Vendor claims include large quality gains and lower token bills. Treat as T4 until independently measured. |
| Sourcegraph MCP / Cody context | Enterprise code search, precise code navigation, SCIP-backed code intelligence, multi-repo context | You need cross-repo impact analysis, exact definitions/references across large organizations, or existing Sourcegraph infrastructure. | You only need a single local repo; setup and service dependency are heavier than codedb/Serena/fff. | Strong for reducing wrong-file exploration at enterprise scale; token % not publicly established for this exact use case. |
| Qodo Context Engine MCP | Commercial/enterprise deep-research codebase context engine exposed over MCP | You need multi-repo organization-aware answers, architectural impact analysis, and repository/docs knowledge through a managed context engine. | Heavier than local retrieval tools; agentic reasoning can add its own token cost, and token-saving claims are not the main public evidence. | More about correctness and broad impact analysis than raw token reduction; evaluate alongside Sourcegraph/Augment, not against fff. |
| Claude Context (zilliztech/claude-context) | Open-source semantic code search MCP / RAG layer | You want vector+keyword semantic retrieval with a published benchmark claiming token reduction, and structural symbol graph is less important. | You need exact callers/dependencies or edits by symbol; embedding retrieval can return plausible but wrong context. | Public benchmark claims 39.4% token reduction on a code-search task set; needs local reproduction. |
| CodeGraphContext (CGC) | Open-source code knowledge graph / MCP | You want a graph-first repo memory layer and are willing to evaluate a newer system against codedb. | Maturity, setup cost, and token evidence are less clear; overlap with codedb is large. | Interesting codedb competitor, but not enough public evidence to call it better. |
| rust-analyzer / language-server MCPs | LSP-backed definitions, references, hover/type tools | You need exact language semantics. For Rust specifically, rust-analyzer is the quality floor, not optional. | You need broad repo retrieval, natural-language search, or cross-language summaries. | Negative-cost when a definition/reference call replaces grep + multiple reads; schema/tooling overhead must be bounded. |
| ast-grep / Semgrep MCP | Structural pattern search and rewrite | You need syntax-pattern matches or safe mechanical refactors (unwrap, derive blocks, call shapes). | You need semantic references/call graph or fuzzy natural-language retrieval. | Strong for targeted structural tasks; not a general replacement. Full verdict + skill/MCP/CLI form-factor analysis: ast-grep verdict below. |
| aider repo-map / repomix / RepoPrompt-style packers | Context packaging and repo maps | You need a compact up-front orientation artifact for a small/medium repo. | You are optimizing token spend aggressively; packers can front-load context instead of avoiding reads. | Useful baseline/control, not "significantly better" for token saving than bounded retrieval. |
Ranking by use case
| Use case | Best current candidate | Why |
|---|---|---|
| Local open-source semantic navigation for agents | Serena, then codedb | Serena leans on language-server semantics; codedb adds a broader indexed tool suite and remote-repo functions. Both beat fff for symbol/caller work. |
| Local file/path/content search speed | fff | Narrow but very good fit: warm resident search with frecency/git metadata. |
| Local task-shaped code context with many MCP primitives | codedb | Best bundled open-source package among the original set: outline/symbol/callers/deps/read/context. |
| Local open-source token-savings benchmark | Code Context Engine, then Claude Context | CCE publishes the largest number but against a generous full-file baseline; Claude Context publishes a smaller semantic-search reduction. Both need local reproduction. |
| Commercial best-effort "give the agent the right context" | Augment Context Engine | Strongest vendor-side context-quality and cost claims, but not local/open enough to treat as jackin' default. |
| Enterprise multi-repo impact analysis | Sourcegraph MCP or Qodo Context Engine MCP | Sourcegraph is the precise code-search/code-intelligence infrastructure answer; Qodo is the agentic deep-research/org-knowledge answer. |
| Semantic RAG with an explicit token-saving benchmark | Claude Context | Clearest open-source numeric token-reduction claim found in the sweep, though it is not structural code intelligence. |
| Rust-specific correctness | rust-analyzer + ast-grep, optionally with codedb/Serena | For Rust, LSP definitions/references and structural patterns are more trustworthy than fuzzy search. |
Practical answer: codedb and fff are not "the best two" as a pair. A stronger local jackin' stack would be:
rust-analyzer / ast-grep for exact semantic + structural facts
+ codedb or Serena for task-shaped code intelligence
+ fff for fast file/content search
+ native file-read/edit tools for final spans and patches

If commercial/cloud tooling is acceptable, Augment, Sourcegraph, and Qodo are the serious challengers to test. If the requirement is open-source token reduction with an explicit benchmark, add Code Context Engine and Claude Context to the A/B.
Token economics
The relevant equation is:
net_saved =
avoided failed searches
+ avoided wrong-file reads
+ avoided whole-file reads
+ avoided follow-up calls
- MCP schema/tool-description rent
- oversized indexed-tool outputs
- index/setup prompts and stale-index recovery

These tools are valuable when the first four terms dominate. They lose when the
agent simply adds their output on top of normal rg, cat, and file reads.
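
The equation can be turned into a small bookkeeping helper. The sketch below is illustrative only: the `TokenLedger` class, its field names, and the sample numbers are assumptions made for this example, not measurements and not part of any tool's API.

```python
from dataclasses import dataclass


@dataclass
class TokenLedger:
    """Per-task token estimates. All fields are hypothetical inputs."""
    avoided_failed_searches: int
    avoided_wrong_file_reads: int
    avoided_whole_file_reads: int
    avoided_followup_calls: int
    schema_rent: int          # MCP schema/tool-description tokens
    oversized_outputs: int    # extra tokens from unbounded indexed-tool output
    index_overhead: int       # index/setup prompts and stale-index recovery


def net_saved(ledger: TokenLedger) -> int:
    """Sum the four savings terms, then subtract the three cost terms."""
    gains = (
        ledger.avoided_failed_searches
        + ledger.avoided_wrong_file_reads
        + ledger.avoided_whole_file_reads
        + ledger.avoided_followup_calls
    )
    costs = ledger.schema_rent + ledger.oversized_outputs + ledger.index_overhead
    return gains - costs


# Made-up numbers, only to show the sign of the result.
example = TokenLedger(
    avoided_failed_searches=100,
    avoided_wrong_file_reads=200,
    avoided_whole_file_reads=300,
    avoided_followup_calls=50,
    schema_rent=400,
    oversized_outputs=100,
    index_overhead=50,
)
print(net_saved(example))
```

A negative result means schema rent and oversized outputs outweighed the avoided searches and reads for that task.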
codedb verdict
Likely token-positive when used correctly; not yet proven locally.
codedb's upstream claims are stronger than fff's on token economics. The README
describes a context engine for agents and lists MCP tools for tree, outline,
symbol, search, word lookup, callers, dependency graph, compact reads, changes,
status, snapshots, local projects, remote public repos, and a codedb_context
composer. It also publishes token-efficiency examples where structured results are
orders of magnitude smaller than raw grep output, plus a benchmark page claiming a
full edit workflow drops from roughly 50 KB to roughly 12 KB.
The practical mechanism is sound:
- codedb_word or codedb_symbol can replace broad grep for exact identifiers.
- codedb_callers can replace "grep symbol, read candidates, infer scope".
- codedb_deps can replace ad hoc import greps.
- codedb_outline can orient on a file before reading the whole file.
- codedb_read with line ranges or compact mode can avoid full-file dumps.
Measured locally — the honest size of the lever. tools/count_tokens.py on three real repo
files (18,613 tokens if read whole):
| File | Lines | Read whole (tok) | Outline (tok) | One symbol-search (tok) | Outline cut | Search cut |
|---|---|---|---|---|---|---|
| mount_info.rs | 289 | 4,404 | 323 | 82 | 93% | 98% |
| dialog_widgets.rs | 415 | 5,598 | 189 | 89 | 97% | 98% |
| update.rs | 647 | 8,611 | 1,141 | 209 | 87% | 98% |
| Total | 1,351 | 18,613 | 1,653 | 380 | 91% | 98% |
An outline (signatures + line numbers, what codedb_outline returns) costs 91% fewer tokens than
reading the file; a targeted symbol-search result (what codedb_search / ffgrep return) 98%
fewer. T1, locally reproduced — this is why the lever is real, and why the vendor multipliers
(1,628× / 40×) are per-query best-case against a whole-file-dump baseline a disciplined agent already
avoids.
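
The percentages in the table follow directly from the token counts. A short sketch that recomputes them is below; `cut_percent` and `MEASURED` are hypothetical names introduced here, and the counts are copied from the table above.

```python
# (whole-file tokens, outline tokens, symbol-search tokens) per file,
# copied from the table above.
MEASURED = {
    "mount_info.rs": (4404, 323, 82),
    "dialog_widgets.rs": (5598, 189, 89),
    "update.rs": (8611, 1141, 209),
}


def cut_percent(whole: int, reduced: int) -> int:
    """Percentage of tokens avoided relative to reading the whole file."""
    return round(100 * (1 - reduced / whole))


for name, (whole, outline, search) in MEASURED.items():
    print(name, cut_percent(whole, outline), cut_percent(whole, search))

# Column totals, then the aggregate cuts.
total_whole = sum(row[0] for row in MEASURED.values())
total_outline = sum(row[1] for row in MEASURED.values())
total_search = sum(row[2] for row in MEASURED.values())
print("Total", cut_percent(total_whole, total_outline), cut_percent(total_whole, total_search))
```

Both ratios use a whole-file read as the baseline, which is the same caveat that applies to the vendor multipliers.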
- codedb_context can collapse 3-5 serial location calls into one task-shaped response when the query is broad enough.
The caveat is output discipline. codedb_tree, codedb_snapshot, and remote tree
queries can be large. The CodeGraff hooks lab explicitly shows a guard for
unbounded codedb_remote action=tree calls. That is the right instinct: code
intelligence saves tokens only if calls return bounded, task-shaped context.
Local adoption verdict: Add codedb to the validation harness as an A/B arm
against native rg/Read and against fff. Do not count vendor token multipliers as
banked savings until measured on jackin' tasks.
fff verdict
High-confidence latency win; token savings are plausible but unquantified.
fff is narrower than codedb. It is a resident Rust search core exposed through MCP,
SDKs, and Neovim. It keeps a file/content index warm, adds git/frecency metadata,
supports typo/fuzzy fallback, and exposes agent-facing tools such as ffgrep and
fffind. The README's strongest concrete data is latency and memory: on very large
repos, repeated warm queries are positioned as sub-10 ms instead of repeated
multi-second ripgrep process spawns, with about 26 MB resident memory on a 14k-file
repo and roughly 360 bytes per indexed file for the content index.
Token savings can happen in three ways:
- fewer zero-result grep calls because fuzzy fallback finds likely variants;
- fewer wrong-file reads because frecency/git status rank active files higher;
- smaller result sets because weak-match detection prevents fuzzy noise from flooding the context.
But the upstream text does not provide a numeric token percentage in normal text. The prior dossier therefore correctly kept fff at T1 for latency and T4 for tokens. That verdict still stands after the re-check.
Local adoption verdict: Keep the existing the-architect fff pilot, but require equal target-file hit rate and measured tool-result-token reduction before treating fff as a token-saving lever.
CodeGraff verdict
Potentially token-positive as a workflow replacement; too broad to treat as a single retrieval optimization.
CodeGraff's product page says Graff is a terminal coding agent powered by codedb. Its Pro surface adds local file tools that return enclosing functions, structural outlines, symbol reads, symbol-safe patches, and batch operations through a persistent daemon. Those ideas attack a real waste pattern in coding agents: reading whole files, grepping without scope, editing by fragile line numbers, then re-reading entire files to verify.
The promising primitives are:
- "scope mode": return the enclosing function/block, not a naked match line;
- structural read: outline first, then symbol/function body instead of whole file;
- patch by symbol: reduce line drift and verification reads;
- batch operations: amortize per-call overhead and keep intermediate plumbing out of the chat transcript;
- codedb as a retrieval layer before action.
The claims are vendor-side and confounded by the full agent loop. The product site mentions aggregate tokens saved, per-op token counts, and last-30-day counters, but those are not independently auditable from the page. The codedb article's 9x byte workflow example is more concrete but still a selected vendor example.
Local adoption verdict: Do not fold CodeGraff into jackin' core. If useful, model it as an explicit role/toolchain experiment. Extract the technique pattern first: function-scope read, bounded search, diff-returning edits, and batch tools.
ast-grep verdict (structural search as an agent skill)
ast-grep appears in the recommended local stack above and in the row table, but it
was never given a full verdict. This section closes that gap (added 2026-06-18,
prompted by the operator's request to research the official
ast-grep/agent-skill package).
What it is. ast-grep (ast-grep/ast-grep, 14,617★, MIT, created 2022 — a mature,
monthly-released project; gh api 2026-06-18) is a Rust CLI for structural code
search, lint, and rewrite: it matches tree-sitter AST patterns ($X.unwrap(),
async fn without error handling, a specific call shape) and can rewrite them as
codemods across thousands of files, deterministically, never matching inside
comments or string literals. There are three ways an agent reaches it:
| Form | What it is | Standing token cost | Trade-off |
|---|---|---|---|
| CLI (ast-grep on PATH) | The binary, called from a shell | Zero schema rent; cross-runtime | The agent must know the flags / be told to use it |
| Agent skill (ast-grep/agent-skill, 750★, Claude-only, created 2025-11; last pushed 2026-01, ~5 months stale) | A Claude Code Skill teaching a verify-before-search loop (write rule → test on examples → search) | Near-zero: only the skill's frontmatter is advertised; full SKILL.md loads on trigger | Claude "cannot automatically detect when to use ast-grep" (the repo's own admission), so the user must name it; simple prompting "requires a model with up-to-date knowledge of ast-grep" |
| MCP (ast-grep/ast-grep-mcp, 423★, explicitly experimental) | 4 tools: dump_syntax_tree, test_match_code_rule, find_code, find_code_by_rule | Schema rent every turn, used or not, plus uv/MCP setup | Deterministic structured rule-refinement loop; no reliance on the model's trained ast-grep knowledge |
Token economics — real mechanism, evidence by category not by ast-grep. The
saving is the same lever as codedb's symbol search: a structural match returns only
the nodes that match, with file:line, so the agent skips the grep → read-whole-file
→ confirm-context loop that burns input tokens. The mechanism is sound and
directionally well-supported, but there is no ast-grep-specific A/B token
benchmark anywhere. The headline numbers that circulate — ~32× (AddObservation
31,530 → 972 tokens), "3–50× fewer tokens per response", an "average 6×" output
compression — all come from independent analysts measuring graph/symbol tools
(Gortex, LSP-backed) or synthesizing the CoREB/SWEzze papers, not from ast-grep
itself. Tellingly, ast-grep's own materials (ast-grep.github.io, the AI-rules blog)
make zero token claims — they argue reliability of AI-generated rules via
test-before-search, not economics. Evidence tier: T4 for any ast-grep-specific
token percentage (none published); T2/T3 for the category mechanism (structural
retrieval beats grep+read — the same lever this file reproduces locally at −98% for
symbol search). The honest read: ast-grep saves tokens for the same structural reason
codedb does, and the magnitude is plausibly in the same band, but no one has measured
ast-grep specifically — treat the 32×/6× figures as cousins' numbers, not its own.
Niche — and the hard limit. ast-grep owns the structural modality in the
lexical → structural → graph escalation: it is the deterministic middle that needs no
language server, ideal for syntax-shape queries and especially codemods/rewrites.
It cannot resolve names across files — it cannot tell which parseConfig a call
refers to. Cross-file symbol resolution is LSP / rust-analyzer / Serena territory
(the graph modality); fuzzy/conceptual queries are embedding/semantic territory. So
ast-grep is a complement to, not a replacement for, codedb's caller/dependency graph
and rust-analyzer's compiler truth — exactly the routing the recommended stack and
the architect roadmap already encode.
jackin' verdict. For a multi-runtime role, the CLI form on PATH plus a shared
guidance file is the right default — zero schema rent, every runtime reaches it,
no Claude-only lock-in — which is precisely what the the-architect pilot chose (the
Claude-only skill is deliberately not in that pilot, and the experimental MCP's
per-turn rent is not justified when the CLI is free). The skill is worth knowing as
the lowest-effort Claude-only on-ramp, but its explicit-invocation gating and
five-month staleness make it a weak default. Decision record and routing:
docs/content/docs/reference/roadmap/architect-code-intelligence-tooling.mdx.
Recommended setup for AI agents
General rules for all three
- Teach the agent the retrieval contract. The instruction should be explicit: use indexed tools to locate paths/symbols first, then read the smallest exact span needed. Do not dump full trees, full snapshots, or whole files unless the task requires them.
- Bound every broad query. Cap result counts, prefer prefixes, and ask for path:line plus symbol names over raw line dumps.
- Use the right tool class. Plain text and literal strings can stay with rg. File discovery can use fff. Symbol/caller/dependency questions should use codedb, rust-analyzer, or ast-grep where available.
- Keep MCP schema overhead under control. If the client supports tool search or schema deferral, use it. If not, consider CLI/HTTP wrappers for rarely-used tools and expose only the hot retrieval calls over MCP.
- Smoke-test freshness at session start. Run a status/index-ready check before relying on results, especially after large code generation or checkout changes.
- Make host effects explicit. The upstream installers may write ~/.codedb, ~/.claude.json, ~/.codex/config.toml, or other client config. In jackin', install and register inside the role container unless the operator explicitly opts into host changes.
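
The "use the right tool class" rule could be encoded as a routing table in a role's guidance or wrapper script. This is a minimal sketch under stated assumptions: the question-kind labels, the `ROUTES` layout, the `tools_for` helper, and the fallback to rg are all hypothetical; only the tool names and their pairing with question types come from the rule above.

```python
# Question kinds mapped to tool classes, following the rule above.
# The labels and structure are illustrative assumptions.
ROUTES = {
    "literal_text": ["rg"],
    "file_discovery": ["fff"],
    "symbol": ["codedb", "rust-analyzer", "ast-grep"],
    "caller": ["codedb", "rust-analyzer", "ast-grep"],
    "dependency": ["codedb", "rust-analyzer", "ast-grep"],
}


def tools_for(kind: str, available: set[str]) -> list[str]:
    """Return preferred tools for a question kind, limited to installed ones."""
    preferred = ROUTES.get(kind, ["rg"])
    usable = [tool for tool in preferred if tool in available]
    # Assumed fallback: plain ripgrep when no preferred tool is installed.
    return usable or ["rg"]


print(tools_for("caller", {"rg", "fff", "codedb"}))
print(tools_for("symbol", {"rg"}))
```

Keeping the table in one place makes it easier to check that every arm of an A/B run uses the same routing.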
codedb setup
Recommended agent instruction:
Use codedb for code navigation before broad text search:
- Start unfamiliar tasks with `codedb_context` when the question spans multiple files.
- Use `codedb_symbol` or `codedb_word` for exact identifiers.
- Use `codedb_callers` before refactoring or changing public behavior.
- Use `codedb_deps` for import/dependency impact.
- Use `codedb_outline` before reading a large file.
- Prefer bounded `codedb_read` ranges or compact reads; avoid full snapshots and
unbounded trees unless explicitly needed.
- Prefer the client's native edit tool; `codedb_edit` is fallback only.

Setup shape:
- Install inside the agent environment, not the host, for jackin' roles.
- Disable telemetry if the environment requires it: CODEDB_NO_TELEMETRY=1.
- Register MCP at user scope inside the container, for example: claude mcp add codedb -s user -- /usr/local/bin/codedb mcp or codex mcp add codedb -- /usr/local/bin/codedb mcp.
- Verify with codedb --version and codedb status or codedb_status.
- Ensure root resolution points at the mounted workspace, not ~ or a system directory. Pass the project argument when a client does not supply MCP roots.
- Add a guard hook for remote tree calls: require expand=false, a prefix, or a limit before allowing large codedb_remote action=tree responses.
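
The guard rule for remote tree calls can be sketched as a predicate that a pre-tool-use hook might call. This is not the CodeGraff hooks-lab implementation: the function name and the shape of the `args` dict are assumptions, and only the tool name, `action=tree`, and the `expand`, `prefix`, and `limit` parameters come from the text above.

```python
def is_bounded_tree_call(tool_name: str, args: dict) -> bool:
    """Return True if the call may proceed, False if it should be blocked.

    Only codedb_remote calls with action=tree are checked; anything else
    passes through. A tree call is allowed when at least one bound is set.
    """
    if tool_name != "codedb_remote" or args.get("action") != "tree":
        return True

    # Accept either a boolean False or the string "false" for expand.
    expand = args.get("expand")
    expand_off = expand is False or str(expand).lower() == "false"

    has_prefix = bool(args.get("prefix"))

    limit = args.get("limit")
    has_limit = isinstance(limit, int) and not isinstance(limit, bool) and limit > 0

    return expand_off or has_prefix or has_limit


print(is_bounded_tree_call("codedb_remote", {"action": "tree"}))
print(is_bounded_tree_call("codedb_remote", {"action": "tree", "prefix": "src/"}))
```

How the hook receives the tool name and arguments depends on the client, so the wiring around this predicate is left out.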
fff setup
Recommended agent instruction:
For file-name search and grep in the current git-indexed project, use fff before
falling back to repeated shell `rg` calls. Use `fffind` for paths and `ffgrep` for
content. Keep result sets small, refine weak/fuzzy matches, then read exact file
spans with the normal file-read tool.

Setup shape:
- Install fff-mcp inside the role image or setup hook.
- Register at user scope in the container: claude mcp add -s user fff -- fff-mcp.
- Keep the existing the-architect pilot requirement: fff must measurably beat native ripgrep on this repo or be dropped.
- Do not use fff as a semantic engine. It finds likely files and lines; it does not replace rust-analyzer, ast-grep, or codedb's caller/dependency graph.
CodeGraff setup
Recommended agent instruction, if evaluating the full toolchain:
Use CodeGraff/Graff local file tools to avoid whole-file reads:
- Search in scope mode when changing a function or method.
- Outline before reading a large file.
- Read by symbol/function when possible.
- Patch by symbol when line drift is likely.
- Batch related reads/searches/diffs when the daemon supports it.

Setup shape:
- Treat CodeGraff as an explicit agent/toolchain role, not as a transparent dependency of jackin' core.
- Install only in the container or on an operator-approved host path.
- Separate the free/open graff/codedb path from paid Pro tooling and the CodeGraff model gateway. Measure each independently.
- If using the gateway, keep model-routing/cache effects separate from local file-tool savings; otherwise the token analysis becomes impossible to attribute.
jackin' adoption recommendation
The existing roadmap page
docs/content/docs/reference/roadmap/architect-code-intelligence-tooling.mdx
already has the right shape for fff: role-scoped, opt-in, no jackin-core change,
registered in the container's user scope, and accepted only if it beats ripgrep
net of MCP overhead.
Extend that experiment rather than generalizing immediately:
- Keep the planned fff A/B.
- Add a codedb A/B arm if the role can carry another MCP server without always-on schema cost.
- Do not add CodeGraff Pro by default. If evaluated, make it a separate "agent stack replacement" arm.
- Use the same task suite and metrics for all arms.
- Promote only primitives that show equal-or-better target-file hit rate and lower total tokens per solved task.
Validation harness
Run 20-30 fixed repository-navigation tasks in at least four arms, expanding to the alternatives when they are installable in the test environment:
| Arm | Tools allowed |
|---|---|
| Native | Shell rg, find, normal file reads/edits |
| fff | fff MCP plus normal file reads/edits |
| codedb | codedb MCP plus normal file reads/edits |
| Serena | Serena MCP plus normal file reads/edits |
| Code Context Engine | CCE MCP plus normal file reads/edits |
| Claude Context | Claude Context MCP plus normal file reads/edits |
| Sourcegraph | Sourcegraph MCP or Sourcegraph local/enterprise context tools |
| Qodo | Qodo Context Engine MCP, if explicitly approved |
| Augment | Augment Context Engine / Augment agent context, if explicitly approved |
| CodeGraff | Graff/Pro local file tools, if explicitly installed |
Task categories:
- exact symbol definition lookup;
- fuzzy remembered phrase or path;
- caller/reference discovery;
- reverse dependency/impact analysis;
- large-file function edit;
- wrong-query/zero-result recovery;
- recently modified file discovery;
- public dependency or remote repo lookup, if testing codedb remote.
Metrics:
- target-file hit rate;
- top-1 target hit rate;
- turns to first correct file;
- tool calls;
- tool-result tokens;
- total input/output/cache tokens from the session ledger;
- wall-clock latency;
- edit correctness and test result;
- stale-index or wrong-root incidents;
- MCP schema tokens loaded at turn start, if the client exposes them.
Acceptance rule:
Accept a tool for token optimization only if:
target-file hit rate >= native
edit/test success >= native
total tokens per solved task <= native by at least 20-30%
no unbounded output path remains

Latency alone is not enough. A tool can be much faster and still neutral or negative on tokens if the agent reads the same files afterward.
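
The acceptance rule can be written as a single check over per-arm results. The sketch below assumes a hypothetical `ArmResult` record and a configurable `min_reduction`; the default of 0.20 is the lower end of the 20-30% range stated in the rule, chosen here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArmResult:
    """Aggregated results for one harness arm (hypothetical record)."""
    target_file_hit_rate: float
    edit_test_success_rate: float
    tokens_per_solved_task: float
    has_unbounded_output_path: bool


def accept_for_token_optimization(
    arm: ArmResult, native: ArmResult, min_reduction: float = 0.20
) -> bool:
    """Apply the four acceptance conditions against the native arm."""
    # Compare against a scaled budget to avoid a float subtraction at the boundary.
    token_budget = native.tokens_per_solved_task * (1 - min_reduction)
    return (
        arm.target_file_hit_rate >= native.target_file_hit_rate
        and arm.edit_test_success_rate >= native.edit_test_success_rate
        and arm.tokens_per_solved_task <= token_budget
        and not arm.has_unbounded_output_path
    )


# Made-up numbers for illustration only.
native = ArmResult(0.80, 0.70, 10_000, False)
faster_but_same_tokens = ArmResult(0.85, 0.70, 9_500, False)
leaner = ArmResult(0.85, 0.75, 7_000, False)
print(accept_for_token_optimization(faster_but_same_tokens, native))
print(accept_for_token_optimization(leaner, native))
```

The first arm fails despite a better hit rate because its token reduction is only 5%, which matches the point that speed or precision alone does not qualify a tool.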
Failure modes and guardrails
- Wrong project root: status looks ready, but it indexed ~ or another repo. Guard: status check plus explicit project root.
- Unbounded tree/snapshot: the index dumps more than native tools would. Guard: hooks or instructions requiring limit, prefix, compact mode, or line ranges.
- Schema rent: 20+ MCP tools are loaded into every turn without deferral. Guard: tool-search/schema-deferral, CLI fallback, or narrower MCP exposure.
- Fuzzy confidence error: search returns plausible but wrong files. Guard: target-file benchmark, weak-match refinement, require exact verification before editing.
- Stale index after edits: agent trusts old symbol/caller data. Guard: status/changes checks and re-run query after large rewrites.
- Host mutation: installers auto-register client configs. Guard: container-only install or explicit operator opt-in surfaced in launch summary.
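
The wrong-project-root guard can be approximated with a path check at session start. This sketch assumes the caller already knows which root the index reports; how codedb or fff exposes that value is not specified here, and `root_is_inside_workspace` is a hypothetical helper name.

```python
from pathlib import Path


def root_is_inside_workspace(indexed_root: str, workspace: str) -> bool:
    """True if the indexed root is the workspace itself or a directory under it.

    A root such as the home directory or "/" sits above the workspace and
    is rejected, which catches the "indexed ~ or another repo" failure mode.
    """
    root = Path(indexed_root).expanduser().resolve()
    ws = Path(workspace).expanduser().resolve()
    return root == ws or ws in root.parents


# Example paths are illustrative; the real mount point depends on the role.
print(root_is_inside_workspace("/workspace/repo", "/workspace/repo"))
print(root_is_inside_workspace("/", "/workspace/repo"))
```

A failed check is a reason to pass an explicit project root and rebuild the index before trusting any results.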
Bottom line
For AI-agent work, these tools are best understood as retrieval precision and observation-shaping tools, not compression tools. They save tokens when they help the agent look at fewer, better spans of code.
- codedb: strongest candidate from the original set; pilot it against native search with a strict bounded-output policy.
- fff: keep as the low-risk resident search pilot; expect latency wins first, token wins only if wrong-file/dead-end calls drop.
- CodeGraff: valuable ideas, larger adoption blast radius; evaluate as an explicit role or workflow replacement, not as a hidden dependency.
- Serena: best local open-source semantic-navigation alternative found in the re-sweep; add it to the A/B if language-server setup is acceptable.
- Code Context Engine: strongest local open-source token-savings claim found; benchmark against native Claude/Codex behavior before trusting the 94% headline.
- Augment / Sourcegraph / Qodo: stronger commercial or enterprise answers for broad codebase context, but too heavy/vendor-bound for a default jackin' role.
- Claude Context: best open-source alternative found with a public numeric token-reduction claim; evaluate for semantic-RAG tasks, not exact refactors.
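For the pilots recommended above, a target-file hit rate is one simple way to check retrieval precision before trusting a headline number. The sketch below assumes a hypothetical hand-labelled set of queries with one expected file each; `target_hit_rate`, the sample queries, and the sample paths are illustrative, not taken from any benchmark.

```python
def target_hit_rate(results: dict[str, list[str]], expected: dict[str, str], k: int = 3) -> float:
    """Fraction of queries whose expected file appears in the top-k returned paths."""
    if not expected:
        return 0.0
    hits = sum(
        1 for query, target in expected.items()
        if target in results.get(query, [])[:k]
    )
    return hits / len(expected)


# Hypothetical labelled queries and tool output, for illustration only.
expected = {
    "where is the retry policy": "src/net/retry.py",
    "config loader entry point": "src/config/load.py",
}
tool_results = {
    "where is the retry policy": ["src/net/retry.py", "src/net/client.py"],
    "config loader entry point": ["docs/config.md", "tests/test_config.py"],
}

print(target_hit_rate(tool_results, expected))
```

Running the same labelled set through native search and through each candidate tool gives a like-for-like precision number to read next to the token counts.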
Vector databases (Qdrant) — a separate question, answered in file 52
fff (lexical) + codedb (structural) cover the two retrieval paradigms that move code tokens. A vector database (semantic) is a third paradigm and a different job: it is not a token-saving layer for code navigation — neutral-to-loss once chunk over-fetch, index staleness, and the embedding-model cost are counted, and the flagship agents (Claude Code, Cline, Aider, Sourcegraph) dropped code embeddings for exactly those reasons. It earns its place only as a separate semantic layer for non-code knowledge (docs/tickets/decisions) and cross-session memory — and even there it is a wall-clock / tool-loop speedup more than a guaranteed token cut.
The full treatment — Qdrant vs alternatives (Milvus, LanceDB, pgvector, Turbopuffer, …); the benchmark
evidence (no engine decisively beats Qdrant, "best" is workload-dependent); the docs-corpus case; token
economics; a jackin' container setup; and a runnable validation protocol — is in
52-qdrant-and-vector-databases.md.
Source ledger
unless noted.
- `justrach/codedb` README: https://github.com/justrach/codedb
- codedb MCP setup: https://github.com/justrach/codedb/blob/main/docs/mcp.md
- codedb skill/context guidance: https://github.com/justrach/codedb/blob/main/docs/skills.md
- codedb benchmarks: https://github.com/justrach/codedb/blob/main/docs/benchmarks.md
- codedb hooks lab: https://github.com/justrach/codedb/blob/main/docs/hooks-labs.md
- `dmtrKovalenko/fff` README: https://github.com/dmtrKovalenko/fff
- CodeGraff codedb article: https://codegraff.com/blog/codedb-code-intelligence
- CodeGraff product site: https://codegraff.com/
- CodeGraff docs overview: https://codegraff.com/docs
- `justrach/codegraff` README: https://github.com/justrach/codegraff
- CodeGraff changelog: https://codegraff.com/changelog
- Serena: https://github.com/oraios/serena
- Code Context Engine: https://github.com/elara-labs/code-context-engine
- Code Context Engine docs: https://elara-labs.github.io/code-context-engine/
- Augment Code: https://www.augmentcode.com/
- Augment Context Engine / MCP: https://docs.augmentcode.com/
- Augment Context Engine MCP benchmark page: https://www.augmentcode.com/product/context-engine-mcp
- Sourcegraph MCP server: https://sourcegraph.com/docs/api/mcp
- Sourcegraph MCP product page: https://sourcegraph.com/mcp
- Sourcegraph code intelligence: https://sourcegraph.com/docs/code-navigation
- Qodo Context Engine MCP: https://docs.qodo.ai/developer-tools/context-engine-mcp
- Claude Context: https://github.com/zilliztech/claude-context
- Claude Context benchmark article: https://milvus.io/blog/claude-context-vs-claude-code-complete-code-context-solution-for-ai-coding-assistants.md
- CodeGraphContext: https://github.com/CodeGraphContext/CodeGraphContext
- aider repo map: https://aider.chat/docs/repomap.html
- ast-grep core: https://ast-grep.github.io/ · AI-prompting docs (four integration modes, no token claims) https://ast-grep.github.io/advanced/prompting.html
- ast-grep agent skill (Claude-only, reliability-focused `SKILL.md`, 750★, stale since 2026-01): https://github.com/ast-grep/agent-skill
- ast-grep MCP (experimental, 4 tools, 423★): https://github.com/ast-grep/ast-grep-mcp
- Structural-vs-graph token analysis (independent; the 32× / 3–50× figures are graph-tool numbers, not ast-grep's): https://zzet.org/gortex/grep-replacement-for-ai-agents/ · "which tool when" (lexical/structural/graph/semantic modalities) https://ceaksan.com/en/code-search-for-ai-agents-which-tool-when
- ast-grep CLI/skill decision record for jackin' roles: `docs/content/docs/reference/roadmap/architect-code-intelligence-tooling.mdx`
- Existing jackin' fff pilot roadmap: `docs/content/docs/reference/roadmap/architect-code-intelligence-tooling.mdx`
Vector/semantic layer:
- Qdrant: https://github.com/qdrant/qdrant · https://qdrant.tech/documentation/ · MCP server https://github.com/qdrant/mcp-server-qdrant
- Claude Code chose agentic search over a vector DB (Boris Cherny): https://x.com/bcherny/status/2017824286489383315
- Cline, why it does not index: https://cline.bot/blog/why-cline-doesnt-index-your-codebase-and-why-thats-a-good-thing
- Sourcegraph Cody dropped embeddings: https://sourcegraph.com/blog/how-cody-understands-your-codebase
- Cursor keeps semantic search (accuracy, complement to grep): https://cursor.com/blog/semsearch
- Structural graph ~10× fewer tokens vs file exploration: https://arxiv.org/abs/2603.27277
- Keyword search reaches ~90% of RAG without a vector DB (Amazon/AAAI): https://www.amazon.science/publications/keyword-search-is-all-you-need-achieving-rag-level-performance-without-vector-databases-using-agentic-tool-use
- Semble code-search token benchmark (snippet vs grep+read): https://blakecrosley.com/blog/agent-code-search-token-budget
- Vendor token claim (treat as marketing): https://milvus.io/blog/claude-context-reduce-claude-code-token-usage.md
- RAG over-fetch (3–5×): https://thetokencompany.com/blog/why-rag-token-costs-are-high
- Letta memory benchmark (files-only beat vector memory): https://www.letta.com/blog/benchmarking-ai-agent-memory/
- LanceDB (embedded alternative for local agents): https://github.com/lancedb/lancedb