# Code-intelligence tooling for the-architect role (https://jackin.tailrocks.com/reference/roadmap/architect-code-intelligence-tooling/)


**Status**: Open — pilot ready to implement (the-architect role only, opt-in, no jackin-core code change). Pilot stack decided: `ast-grep` + `rust-analyzer` + cargo verifiers + `fff` + `codedb`. `fff` and `codedb` ship together as complementary, first-class tools — lexical search vs structural intelligence, settled on architectural grounds, not an A/B (see [Design](#design)); `ast-index` was evaluated and deferred. See [Tool evaluation and comparison](#tool-evaluation-and-comparison) for the reasoning and the verified findings behind these choices.

## Objective [#objective]

Equip the the-architect role with a Rust code-intelligence stack and make the agent use it: structural search and refactor (`ast-grep`), semantic navigation (`rust-analyzer`), the cargo verifiers (`nextest`, `clippy`, `rustfmt`), and a resident file-search server (`fff`). A code-intelligence server, `codedb`, ships alongside `fff` as the structural-intelligence layer (symbols, callers, deps, task context); see [Design](#design) for why the two are routed by capability rather than run as an A/B. The guidance that tells the agent which tool to reach for, and critically the routing rule that stops two tools from both claiming "all search", is authored once and applied to every runtime the role supports (`claude`, `codex`, `amp`, `kimi`, `opencode`) with no duplicated content. Deliver everything inside the external [`jackin-project/jackin-the-architect`](https://github.com/jackin-project/jackin-the-architect) repository, opt-in and role-scoped, with no change to jackin core.

This item also records the **evaluation** of the candidate tools (`ast-grep`, `codedb`, `fff`, `ast-index`) that produced this stack, so the choice is auditable and the deferred candidates have a documented home. The broad market sweep and the token-economics A/B protocol live in the research dossier [Code-intelligence tools: codedb, fff, CodeGraff, and alternatives](/research/token-optimization/51-code-intelligence-tools/); this roadmap item is the **decision record** and the implementation plan, not a duplicate of that evidence.

## Scope [#scope]

* **the-architect only.** No other role is touched.
* **Opt-in.** The tools exist only inside a container the operator loads; operators who never `jackin load the-architect` see no change.
* **No jackin-core change.** The pilot rides existing extension points: the role `Dockerfile`, the role `preflight` hook, and default-home seeding of the agent home (<RepoFile path="crates/jackin-image/src/derived_image.rs">crates/jackin-image/src/derived\_image.rs</RepoFile>).
* **Reversible.** Reverting the the-architect commit removes every trace.

## Tool evaluation and comparison [#tool-evaluation-and-comparison]

Four candidate tools were compared for the agent code-search/intelligence layer. They are **complementary at the capability level** — each owns a different question — and the only real conflict is at the *instruction* level, where more than one tool ships a default "use me for all search" prompt. The split below is the decision; the routing policy enforces it.

### Candidates and decision [#candidates-and-decision]

| Tool                                                                | Category                       | Mechanism                                                                      | Decision for the-architect                                       |
| ------------------------------------------------------------------- | ------------------------------ | ------------------------------------------------------------------------------ | ---------------------------------------------------------------- |
| [`ast-grep`](https://ast-grep.github.io/)                           | Structural search / rewrite    | tree-sitter AST patterns, deterministic, CLI                                   | **Pilot — default** for syntax-shape queries and codemods        |
| [`rust-analyzer`](https://rust-analyzer.github.io/)                 | Semantic navigation            | LSP, compiler-grade resolution                                                 | **Pilot — default** for definitions / references / types / impls |
| [`fff`](https://github.com/dmtrKovalenko/fff)                       | Resident file / content search | warm in-memory index, frecency + git-aware ranking, MCP                        | **Pilot — default** for file / path / literal-content search     |
| [`codedb`](https://github.com/justrach/codedb)                      | Code-intelligence server       | Zig, index-once in-memory (symbols, outline, callers, deps, task-context), MCP | **A/B evaluation arm** for relationship / task-context queries   |
| [`ast-index`](https://github.com/defendend/Claude-ast-index-search) | Indexed code search            | Rust, SQLite + FTS5, thin shell-out MCP wrapper                                | **Deferred** — redundant with codedb on a Rust workspace         |

### The four jobs [#the-four-jobs]

* **`ast-grep` = syntax shape.** "Find every `match` arm returning `Err($E)`", "rewrite this deprecated call shape", "every `$X.unwrap()`". Deterministic AST matching, no false positives from comments/strings. Not semantic — it cannot resolve types or follow a call graph.
* **`rust-analyzer` = compiler truth.** Definitions, references, types, trait impls. For Rust this is the quality floor, not optional — fuzzy or structural approximation should never override it for correctness.
* **`codedb` = task-shaped relationships.** "Where is this symbol?", "who calls this?", "what depends on this?", "give me context for this task" via `codedb_context`. Index-once, in-memory, no per-query rescan. 21 MCP tools; note it has **no** `implementations` / `hierarchy` / `call_tree` primitive — its relational tools are `codedb_symbol` (definition), `codedb_callers` (call sites), `codedb_deps` (file-level imports).
* **`fff` = fast locate.** Typo-tolerant fuzzy path search, literal/regex/fuzzy grep, frecency + git-status ranking, resident index that beats per-call `rg`/`fd` process spawns in a long-running agent loop. It finds files and lines; it does not explain relationships.
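
As a concrete instance of the syntax-shape job, the `$X.unwrap()` query from the list can be wrapped in a small shell function. This is a minimal sketch, assuming the `run` subcommand with its `--pattern` and `--lang` options; the function name `find_unwraps` and the `AST_GREP_BIN` override are hypothetical and exist only to make the sketch testable.

```bash
#!/usr/bin/env bash
# Sketch: the "$X.unwrap()" query as a reusable function.
# AST_GREP_BIN is a hypothetical override; it defaults to the `ast-grep`
# binary on PATH (never `sg`, which is the set-group command on Debian).
find_unwraps() {
  local dir="${1:-.}"
  # Single quotes keep $X literal so ast-grep sees it as a metavariable.
  "${AST_GREP_BIN:-ast-grep}" run --pattern '$X.unwrap()' --lang rust "$dir"
}
```

Calling `find_unwraps crates/` would list every structural match under that directory; comments and string literals that merely contain the text `.unwrap()` are not AST matches and are skipped.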

### Routing decision (the real conflict, resolved) [#routing-decision-the-real-conflict-resolved]

The tools do not conflict as binaries. The conflict is that **codedb and fff each ship a default "route all search through me" instruction**, and adopting both verbatim makes the agent churn. The role therefore defines an explicit router instead of accepting either default:

```text
fff           → file / path / literal-content search   (default search tool)
codedb        → symbols, callers, deps, task-context    (code-intelligence, A/B arm)
ast-grep      → structural / AST search + codemods
rust-analyzer → definitions / references / types        (correctness floor)
rg / grep     → plain-text / log / error-string search  (the common case; do not force a structural tool on it)
```

Neither codedb's nor fff's blanket "use me for all search" prompt is adopted as-is. The router is authored once in the shared guidance file (below).
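
The router can also be read as a plain lookup from query kind to tool. The sketch below is illustrative only: the query-kind labels (`path`, `callers`, `codemod`, and so on) and the function name `route_query` are hypothetical, and the fallback to `rg` for an unknown kind is an assumption rather than a rule stated in the table.

```bash
#!/usr/bin/env bash
# Sketch of the routing table as a lookup. Labels and function name are
# hypothetical; each branch mirrors one row of the router.
route_query() {
  case "$1" in
    path|file|literal)            echo "fff" ;;            # file / path / literal-content search
    symbol|callers|deps|context)  echo "codedb" ;;         # structural relationships
    shape|codemod)                echo "ast-grep" ;;       # AST patterns and rewrites
    definition|references|types)  echo "rust-analyzer" ;;  # compiler-grade answers
    log|error-string|plain-text)  echo "rg" ;;             # plain text, no structural tool
    *)                            echo "rg" ;;             # assumption: unknown kinds fall back to plain text
  esac
}
```

The point of the shape is that every query kind resolves to exactly one tool, so no two tools can both claim "all search".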

### Verified findings (fact-check, 2026-06-15) [#verified-findings-fact-check-2026-06-15]

These were confirmed against upstream source — they change *how* the tools are installed, not just whether:

* **codedb's official installer writes a default-on, undocumented Claude hook that blocks shell search.** `install/install.sh` installs `~/.claude/hooks/codedb-block-legacy.sh` as a `PreToolUse` hook on `Bash` that `exit 2`-blocks **ten** commands (`grep`, `rg`, `egrep`, `fgrep`, `cat`, `head`, `tail`, `sed`, `awk`, `find`) and redirects them to codedb MCP tools. It is registered unconditionally (no opt-out flag) and is not mentioned in the README. **Implication:** in the-architect, install codedb by binary and register MCP manually; do **not** run the installer's hook registration. That hook would fight `fff`'s file-search role and jackin's own agent harness, and is exactly the "all search → codedb" default the router rejects.
* **codedb telemetry is on by default.** Written to `~/.codedb/telemetry.ndjson` and synced to a remote endpoint; upstream states no source code, file contents, paths, or queries are collected (only aggregate tool-call counts/latency). Opt-out: `CODEDB_NO_TELEMETRY=1` or `--no-telemetry`. **Implication:** set `CODEDB_NO_TELEMETRY=1` in the role for a security-conscious image.
* **`fff` specifics verified.** \~360 bytes/indexed file (\~26 MB resident on a 14k-file repo); `fff-mcp` accepts a positional base path plus real flags `--frecency-db`, `--log-file`, `--no-update-check` (among others); MCP tools are `ffgrep`, `fffind`, `fff-multi-grep`; latest stable `0.9.4`. Base path defaults to cwd, then discovers the git root.
* **`ast-grep` invocation and skill.** Invoke as `ast-grep`, never `sg` (the Debian set-group command). Latest `0.43.0`. The official [`ast-grep/agent-skill`](https://github.com/ast-grep/agent-skill) is a **Claude-only** plugin (`npx skills add ast-grep/agent-skill` or the plugin marketplace) and does **not** auto-trigger; it must be asked for. **Implication:** for `codex`/`amp`/`opencode`/`kimi`, the cross-runtime path is the `ast-grep` CLI on `PATH` plus the shared guidance file, not the skill.
* **`ast-index` is the wrong shape for this repo.** It is Rust + SQLite/FTS5, but its MCP server is a **thin wrapper that shells out to the `ast-index` CLI per call**; it ships an `"ALWAYS use ast-index FIRST"` rule (which would fight the router and must not be adopted); and its genuinely-differentiated features are **mobile/polyglot** (Kotlin/Swift/Dart, `xml-usages`, `storyboard-usages`, `swiftui`, `main-actor`) — low value on a workspace that is overwhelmingly Rust. Its only Rust-relevant extras over codedb are `implementations`, `hierarchy`, recursive `call_tree`, `changed`-symbol, and `api`/`module` reports.
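
The two codedb findings lend themselves to a small smoke check in the image or in preflight. The following is a sketch under stated assumptions: the function name `check_codedb_hygiene` and its messages are hypothetical, while the hook path and the environment variable are the ones named in the findings.

```bash
#!/usr/bin/env bash
# Sketch of a smoke check for the codedb findings. Function name and
# messages are hypothetical; the hook path and env var come from the findings.
check_codedb_hygiene() {
  local home="${1:-$HOME}" status=0
  # The installer's block-legacy hook must never be present in the agent home.
  if [ -e "$home/.claude/hooks/codedb-block-legacy.sh" ]; then
    echo "codedb block-legacy hook is present; the installer must not run" >&2
    status=1
  fi
  # Telemetry is on by default upstream, so the opt-out must be explicit.
  if [ "${CODEDB_NO_TELEMETRY:-}" != "1" ]; then
    echo "CODEDB_NO_TELEMETRY is not set to 1" >&2
    status=1
  fi
  return "$status"
}
```

Running `check_codedb_hygiene /home/agent` returns nonzero if either condition is violated, so it can gate a build step or a preflight run.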

### ast-index verdict [#ast-index-verdict]

`ast-index` is a real, maintained project, but for the-architect it is **mostly redundant with codedb** on a Rust workspace: codedb already covers symbols, outlines, callers, deps, search, and task context. Adding it as a second always-on indexer (plus its own SQLite index lifecycle and a foreground `watch`) is not justified by default. It is **deferred** to a narrowly-routed optional fallback (see [Follow-ups](#follow-ups-deferred)) — to be revisited only for `implementations` / `hierarchy` / recursive `call_tree` / changed-symbol / API-report queries that codedb cannot answer, and only with benchmarks, and never with its "always first" rule.

### Broader alternative sweep [#broader-alternative-sweep]

The full market scan — `codedb`/`fff` benchmarked against `Serena`, `Code Context Engine`, `Augment`, `Sourcegraph`, `Qodo`, `Claude Context`, `CodeGraff`, and others, plus the token-economics equation, local measurements, the validation harness, and the acceptance rule — lives in the research dossier [Code-intelligence tools: codedb, fff, CodeGraff, and alternatives](/research/token-optimization/51-code-intelligence-tools/). That dossier is the evidence and the A/B protocol; this item is the decision and the implementation. The dossier's conclusion (keep the fff pilot; add a codedb A/B arm with a strict bounded-output policy; do not adopt CodeGraff Pro by default) is the basis for the stack above.

## Design [#design]

Design decisions accumulate here as they settle (via `/jackin-dev:brainstorm`) — each is a decision plus a one-line rationale, so the item stays resumable.

### codedb and fff are complementary — adopt both, routed by capability (no A/B bake-off) [#codedb-and-fff-are-complementary--adopt-both-routed-by-capability-no-ab-bake-off]

**Decision.** Ship `fff` and `codedb` together as first-class, permanent tools, not competing A/B arms. The choice rests on an architectural difference, not a measured contest: an A/B picks a winner when two tools do the *same* job; `fff` and `codedb` do *different* jobs, so "which is better" is the wrong question — neither replaces the other.

**Why (architecture, verified 2026-06-15).** `fff` is a purely **lexical** engine: a resident path + content index ranked by frecency (LMDB-backed access history), git-status boost, query combo-learning, and SIMD typo-tolerant fuzzy matching. It has **no** symbol / AST / caller / dependency / outline model — its "definition classifier" is byte-level line tagging (decorated grep), not symbol resolution. `codedb` is a **structural** engine: tree-sitter outlines, an inverted word index, a trigram index, and a file-level dependency graph behind `codedb_symbol` / `codedb_callers` / `codedb_deps` / `codedb_outline` / `codedb_context`. Its content search is exact-word + trigram + regex and its `codedb_find` is fuzzy on *paths* — but it has **no frecency and no git-status ranking**. The two overlap only on the trivial "grep an identifier / find a file by name" case; each owns a layer the other cannot architecturally provide. A dedicated third-party head-to-head reached the same conclusion ("complementary, not competing… both can run as separate MCP servers simultaneously"). Full matrix in the research dossier.

**Routing consequence (sharper than the evaluation section).** `codedb`'s generic search is the *weaker* half of the only overlap (no recency / git ranking), so `codedb` is scoped to **structural** queries and is **not** a general search tool; `fff` owns file/content discovery outright. This supersedes the earlier "A/B evaluation arm" framing for `codedb`.

* `fff` → file/path discovery + literal/identifier content grep (wins the overlap via frecency + git-dirty ranking).
* `codedb` → `codedb_symbol` / `codedb_callers` / `codedb_deps` / `codedb_outline` / `codedb_context` (the structural layer `fff` lacks).
* Do **not** use `codedb_search` / `codedb_find` as the default search — reach for them only when a query needs structural scoping `fff` cannot express.

**Open (next).** Adopting both means two resident MCP servers (`codedb` \~21 tools + `fff` 3 tools): standing per-turn tool-schema rent plus two in-memory indexes. How that is bounded — the role's existing `caveman-shrink` wrapper, the client's tool-search / schema-deferral, or accepting the full schema — is resolved next.

### Bound MCP rent with native tool-search + bounded output — not caveman-shrink [#bound-mcp-rent-with-native-tool-search--bounded-output--not-caveman-shrink]

**Decision.** Do not wrap `codedb`/`fff` in `caveman-shrink`. Bound the two-server rent with the levers that actually target it: Claude Code's native **tool-search** (`defer_loading`, default-on; `ENABLE_TOOL_SEARCH`) for per-turn tool-schema rent, and the **bounded-output guardrail** (instruction/limit on `codedb_tree` / `codedb_snapshot` / `codedb_remote`) for response size. On runtimes without schema deferral (`codex`, `opencode`, `amp`), accept the standing \~21-tool `codedb` rent as a known, measured cost.

**Why (evidence, 2026-06-15).** `caveman-shrink` only rewrites tool *description prose* — it does **not** reduce schema structure, tool count, or response bodies, so it targets neither of the two rent problems better than the levers already in hand. An internet sweep found **zero** real-world pairings of `caveman-shrink` with either `codedb` or `fff`; the repos that use it wrap other servers. It is **v0.1.0, pre-1.0 ("rules may change"), 0 npm dependents** — which conflicts with the role's version-pinning rule. Native tool-search is strictly more powerful for schema rent (defers the whole definition, \~85–95% cut) and is already on for the Claude runtime.

**Note — pre-existing bug to fix separately (the-architect repo, not this PR).** `hooks/preflight.sh` currently registers `caveman-shrink` with **no upstream command** (`claude mcp add caveman-shrink -- npx -y caveman-shrink`) — the exact failure mode of upstream issue #474 (exits 2, "1 MCP server failed" every session); the correct wrapping form sits in a comment directly above it but was never applied (upstream PR #452 drops this auto-registration). It means `caveman-shrink` does nothing useful in the role today. Fix or drop it in the-architect independently of this pilot; do not layer `codedb`/`fff` under it.

### rust-analyzer is agent-facing only where the runtime has native LSP — no bridges [#rust-analyzer-is-agent-facing-only-where-the-runtime-has-native-lsp--no-bridges]

**Decision.** Install `rust-analyzer` on `PATH` and wire it natively per runtime; do **not** add an LSP→MCP bridge (e.g. Serena). Native-LSP support is uneven across the role's runtimes, so rust-analyzer's agent-facing role is graded:

| Runtime                        | Native LSP                          | rust-analyzer for the agent                                                                                                                                                                                                                                                                         |
| ------------------------------ | ----------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `claude`                       | yes (built-in `LSP` tool, v2.0.74+) | **full navigation** — go-to-def / refs / hover / impls / call-hierarchy, via the official `rust-analyzer-lsp@claude-plugins-official` plugin (Anthropic-authored → passes the marketplace allow-list) + the binary on `PATH`. **Already wired in the-architect today** — the pilot leaves it intact |
| `opencode`                     | yes (own JSON-RPC LSP client)       | **diagnostics** (rust-analyzer auto-detected on `PATH` via the `lsp` block in `opencode.json`); navigation exists but is experimental/off by default (`OPENCODE_EXPERIMENTAL_LSP_TOOL`) — left off for the pilot                                                                                    |
| `codex`, `amp`, `kimi`, `grok` | no                                  | **diagnostics only** via one-shot `rust-analyzer diagnostics`; semantic navigation falls back to `codedb` + `ast-grep`                                                                                                                                                                              |

**Why.** Only `claude` (full) and `opencode` (diagnostics) act as native LSP clients; `codex` (#8745 open), `amp` (reads a connected IDE's diagnostics, not a native client), `kimi` (request open), and `grok` (unconfirmed) do not. Native is the operator's stated preference and avoids a bridge — an LSP→MCP bridge would add another resident MCP server (against the rent decision) and, in Serena's case, overlap `codedb`. `rust-analyzer` still earns its place on `PATH` everywhere: Claude drives it natively, OpenCode feeds diagnostics from it, and every runtime can shell `rust-analyzer diagnostics` as a verification gate. Type-correct navigation (through generics / traits / macros) is therefore a Claude bonus; the cross-runtime navigation floor stays `codedb` + `ast-grep`. The Claude plugin is provenance-verified as Anthropic's own (not a third-party relabel), so it adds no new marketplace trust anchor.
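
For the runtimes without a native LSP client, the verification gate amounts to shelling out to `rust-analyzer diagnostics`. A minimal sketch follows; `ra_gate` and the `RA_BIN` override are hypothetical names, and treating a nonzero exit status as "the scan failed or reported errors" is an assumption that should be checked against the pinned rust-analyzer version.

```bash
#!/usr/bin/env bash
# Sketch of a verification gate around `rust-analyzer diagnostics`.
# RA_BIN is a hypothetical override (defaults to rust-analyzer on PATH).
# Assumption: a nonzero exit status means the scan failed or found errors.
ra_gate() {
  local path="${1:-.}" bin="${RA_BIN:-rust-analyzer}"
  if ! command -v "$bin" >/dev/null 2>&1; then
    echo "ra_gate: $bin not found on PATH" >&2
    return 127
  fi
  # One-shot scan; no resident server and no MCP bridge involved.
  "$bin" diagnostics "$path"
}
```

An agent on `codex`, `amp`, or `kimi` could run `ra_gate "$JACKIN_WORKSPACE"` after an edit and treat a nonzero status as a failed gate.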

### Register MCP servers in preflight, idempotently, for every MCP-capable runtime [#register-mcp-servers-in-preflight-idempotently-for-every-mcp-capable-runtime]

**Decision.** Register both `fff` and `codedb` from the role's **`preflight`** hook (runs before the agent on every launch), idempotently (`<runtime> mcp get … || <runtime> mcp add …`), at user scope, for every MCP-capable runtime: `claude`, `codex`, `opencode` (and `amp` / `kimi` / `grok` as their MCP support allows). This supersedes the item's earlier `setup_once` + Claude-only sketch.

**Why.** `preflight` is self-healing (re-adds a missing server on any launch) and is where the-architect already wires MCP — so it is also the single place the broken `caveman-shrink` registration gets fixed. Broad runtime scope follows from decision #3: `codex`, `amp`, `kimi`, and `grok` have no native rust-analyzer navigation, so `codedb` is their primary code-intelligence layer and `fff` their primary search — registering for `claude` only would leave them with neither. The per-turn schema rent on the non-deferring runtimes is the known, accepted cost from decision #2 (`fff`'s 3 tools are negligible; `codedb`'s \~21 are what Claude defers via tool-search and the others carry).

### Additive only — do not disturb the existing native-LSP wiring or `~/.claude` state [#additive-only--do-not-disturb-the-existing-native-lsp-wiring-or-claude-state]

**Decision.** Every pilot change is additive. the-architect **already** installs `rust-analyzer` (Dockerfile `rustup component add rust-analyzer`) and **already** declares `rust-analyzer-lsp@claude-plugins-official` in `[claude].plugins`, so Claude Code's native Rust LSP is **already live**. The pilot leaves that wiring **untouched** — re-adding it only risks version churn between the binary and the plugin pin — and cannot disable the native LSP tool anyway: the tool is plugin-presence-gated and on by default, and MCP servers are an independent subsystem (verified). `codedb`/`fff` registration, `CODEDB_NO_TELEMETRY`, skipping codedb's block-legacy hook, and the guidance file are all orthogonal to LSP.

**The real hazard is `~/.claude` state, not LSP.** The caveman installer (already in the image) writes hooks and merges registrations into `~/.claude/settings.json`, and the Dockerfile gates the build on their presence (`test -f …/caveman-*.{sh,js}`). So: the guidance file is written with a **single-file** `COPY … /home/agent/.claude/CLAUDE.md` (no global `CLAUDE.md` exists today → purely additive); the pilot must **never** overwrite `~/.claude/settings.json` or COPY a whole `.claude/` directory, because that would wipe caveman's hook registrations and fail the build's smoke checks. (Skipping codedb's installer, per the [verified findings](#verified-findings-fact-check-2026-06-15), serves this same end: the installer rewrites `~/.claude/settings.json`.) MCP registration is added as a new idempotent `register_*` function in `preflight.sh` next to `register_caveman_shrink`, never in the Dockerfile (the claude CLI is injected per container at launch).
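
One possible shape for such a `register_*` helper is sketched below. The name `register_mcp_server` is hypothetical; the helper relies only on the `<runtime> mcp get` / `<runtime> mcp add` pair this item already uses, and everything after the server name is passed to `mcp add` verbatim so per-runtime flags such as `-s user` stay at the call site.

```bash
#!/usr/bin/env bash
# Sketch of an idempotent registration helper for preflight.sh.
# `register_mcp_server` is a hypothetical name.
#   $1 = runtime CLI, $2 = server name, rest = arguments for `mcp add`
register_mcp_server() {
  local cli="$1" name="$2"
  shift 2
  # Runtime CLI not present in this container: nothing to do.
  command -v "$cli" >/dev/null 2>&1 || return 0
  # Already registered: leave the existing entry untouched.
  "$cli" mcp get "$name" >/dev/null 2>&1 && return 0
  "$cli" mcp add "$@"
}

# Usage (arguments mirror the claude and codex forms used in this item):
#   register_mcp_server claude fff -s user fff -- fff-mcp
#   register_mcp_server codex  fff fff -- fff-mcp
```

Because the helper only ever adds a missing entry, repeated launches converge on the same client config and never rewrite `~/.claude/settings.json`.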

### Pin all three tool versions; let Renovate track bumps [#pin-all-three-tool-versions-let-renovate-track-bumps]

**Decision.** Pin `ast-grep` (mise `ast-grep@<ver>`), `fff-mcp` (a pinned release asset or Homebrew formula, not the `curl | bash` latest path), and `codedb` (`codedeebee@<ver>`); keep `--locked` on any `cargo install` fallback. Renovate tracks updates, including codedb's fast (near-daily, alpha) cadence.

**Why.** the-architect's supply-chain rules require pinning + `--locked` + documented trust anchors and forbid floating `curl | bash` / `@latest`. Pinning keeps image builds reproducible; Renovate absorbs the churn so codedb's alpha fixes still land as reviewed bumps rather than silent drift. Both convenience installers float by default (the `fff` installer fetches latest; codedb's `codedeebee` postinstall pulls the latest native binary), so each needs an explicit version pin.
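
A pin is only useful if the build notices when the installed binary drifts from it. The sketch below shows one way to assert that at build time; `assert_pinned` is a hypothetical helper, and it assumes each tool prints its version number somewhere in its `--version` output, which should be confirmed per tool before relying on it.

```bash
#!/usr/bin/env bash
# Sketch of a build-time pin check. `assert_pinned` is a hypothetical helper.
# Assumption: the tool prints its version number in `--version` output.
assert_pinned() {
  local tool="$1" expected="$2" actual
  actual="$("$tool" --version 2>&1)" || return 1
  # -F: literal match, -w: whole word, so 0.43.0 does not match 10.43.0.
  if printf '%s\n' "$actual" | grep -qwF -- "$expected"; then
    return 0
  fi
  echo "$tool: expected $expected, got: $actual" >&2
  return 1
}

# Usage, with the version recorded in the verified findings:
#   assert_pinned ast-grep 0.43.0
```

A failing check turns silent drift from a floating installer into a visible build error, which is the behaviour the pinning rule is after.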

### Resident-index state uses tool defaults in the agent home — not the repo, not the host [#resident-index-state-uses-tool-defaults-in-the-agent-home--not-the-repo-not-the-host]

**Decision.** Let `fff` (frecency DB, logs) and `codedb` (index cache, e.g. `~/.codedb/`) use their default locations in the **agent home**, alongside the runtimes' own state (`~/.claude`, `~/.codex`, `~/.cargo`). Do **not** write index state into the **mounted workspace** (the operator's checkout stays clean) and never to a **host** path (container-only).

**Why.** Third-party tool caches in `$HOME` are tool-owned dotfiles, not jackin-owned container data, so they sit outside the `/jackin/` layout rule's scope (which governs jackin's own dirs and bans FHS roots) — consistent with how `cargo` / `claude` / `codex` already keep state in the home. The guardrails that do bind are met: codedb's project *root* is the mounted workspace (for indexing) while its *cache* stays in the home, and no host write occurs. Implementation note: the resident indexes rebuild on startup (codedb indexes once; fff scans on start), so state need not persist — but if cross-session frecency learning is wanted, ensure the home path is in the default-home seeded/persisted set.
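
The placement rule (cache in the agent home, never in the mounted workspace) can be expressed as a path-prefix test. This is a sketch only: `path_is_under` is a hypothetical helper that compares strings and does not resolve symlinks, and the example paths in the usage comment are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: string-level check that one path sits under another.
# `path_is_under` is a hypothetical helper; it does not resolve symlinks.
path_is_under() {
  # Normalise trailing slashes so /home/agent does not match /home/agent2.
  case "${1%/}/" in
    "${2%/}/"*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Usage (illustrative paths):
#   path_is_under "$HOME/.codedb" "$HOME"               # expected: true
#   path_is_under "$HOME/.codedb" "$JACKIN_WORKSPACE"   # expected: false
```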

## Tools and exposure [#tools-and-exposure]

| Capability                               | Tool                                 | Exposure                                                                                                                                                                                        |
| ---------------------------------------- | ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Structural Rust search / refactor / lint | `ast-grep`                           | CLI on `PATH` (invoke as `ast-grep`, never `sg`)                                                                                                                                                |
| Definitions / references / types / impls | `rust-analyzer`                      | native LSP where supported — full nav on `claude` (official `rust-analyzer-lsp` plugin), diagnostics on `opencode`; `rust-analyzer diagnostics` as a verification gate elsewhere. No MCP bridge |
| Test / coverage / lint / format          | `cargo-nextest`, `clippy`, `rustfmt` | CLI (already present in the-architect)                                                                                                                                                          |
| Resident file / text search              | `fff` (`fff-mcp`)                    | MCP, registered at user scope                                                                                                                                                                   |
| Symbols / callers / deps / task-context  | `codedb`                             | MCP, registered at user scope — structural-intelligence layer (telemetry off, installer hooks skipped)                                                                                          |

CLI tools sit on `PATH` so every runtime the role supports reaches them with no per-turn schema cost. `fff` and `codedb` are exposed over MCP because their value is a warm resident index that a cold CLI spawn cannot provide. MCP servers serve every MCP-capable runtime (`claude`, `codex`, `opencode`), so they are the right surface for a multi-runtime role; the Claude-only `ast-grep` skill is deliberately not in the pilot (CLI + shared guidance covers all runtimes instead).

## Implementation (the-architect repo) [#implementation-the-architect-repo]

### 1. Install the tools (`Dockerfile`) [#1-install-the-tools-dockerfile]

The-architect already carries the Rust toolchain plus `nextest`/`clippy`/`rustfmt`. Add the missing pieces, preferring prebuilt binaries so image builds stay fast:

```dockerfile
# --- Code-intelligence tooling ---

# rust-analyzer: ALREADY installed in the-architect (this RUN is a no-op there;
# shown for completeness / reuse by other roles). Official LSP as a rustup
# component (no compile). Agent-facing only where the runtime is a native LSP
# client: `claude` drives it for full navigation via the official
# `rust-analyzer-lsp@claude-plugins-official` plugin (already declared in
# jackin.role.toml [claude].plugins; Anthropic-authored, allow-list clean),
# `opencode` auto-detects it on PATH for diagnostics. Every runtime can shell
# `rust-analyzer diagnostics <path>` as a verification gate. No LSP->MCP bridge —
# codex/amp/kimi/grok have no native LSP and fall back to codedb + ast-grep.
# Leave the existing binary + plugin wiring intact; do not re-pin or duplicate.
RUN rustup component add rust-analyzer

# ast-grep: structural search/refactor as a CLI. No Debian apt package exists,
# so install via mise — the registry entry resolves to the aqua backend and
# pulls a prebuilt, SLSA-verified binary (no source compile, layer-cached,
# pin a version with `ast-grep@<ver>` for reproducibility). Invoke as
# `ast-grep`, never `sg`, the system set-group command on Debian.
RUN mise use --global ast-grep

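# fff: resident file-search MCP server. No Debian apt package; this sketch uses
# the upstream installer, which fetches the latest prebuilt binary. Docker caches
# this layer: pass `--rebuild` (or bust the layer) to pull a newer fff. Per the
# pinning decision in Design, the pilot swaps this floating install for a pinned
# release asset or Homebrew formula. -f makes curl fail on an HTTP error instead
# of piping an error page into bash.
RUN curl -fsSL https://dmtrkovalenko.dev/install-fff-mcp.sh | bash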

# codedb: code-intelligence MCP server (structural-intelligence layer). Install
# the binary only; do NOT run codedb's official curl|bash installer, which writes
# a default-on, undocumented PreToolUse hook that BLOCKS grep/rg/cat/find (10
# commands) and auto-registers clients. The npm launcher package is `codedeebee`;
# the CLI it installs is `codedb`. Telemetry is on by default; disable it
# role-wide below.
RUN npm install -g codedeebee && codedb --version
ENV CODEDB_NO_TELEMETRY=1
```

`cargo install ast-grep --locked` is an acceptable fallback for `ast-grep` since the toolchain is present, but it compiles from source; prefer the prebuilt path. Per the-architect's supply-chain rules, pin tool versions and keep `--locked` on any `cargo install`.
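
After the install layers, a presence check catches a tool that failed to land on `PATH` before the agent ever starts. A minimal sketch, assuming nothing beyond `command -v`; the function name `missing_tools` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a PATH presence check. `missing_tools` is a hypothetical helper:
# it prints each tool that is not on PATH and returns nonzero if any is missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      missing=1
    fi
  done
  return "$missing"
}

# Usage in a build step or in preflight:
#   missing_tools ast-grep rust-analyzer fff-mcp codedb
```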

### 2. Register the MCP servers (preflight hook, idempotent, per runtime) [#2-register-the-mcp-servers-preflight-hook-idempotent-per-runtime]

Register `fff` and `codedb` at **user scope** (they write to each runtime's own client config, e.g. `~/.claude.json`, never to a project `.mcp.json` in the operator's mounted repo) from the role's **`preflight`** hook. `preflight` runs before the agent on every launch, so an idempotent `get || add` self-heals if a server is missing, and the-architect already ships a `preflight.sh` (where the existing `caveman-shrink` registration lives and must be fixed). The adds are additive and idempotent, so they never disturb existing MCP servers, the native LSP tool, or other registrations. Register for whichever runtime is launching (`JACKIN_AGENT`):

```bash title="hooks/preflight.sh (excerpt — add alongside the existing entries)"
root="${JACKIN_WORKSPACE:-$PWD}"   # codedb project root: the mounted workspace, never the agent home

# Idempotent, per active runtime. codedb telemetry is off via CODEDB_NO_TELEMETRY
# (set in the image); never run codedb's installer block-legacy hook — the router
# keeps fff/rg for search and codedb for structure.
case "${JACKIN_AGENT:-claude}" in
  claude)
    command -v fff-mcp >/dev/null && { claude mcp get fff    >/dev/null 2>&1 || claude mcp add -s user fff    -- fff-mcp; }
    command -v codedb  >/dev/null && { claude mcp get codedb >/dev/null 2>&1 || claude mcp add -s user codedb -- codedb mcp "$root"; }
    ;;
  codex)
    command -v fff-mcp >/dev/null && { codex mcp get fff    >/dev/null 2>&1 || codex mcp add fff    -- fff-mcp; }
    command -v codedb  >/dev/null && { codex mcp get codedb >/dev/null 2>&1 || codex mcp add codedb -- codedb mcp "$root"; }
    ;;
  opencode)
    : # OpenCode reads MCP from the `mcp` block in opencode.json (baked into the image), not a per-launch CLI add
    ;;
esac
```

The manifest already declares the preflight hook (no schema change — `[hooks]` exists); this adds to the existing script rather than introducing a new one:

```toml title="jackin.role.toml"
[hooks]
preflight = "hooks/preflight.sh"
```

### 3. Install the guidance once, copy it into every runtime [#3-install-the-guidance-once-copy-it-into-every-runtime]

The agent only uses the tools if it is told they exist and which one owns which job. Author that guidance — including the router — as **one source file** in the role repo and copy it to each supported runtime's global-instructions path at build time. The content is written once, never hand-duplicated.

Copy rather than symlink: Codex does not reliably follow symlinked config files ([codex#11314](https://github.com/openai/codex/issues/11314), [codex#8943](https://github.com/openai/codex/issues/8943)), and coding agents increasingly refuse to follow symlinks in config directories as a sandbox-escape defense. The file is static in the image, so a symlink would buy nothing over a copy.

Write **only** these single files. Never overwrite `~/.claude/settings.json` or COPY the `.claude/` directory wholesale — the caveman installer's hooks and the build's `test -f` smoke checks depend on that state (this is also why codedb's installer is skipped). No global `~/.claude/CLAUDE.md` exists in the image today, so the single-file copy is purely additive.

```dockerfile
# One source file → each runtime's global-instructions path. Real files (not
# symlinks) so every runtime reads them and the captured home dirs survive
# default-home seeding cleanly.
COPY --chown=agent:agent code-intelligence.md /home/agent/.config/AGENTS.md
COPY --chown=agent:agent code-intelligence.md /home/agent/.claude/CLAUDE.md
COPY --chown=agent:agent code-intelligence.md /home/agent/.codex/AGENTS.md
COPY --chown=agent:agent code-intelligence.md /home/agent/.config/opencode/AGENTS.md
```
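A build-time smoke check could assert that each installed path is a regular file and not a symlink. This is a sketch rather than part of the role: `is_real_file` is a hypothetical helper, and the demo works in a temp directory instead of the real home paths.

```bash
#!/usr/bin/env bash
# Sketch only: is_real_file is a hypothetical helper, not part of the role.

# Succeeds only for a regular file that is not a symlink.
# `test -f` alone follows symlinks, so the `-L` check is needed as well.
is_real_file() {
  [ -f "$1" ] && [ ! -L "$1" ]
}

demo="$(mktemp -d)"
printf '# guidance\n' > "$demo/source.md"
cp "$demo/source.md" "$demo/copied.md"       # the shape a Dockerfile COPY produces
ln -s "$demo/source.md" "$demo/linked.md"    # the shape the design avoids

for f in copied.md linked.md; do
  if is_real_file "$demo/$f"; then
    echo "$f: real file"
  else
    echo "$f: not a real file"
  fi
done
```

A plain `test -f` would pass for both files, which is why the symlink test is spelled out separately.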

Global-instructions path per runtime — each is a copy of the one source file:

| Runtime    | Global path                    | Wiring                                                                       |
| ---------- | ------------------------------ | ---------------------------------------------------------------------------- |
| `amp`      | `~/.config/AGENTS.md`          | copy (read natively)                                                         |
| `claude`   | `~/.claude/CLAUDE.md`          | copy                                                                         |
| `codex`    | `~/.codex/AGENTS.md`           | copy (native `AGENTS.md`)                                                    |
| `opencode` | `~/.config/opencode/AGENTS.md` | copy (also reads `~/.claude/CLAUDE.md`)                                      |
| `kimi`     | —                              | Kimi Code auto-loads no home-level rules file yet — tracked under follow-ups |
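
For scripts that need the same mapping (for example a verification step), the table can be mirrored as a lookup. This is a minimal sketch: `guidance_path` is a hypothetical helper, and it treats `kimi` and any unknown runtime as having no global path.

```bash
#!/usr/bin/env bash
# Sketch only: guidance_path is a hypothetical helper mirroring the table above.

# Prints the global-instructions path for a runtime; fails when none exists.
guidance_path() {
  case "$1" in
    amp)      echo "$HOME/.config/AGENTS.md" ;;
    claude)   echo "$HOME/.claude/CLAUDE.md" ;;
    codex)    echo "$HOME/.codex/AGENTS.md" ;;
    opencode) echo "$HOME/.config/opencode/AGENTS.md" ;;
    *)        return 1 ;;   # kimi and unknown runtimes: no global path yet
  esac
}

runtime="${JACKIN_AGENT:-claude}"
if path="$(guidance_path "$runtime")"; then
  echo "$runtime reads guidance from $path"
else
  echo "$runtime has no global-instructions path" >&2
fi
```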

The single source content (markdown, agent-agnostic):

```markdown title="code-intelligence.md (one source, copied to each path)"
# Code-intelligence tooling (the-architect)

This container ships dedicated code-intelligence tools. Reach for the right one by job; do not route every search through one tool.

## Routing — which tool for which question
- **File / path / literal-content search → fff.** Use `fffind` for paths/filenames (including approximate ones) and `ffgrep` for literal/identifier/error-string content. `fff-multi-grep` for several name variants in one call (e.g. snake_case + PascalCase). After one or two useful searches, read the best match instead of searching again.
- **Symbols, callers, dependencies, task context → codedb** (when connected). `codedb_context` to orient on an unfamiliar task, `codedb_symbol` for a definition, `codedb_callers` before changing a public API, `codedb_deps` for import impact, `codedb_outline` before reading a large file. Prefer bounded reads; do not dump full trees or snapshots. Native edit tools first — `codedb_edit` is a fallback only.
- **Syntax-shape search and rewrites → ast-grep** (see below).
- **Definitions / references / types / trait impls → rust-analyzer** (the correctness floor for Rust).
- **Plain text (logs, comments, config keys, messages) → rg / grep.** That is the common case.

Do not run an `fff` search and a `codedb` search for the same literal unless the first is clearly insufficient. Do not use `codedb_context` merely to find a filename. Do not use `fff` for caller/dependency questions. Do not use `ast-grep` for plain-text search.

## Structural search & refactor — ast-grep
`ast-grep` is installed. Invoke it as `ast-grep`, never `sg` (`sg` is the system set-group command on Debian).

Reach for ast-grep first whenever the target is a *syntax structure*, not literal text: functions, `impl` blocks, trait impls, macro definitions/calls, `async fn`, derives, match arms, or specific call shapes (`$X.unwrap()`, `$X.clone()`).

- Search: `ast-grep --lang rust --pattern '<pattern>'` (short: `ast-grep -l rust -p '<pattern>'`).
- Rewrite: add `--rewrite '<replacement>'`.
- Lint/scan: `ast-grep scan` against rules in `sgconfig.yml`.
- Metavariables: `$X` matches one node, `$$$ARGS` matches a list. Every `.unwrap()`: `ast-grep -l rust -p '$X.unwrap()'`.

## Semantic navigation — rust-analyzer (runtime-dependent)
On `claude`, rust-analyzer powers go-to-definition / find-references / hover / implementations / call-hierarchy through the native LSP tool; prefer it for type-correct symbol resolution (it follows generics, traits, and macros where text or AST matching cannot). On `opencode` it provides diagnostics. On the other runtimes (`codex`, `amp`, `kimi`) there is no native LSP: use `codedb` (symbols / callers / deps) and `ast-grep` (structure) for navigation, and treat `rust-analyzer diagnostics` only as a check. Resolve symbols rather than guessing from text matches.

## Verify
- Tests: `cargo nextest run`
- Lints: `cargo clippy --all-targets --all-features -- -D warnings`
- Format: `cargo fmt --check`
```

## How the agent learns the tools [#how-the-agent-learns-the-tools]

1. The the-architect `Dockerfile` copies the source guidance to each runtime's global-instructions path as a real file (agent layer, on top of the construct).
2. The copies under `~/.claude/` and `~/.codex/` are captured by the derived layer's default-home step (see <RepoFile path="crates/jackin-image/src/derived_image.rs">crates/jackin-image/src/derived\_image.rs</RepoFile>) and reseeded on first boot; `~/.config/AGENTS.md` and `~/.config/opencode/AGENTS.md` are baked into the image and present at runtime.
3. Each runtime reads its global-instructions path at startup, so the guidance — including the router — is in context for every session regardless of which repo the operator mounted.

Because the guidance lives in the agent's home, it applies to every workspace the-architect runs against and never mutates a host repository.

## Verification [#verification]

Smoke-test from a jackin checkout (always `--debug`; share the printed run id if something misbehaves):

```bash
cargo run --bin jackin -- load the-architect . --debug
```

Inside the container, confirm each surface:

* `ast-grep -l rust -p '$X.unwrap()'` returns structural matches.
* `command -v rust-analyzer` resolves; Claude Code go-to-definition works on a symbol.
* `claude mcp list` shows `fff` and `codedb`; ask the agent to "use fff" for a file search and exercise a `codedb_context` query.
* `codedb status` reports the **mounted workspace** as the project root (not `~`), and no `~/.claude/hooks/codedb-block-legacy.sh` exists (installer hook was not run).
* `env | grep CODEDB_NO_TELEMETRY` shows `1`.
* `cargo nextest run` runs the suite.
* `sha256sum ~/.config/AGENTS.md ~/.claude/CLAUDE.md ~/.codex/AGENTS.md ~/.config/opencode/AGENTS.md` — all four hashes match (one source).
* When asked to find all trait impls or `.unwrap()` calls, the agent reaches for `ast-grep` rather than `rg`; for "who calls X" it reaches for `codedb_callers`, not a text grep. Repeat the launch with `--codex` / `--amp` / `--opencode` to confirm the same guidance and router are in context for each runtime.
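
The hash comparison in the list above can be turned into a pass/fail check instead of an eyeball comparison. A minimal sketch, assuming `sha256sum` is available (as the checklist already does): `same_content` is a hypothetical helper, and the demo uses temp files rather than the real guidance paths.

```bash
#!/usr/bin/env bash
# Sketch only: same_content is a hypothetical helper, not part of the role.

# Succeeds when every given file exists and all share one SHA-256 digest.
same_content() {
  [ "$#" -gt 0 ] || return 1
  local f distinct
  for f in "$@"; do
    [ -f "$f" ] || return 1          # a missing copy is a failure, not a match
  done
  distinct="$(sha256sum -- "$@" | awk '{print $1}' | sort -u | wc -l | tr -d ' ')"
  [ "$distinct" -eq 1 ]
}

demo="$(mktemp -d)"
printf '# guidance\n' > "$demo/a.md"
cp "$demo/a.md" "$demo/b.md"
printf '# drifted\n' > "$demo/c.md"

same_content "$demo/a.md" "$demo/b.md" && echo "a and b match"
same_content "$demo/a.md" "$demo/c.md" || echo "a and c differ"
```

Inside the container the same helper would be called with the four global-instructions paths; a non-zero exit then signals that a copy is missing or has drifted from the source.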

## Success criteria [#success-criteria]

The pilot succeeds when, across a handful of representative the-architect sessions:

* the agent invokes `ast-grep` for structural queries, `rust-analyzer` for symbol resolution, `fff` for file/content lookup, and `codedb` for caller/dependency/context questions — instead of ad-hoc text search;
* the agent honors the router — it does not double-search the same literal across `fff` and `codedb`, and it does not route relationship questions to `fff` or plain-text questions to a structural tool;
* code lookups cost fewer tokens than repeated `rg` plus file reads, net of the MCP schema carried each turn;
* fewer edits land on wrong paths because symbols were resolved rather than guessed;
* the same guidance is confirmed in context across `claude`, `codex`, `amp`, and `opencode` from the one source file;
* `fff`'s resident index measurably beats the construct's ripgrep on this repo — or the pilot concludes it does not, and `fff` is dropped while the CLI and LSP tools stay;
* `codedb` earns its standing MCP cost: it is used for structural queries (symbols / callers / deps / outline / context), returns bounded task-shaped output, and is not used as a general search tool (that is `fff`'s job). Token/latency measurement from the dossier's [validation harness](/research/token-optimization/51-code-intelligence-tools/) is still worth running to size the win — but adoption is settled on architectural grounds (complementary layers), not gated on an `fff`-vs-`codedb` bake-off.

## Follow-ups (deferred) [#follow-ups-deferred]

* **Kimi Code global rules.** [Kimi Code](https://github.com/MoonshotAI/kimi-code) auto-loads no home-level rules file today. Copy the source guidance into Kimi's home (config dir `~/.kimi-code/`) once Kimi Code supports a global rules file ([upstream feature request](https://github.com/MoonshotAI/kimi-cli/issues/2152)).
* **codedb output guardrail.** Add a hook (or instruction) requiring `limit` / `prefix` / compact-read on `codedb_tree`, `codedb_snapshot`, and `codedb_remote action=tree` so a code-intelligence call never dumps more than native tools would. This is the dossier's bounded-output policy made concrete, and matters more now that `codedb` is a permanent server rather than a trial arm.
* **ast-index as a narrow fallback.** Revisit [`ast-index`](https://github.com/defendend/Claude-ast-index-search) **only** if codedb proves insufficient for `implementations` / `hierarchy` / recursive `call_tree` / changed-symbol / API-or-module reports, and only with a benchmark on real the-architect tasks. If adopted, install from a pinned tag (`cargo build --locked --release`), set an explicit `AST_INDEX_ROOT`, do not run `ast-index watch` at role startup by default, and **do not** copy its "ALWAYS use ast-index FIRST" rule — route it narrowly under the existing router instead. It is otherwise redundant with codedb on a Rust workspace and its mobile/polyglot strengths do not apply here.
* **Serena as an alternative intelligence layer.** The dossier flags [`Serena`](https://github.com/oraios/serena) as the strongest local open-source semantic-navigation competitor to codedb. If codedb underperforms, evaluate Serena (language-server-backed) on the same harness before concluding the intelligence layer has no token win.
* **ast-grep lint gate.** Add `sgconfig.yml` plus `.ast-grep/rules/*.yml` (flag `todo!()`/`unimplemented!()`, audit `.unwrap()`) and run `ast-grep scan` in CI. Independent of the agent-guidance pilot; can land separately.
* **Declarative MCP servers in the manifest.** A first-class `[claude].mcp_servers` (and per-runtime equivalents) field in `jackin.role.toml` would register MCP servers without a hook. This is a versioned-schema change (one `CURRENT_MANIFEST_VERSION` bump with migration and fixtures) and is only worth designing if the pilot proves MCP servers worth generalizing.
* **Skill-based guidance.** Deliver the guidance as a Claude Code skill loaded on demand instead of always-on memory, cutting standing memory cost. (The official `ast-grep/agent-skill` is Claude-only; a skill path would need per-runtime equivalents for `codex`/`amp`.)
* **Generalize to a shared layer.** Move the CLI tools (`ast-grep`, `rust-analyzer`) and the source-copy wiring into a shared base so other Rust roles inherit them, keeping `fff`/`codedb`/MCP opt-in.

## References [#references]

* [AGENTS.md](https://agents.md/) — the cross-tool agent-instructions standard, read globally by Codex at `~/.codex/AGENTS.md`, OpenCode at `~/.config/opencode/AGENTS.md`, and Amp at `~/.config/AGENTS.md`. Claude Code reads `~/.claude/CLAUDE.md`.
* [Code-intelligence tools: codedb, fff, CodeGraff, and alternatives](/research/token-optimization/51-code-intelligence-tools/) — the research dossier behind this decision: market sweep, token economics, local measurements, validation harness, and acceptance rule.
* [ast-grep](https://ast-grep.github.io/) — structural search/rewrite CLI (no Debian apt package; installed via mise). Official Claude skill: [ast-grep/agent-skill](https://github.com/ast-grep/agent-skill) (Claude-only).
* [rust-analyzer](https://rust-analyzer.github.io/) — the Rust language server (rustup component).
* [fff](https://github.com/dmtrKovalenko/fff) — resident file-search toolkit for AI agents with an MCP server.
* [codedb](https://github.com/justrach/codedb) — code-intelligence server and MCP toolset (telemetry on by default — `CODEDB_NO_TELEMETRY=1` to disable; install the binary only and skip the installer's block-legacy hook).
* [ast-index](https://github.com/defendend/Claude-ast-index-search) — SQLite/FTS5 indexed code search with a thin shell-out MCP wrapper; deferred (mobile/polyglot focus, redundant with codedb on Rust).
* [Creating a Role](/developing/creating-roles/), [Role Manifest](/developing/role-manifest/), [The Construct Image](/developing/construct-image/) — the role extension points this pilot uses.