jackin'
ResearchToken Optimization Research

52 - Qdrant and vector databases for agent context

52 - Qdrant and vector databases for agent context

This addendum answers the follow-up question: does a vector database like Qdrant add meaningful token-usage value on top of the planned fff + codedb code-intelligence stack?

TL;DR

  • Qdrant is not a replacement for fff or codedb. fff finds files, paths, and text quickly; codedb answers code-structure questions; Qdrant stores vectors and retrieves semantically similar chunks or memories.
  • A vector DB can save tokens only through retrieval selectivity. It saves when it finds 3-5 compact, relevant snippets that prevent reading many files or docs. It loses when it adds embedding cost, stale chunks, MCP schema overhead, and extra retrieved text on top of normal search.
  • The dossier already covered the category, but not Qdrant specifically. File 14 called semantic/RAG repo indexing the largest evidence gap; file 19 rejected semantic response caching for coding agents; file 51 compared purpose-built code context engines. This file adds Qdrant as a backend choice.
  • Best current recommendation: do not add raw Qdrant as a default third tool. Pilot it only as an optional semantic-memory/RAG arm against fff + codedb.
  • If a vector layer is needed, prefer a code-aware layer first. Code Context Engine or Claude Context already include chunking, hybrid retrieval, compression, and agent wiring. Qdrant is the lower-level backend you choose when you want to build or control that layer yourself.

Is Qdrant already covered?

Partially.

Existing fileWhat it already saysGap this file closes
14-retrieval-memory-and-state.mdSemantic/RAG repo indexes may improve accuracy, but token savings vs agentic grep/read are unmeasured; validation protocol required.Names Qdrant and scores whether it should be the backend for that protocol.
19-infrastructure-level.mdQdrant appears as a possible semantic-cache backend in LiteLLM; semantic response caching is a poor fit for coding agents.Separates Qdrant-as-vector-store from semantic whole-response caching.
51-code-intelligence-tools.mdCompares codedb, fff, CodeGraff, Serena, Code Context Engine, Claude Context, Sourcegraph, Qodo, and others.Adds the raw vector database layer beneath Code Context Engine / Claude Context.

So Qdrant is not missing as a concept, but it was missing as a concrete candidate for the current fff + codedb plan.

What Qdrant actually provides

Qdrant is a Rust vector database and vector search engine. The upstream README positions it as a production service for storing vectors plus payloads and then searching by vector similarity. The relevant capabilities for AI-agent context are:

  • dense vector search for semantic similarity;
  • sparse vector search for keyword/full-text-style retrieval;
  • multivector search for late-interaction models such as ColBERT;
  • JSON payload filtering, including keyword, numeric, geo, and boolean clauses;
  • hybrid queries that fuse dense and sparse search with RRF or DBSF;
  • quantization and on-disk storage to reduce memory pressure;
  • distributed deployment for scale;
  • an official MCP server with qdrant-store and qdrant-find;
  • local or server deployment, including QDRANT_LOCAL_PATH in the MCP server.

That is infrastructure. It does not automatically provide:

  • AST-aware code chunking;
  • symbol/caller/dependency graphs;
  • file outlines;
  • edit-safe code spans;
  • index freshness tied to git edits;
  • token-optimized snippet compression;
  • source-of-truth verification before patching.

Those must come from a layer above Qdrant, such as Code Context Engine, Claude Context, a custom indexer, or a future jackin' role extension.

Qdrant vs the tools already considered

Tool/layerWhat it is best atWhat it is notToken-saving fit
fffFast file/path/content search with a warm resident index, frecency, git annotations, fuzzy fallback.Semantic memory, symbols, callers, deps.Keep. It reduces dead-end search loops and latency.
codedbCode intelligence: outline, symbol, search, callers, deps, compact reads, task context.Semantic natural-language memory over docs/examples unless those are indexed as code/text.Keep as the primary open-source code-intelligence candidate.
Qdrant MCP serverGeneric semantic memory: store snippets/docs/decisions and retrieve by natural language.Code structure, exact references, edits, safe spans.Optional only. Useful for docs/examples/decisions, not as the core code-navigation layer.
Raw Qdrant custom indexBuilding your own vector/hybrid backend with full control over metadata, filtering, storage, and model choice.Turnkey agent behavior. You must build chunking, ranking, freshness, compression, and instructions.Good backend if jackin' builds its own vector-context service; overkill as a default role dependency.
Code Context EngineLocal code-aware retrieval with AST chunks, vector+BM25, graph expansion, compression, memory, and claimed token accounting.Mature production track record; headline benchmark uses full-file baseline.Better first benchmark than raw Qdrant if the objective is token reduction.
Claude ContextMCP semantic code search using Milvus/Zilliz with hybrid retrieval and agent setup.Exact symbol graph; adds cloud/vector DB dependency unless self-hosting Milvus.Good comparison arm for semantic RAG; not a replacement for codedb.
SerenaLocal semantic navigation through language-server semantics.Vector memory over docs/examples.Better than Qdrant for symbol/navigation tasks.
Sourcegraph / Qodo / AugmentEnterprise or commercial codebase context and multi-repo impact analysis.Lightweight local role default.Evaluate only as explicit external-agent/context-engine experiments.

Vector database comparison for this purpose

The question is not "which vector DB is best in abstract?" It is "which backend would make sense for AI coding agents trying to spend fewer tokens?"

BackendFit for jackin' agent token optimizationWhy
QdrantBest raw open-source backend candidate if jackin' builds its own semantic memory/context service.Strong local/server options, hybrid dense+sparse retrieval, payload filters, quantization, Rust implementation, official MCP server.
Milvus / ZillizBest if adopting Claude Context or large-scale managed vector infrastructure.Claude Context already uses Milvus/Zilliz; Milvus is built for large-scale distributed vector search and hybrid/full-text search.
WeaviateStrong if the goal shifts toward managed agent memory / agentic search.Mature vector DB with cloud, MCP/server ecosystem, and explicit agent-memory/product surface. Heavier than needed for a local role.
ChromaGood prototype/local RAG store.Already appears indirectly in the dossier through context-rot evidence; less compelling as the production backend for jackin' role work.
LanceDBGood embedded/local columnar vector store when you want a library-style deployment.Less direct MCP/code-agent story in the current comparison.
VespaStrong when vector search is only one part of a larger hybrid search/ranking engine.Better fit for production search stacks with lexical retrieval, tensors, reranking, and serving-time ML; too heavy for a default local role.
TurbopufferStrong managed/serverless candidate when cloud cost and scale dominate.Proprietary cloud service; useful comparison arm for production RAG/search, not a local jackin' default.
pgvectorGood when Postgres is already the system of record.jackin' does not currently need a Postgres dependency just to improve coding-agent search.
PineconeStrong managed vector service.Adds cloud/vendor dependency without a code-agent-specific advantage over Qdrant/Milvus-based local options.
FAISS / hnswlibGood vector indexes inside an application.Not a persistence, metadata, filtering, ops, or MCP layer by itself.

Verdict: if the project wants a raw vector DB backend, Qdrant is the best fit among open-source options for a jackin' role pilot. But the best solution for token usage is probably not raw Qdrant; it is a code-aware retrieval layer that may use Qdrant underneath.

Benchmark evidence: does anyone prove they beat Qdrant?

No public source found proves that any raw vector database is significantly better than Qdrant for jackin's actual metric: fewer total tokens per solved coding-agent task on top of fff + codedb. The available evidence measures database performance, retrieval quality, or code-context-tool compression, not the full agent loop.

EvidenceWhat it supportsWhat it does not prove
Qdrant benchmarksQdrant reports strong open-source DB-level results: high RPS, low latency, comparable precision, same-machine methodology, and open-sourced benchmark code.Vendor-run; Qdrant explicitly says it is probably biased. It measures vector DB performance, not agent token savings.
VectorDBBench / Zilliz leaderboardMilvus/Zilliz can beat Qdrant Cloud on some cloud cost/performance leaderboards. The 1M Cohere result shown on 2026-06-13 has ZillizCloud at p99 2 ms / 13,316 QPS / 0.9383 recall versus QdrantCloud at p99 6.4 ms / 1,242 QPS / 0.9474 recall; its streaming chart also favors ZillizCloud on QPS.Zilliz/Milvus-affiliated; recall is not identical; it is a DB benchmark, not a coding-agent token benchmark. It says "Milvus may be better for this workload," not "Milvus saves jackin' tokens."
RAGPerfAcademic end-to-end RAG benchmarking exists and explicitly treats vector DB choice as one component among embedding, indexing, retrieval, reranking, and generation. It supports LanceDB, Milvus, Qdrant, Chroma, and Elasticsearch and measures throughput plus quality metrics.It is a framework/paper, not a published jackin-style result proving one DB dominates for code-agent token economy.
2026 HPC VDB studyIndependent academic work compares Qdrant, Milvus, and Weaviate at large HPC scale and shows workload characteristics, segmentation, recall-cost trade-offs, and scaling limits dominate.It is not a local coding-agent workload and does not crown a universal winner.
Code Context Engine / Claude ContextThese are stronger evidence for token reduction because they include AST chunking, hybrid retrieval, graph/context logic, compression, freshness, and agent wiring. CCE claims 94% retrieval savings against full-file reads; Claude Context claims about 40% token reduction at equivalent retrieval quality.They are not raw vector DB comparisons. Their baselines are not fff + codedb, so they still need a local A/B.

Independent algorithm benchmarks (ANN-Benchmarks, big-ann/NeurIPS) reinforce this: they rank algorithms, not products, and crown no consumer database; Qdrant sits on the recall-vs-latency Pareto frontier, and the vendor benchmarks mutually contradict each other — each wins the axis it chose (usually throughput) while the same data shows Qdrant winning tail latency, which is itself proof there is no decisive winner.

So the verified answer is:

  • Better raw vector DB for high-scale managed throughput/cost: possibly Milvus/Zilliz, and possibly Turbopuffer or Pinecone depending on cloud workload.
  • Better raw vector DB for embedded/local library ergonomics: possibly LanceDB or Chroma.
  • Better search platform for production hybrid ranking: possibly Vespa.
  • Better solution for coding-agent token savings: not a raw DB; benchmark Code Context Engine or Claude Context as code-aware layers.
  • Better default for jackin' right now: no vector DB by default. Keep fff + codedb; add a vector layer only as an opt-in semantic-memory/RAG arm.

Where Qdrant would add value

Qdrant makes sense when the agent needs semantic recall that fff and codedb do not naturally cover:

  • "Find code similar to this behavior, even if names differ."
  • "Find examples of how we use this third-party library."
  • "Find the design decision we made last week."
  • "Find the docs snippet that explains this config."
  • "Find prior implementation patterns across many repos."
  • "Retrieve accepted code snippets only, then keep generated code consistent."

The Qdrant MCP blog uses exactly this shape: a semantic memory layer stores code snippets, documentation, and implementation details, then qdrant-find retrieves relevant context before the assistant generates code. That can improve relevance and avoid reading irrelevant documentation.

Why the docs corpus is the good case

This distinction matters enough to preserve in the research because otherwise the recommendation is easy to misread as "vector search never helps." The more precise claim is: vector search is a weak default for exact code navigation, but a plausible speedup for semantic documentation and decision recall.

The current repo is large enough for that distinction to matter. A local count on 2026-06-13 found:

rg --files docs token-optimization | wc -l                => 383 files
rg --files docs token-optimization | rg '\.(md|mdx)$' | wc -l => 234 Markdown/MDX files

That is ≈900k–1M tokens of prose (estimate) — far too large to load wholesale — a substantial human-language surface: user docs, reference pages, error pages, topic rules, research files, and decision-like conclusions. For questions such as "where do we document the TUI modal rule?", "what is the host-write policy?", or "what did we conclude about Qdrant versus codedb?", the agent may not know the exact file name or wording. A bounded hybrid vector index can cut the search path from repeated rg attempts plus several file reads to a small set of path:line snippets.

This is mostly a wall-clock and tool-loop speedup claim, not automatically a token-savings claim. It becomes a token-saving claim only when the retrieval layer returns compact pointers and snippets:

good: 3-5 path:line hits + short excerpts + normal file-read verification
bad: 10-20 large chunks pasted into the prompt before every task

The recommended policy is therefore:

  • use vector or hybrid search for docs, examples, decisions, tickets, logs, and architecture memory when exact keywords are uncertain;
  • prefer hybrid retrieval, not pure vector: BM25/keyword + vector + metadata filters, because commands, config keys, error codes, and product terms are often literal;
  • cap output to 3-5 compact results and require native file reads before citing or editing;
  • do not route exact path, symbol, caller, dependency, or patching questions through the vector index.

This makes the vector layer useful as a documentation accelerator without promoting it to the primary code-navigation layer.

Where Qdrant does not add value

Qdrant should not be used when a precise code tool already answers the question:

  • "Where is this function defined?" Use rust-analyzer/Serena/codedb.
  • "Who calls this function?" Use codedb or an LSP/reference tool.
  • "Which files mention this string?" Use fff or rg.
  • "What is the outline of this file?" Use codedb.
  • "Patch this function." Use native file-edit tools after exact span lookup.

For these tasks, Qdrant adds a semantic approximation layer where an exact structural or lexical tool is cheaper and safer.

Token economics

Qdrant never saves tokens merely by existing. It saves tokens only if:

tokens avoided by fewer file/doc reads
  > retrieved snippets
  + embedding/indexing cost
  + MCP schema/tool overhead
  + stale-index recovery
  + extra verification reads

Expected positive cases:

  • large docs or examples where semantic query returns a few snippets instead of reading many files;
  • repeated architecture/design recall across sessions;
  • code pattern discovery when names are unknown;
  • multi-repo or cross-project memory where grep in one checkout is insufficient.

Expected negative or neutral cases:

  • normal single-repo Rust edits where fff + codedb + rust-analyzer already find the exact path/symbol quickly;
  • active editing where vector chunks go stale;
  • broad semantic queries that return big chunks;
  • agent instructions that say "always search Qdrant" before every code change;
  • whole-response semantic caching, which file 19 already killed for coding agents.

Do not add Qdrant to every role. Add a separate opt-in vector-context pilot for the-architect or a research role.

Minimum safe shape:

rust-analyzer + ast-grep
+ codedb
+ fff
+ optional qdrant-find only for semantic docs/examples/decisions

Implementation constraints:

  • Run Qdrant and the MCP server inside the role container, not on the host.
  • Keep jackin-owned state under /jackin/state/qdrant/ or a role-owned equivalent, not /tmp, /var, or the operator's home.
  • Register MCP in the container's user scope only.
  • Start read-only: expose qdrant-find; disable or strictly gate qdrant-store until storage policy is clear.
  • Store pointers first: path, line_start, line_end, commit, symbol, kind, and a short snippet. Avoid returning large chunks by default.
  • Limit results to 3-5 unless the agent explicitly asks for more.
  • Include a freshness check: project root, indexed commit/hash, changed-file status, and last-index time.
  • Use hybrid retrieval if available: dense vector + sparse/keyword + metadata filters. Pure vector search is too fuzzy for code.
  • Require the agent to verify with native file reads before editing.

Example Claude Code registration, adapted from Qdrant's MCP docs:

claude mcp add qdrant-code-search \
  -e QDRANT_URL="http://localhost:6333" \
  -e COLLECTION_NAME="code-repository" \
  -e EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2" \
  -e TOOL_FIND_DESCRIPTION="Use for semantic search over approved docs, examples, prior decisions, and code patterns when exact path/symbol names are unknown. Return compact path:line pointers and snippets; verify with native reads before editing." \
  -e TOOL_STORE_DESCRIPTION="Store only operator-approved snippets or accepted implementation decisions with metadata. Do not store generated code automatically." \
  -- uvx mcp-server-qdrant

For a jackin' role, prefer the same command inside a setup hook and point Qdrant storage at role/container state. Do not run the upstream registration against the operator's host config without explicit opt-in.

Best-solution ranking after adding Qdrant

RankStackUse whenVerdict
1rust-analyzer + ast-grep + codedb + fffNormal jackin' Rust/code work.Best default. No vector DB needed.
2Add Code Context Engine as a benchmark armYou want a local code-aware token-savings experiment with published numbers.Better than raw Qdrant for proving token savings, but benchmark baseline is generous.
3Add Qdrant MCP as semantic memory/docs/examplesYou have lots of examples/docs/decisions and exact search is not enough.Useful optional layer; not default.
4Build custom Qdrant-backed jackin' context serviceYou need full control over local vector storage, metadata, freshness, and retrieval policy.Potentially strong, but it is a product project, not a quick tool install.
5Claude Context / Milvus, Sourcegraph, Qodo, AugmentYou accept external services or heavier enterprise tooling.Better for broad enterprise context than for default local jackin' roles.

Validation protocol

Add Qdrant only if it beats the current plan in a realistic harness.

Arms:

ArmTools
Nativerg, normal file reads, native edits
Plannedfff + codedb + rust-analyzer/ast-grep
Qdrant-genericPlanned stack + Qdrant MCP qdrant-find over docs/examples/decisions
Qdrant-customPlanned stack + custom Qdrant-backed code chunks with metadata and freshness
CCECode Context Engine
Claude ContextClaude Context / Milvus or Zilliz

Task set:

  • 10 exact code-navigation tasks, where vector search should not help;
  • 10 semantic "find similar pattern" tasks, where vector search might help;
  • 10 docs/examples/API usage tasks, where vector search should help;
  • 5 stale-index/edit-after-retrieval tasks;
  • 5 multi-repo or cross-session decision recall tasks, if available.

Metrics:

  • solved task rate;
  • correct target file top-1 and top-5;
  • total input/output/cache tokens per solved task;
  • tool calls and tool-result tokens;
  • number of verification reads after retrieval;
  • index build time and incremental update time;
  • embedding/model cost;
  • stale or misleading retrieval incidents;
  • host/container state written.

Acceptance:

Adopt Qdrant only if:
  solved_task_rate >= planned stack
  total tokens per solved task <= planned stack by at least 20%
  stale/misleading retrieval incidents do not increase
  host writes are zero unless explicitly opted in
  exact symbol tasks still use codedb/LSP, not Qdrant

If Qdrant wins only on docs/examples/decisions, keep it scoped there. Do not promote it to general code search.

Bottom line

Qdrant is a good vector database and a credible backend for semantic memory/RAG. It should not replace fff or codedb, and it should not be added as a default third tool merely because vector databases are popular.

For jackin', the practical recommendation is:

Default code role:
  rust-analyzer + ast-grep + codedb + fff

Optional semantic-memory pilot:
  add Qdrant only for docs/examples/decisions/pattern recall

Token-saving benchmark:
  compare Qdrant against Code Context Engine and Claude Context

If the pilot proves a 20%+ token reduction per solved task over fff + codedb with equal quality, Qdrant becomes a useful optional role component. Until then, it is a backend candidate, not part of the default stack.

Source ledger

On this page