CI/CD Speed Roadmap

Status: Open -- run analysis captured; implementation roadmap prioritized for wall-clock impact, with lint-lane de-serialization, package-matrix collapse, preview/CI overlap, and a warm persistent-runner lane leading the order. Phase 9 steps 1, 2, and 4 are implemented in this branch: check-all-features is removed, clippy owns the all-features compile gate while check-default owns the default-feature compile gate, and the measured GitHub-hosted CI required gate moved from about 12m33s on run 27937532691 to about 10m16s on run 27961783994. The Docker E2E lane now lives inside the reusable nextest workflow, runs beside the package shards, and is path-routed so unrelated Rust package changes do not force the real Docker lane. Phase 1 steps 1, 2, 4, 5, 6, 7, and 8 are also implemented: jdx/mise-action now pins the mise binary version with a stable cache prefix, docs CI uses bun ci, CI cargo tools install through mise's GitHub release backend instead of the source-compiling cargo backend, scheduled hygiene records GitHub Actions cache usage, shared mise.toml changes are routed by the specific tool entries they affect instead of always firing Rust or preview builds, and non-building policy/audit jobs no longer restore Rust target caches. Phase 2 now uses the middle-ground package shape from this roadmap: named heavy-crate nextest jobs plus one small-crate bucket instead of the old 19-way matrix; an archive fan-out experiment built successfully but failed an existing checkout-dependent test, so the current package matrix remains the safer and better-attributed shape for now. A same-run prepared-workspace artifact was measured and removed: run 27975505075 spent up to 4m09s downloading the prepared nextest workspace in one fan-out job, while the cache-only follow-up run 27976780904 completed the GitHub-hosted CI required gate in about 7m51s with no dependency download or compile markers. 
Run 27980183186 stayed in the same band at about 8m07s to ci-required: log scanning found no crates.io index update, crate download, or third-party dependency compile markers, but cargo nextest prepare still spent 2m40s compiling jackin' workspace crates, so the next win is reducing workspace compile/setup cost rather than restoring another large artifact. Hosted-runner GHA sccache was measured and rejected after later runs stayed at 0% hits with write errors; GitHub-hosted jobs now rely on rust-cache, the shared Cargo registry cache, and Buildx caches, while Velnor keeps local-disk sccache as an opt-in warm-runner accelerator. Compile-heavy Rust jobs now use semantic workspace-aware v2 rust-cache keys, shared by dependency/build shape rather than job name, and Cargo dependency jobs restore a separate shared Cargo registry/index/git DB cache with the same dependency-key shape; exact registry hits are verified with cargo fetch --locked --offline, while only true first-time misses may fetch from the network. The shared registry cache now also keys and fetches fuzz lockfiles, so fuzz lanes can run Cargo in offline mode after cache population. Measurement also removed cache work where it made the run slower instead of warmer: workflow-only actionlint no longer restores Cargo state, cargo fmt no longer restores registry or target caches because it performs no dependency resolution or compile, policy/audit lanes keep registry/advisory caches without restoring target archives that they cannot use, and nextest no longer serializes package and Docker lanes behind a cold target-cache seeding job. cargo audit now disables yanked-crate index checks in PR CI and uses --no-fetch --stale on advisory-cache hits, so hot audit lanes do not update the crates.io index or refetch RustSec data. 

Phase 4 steps 1, 2, 3, 4, and 5 are implemented: preview archive builds now start on push to main, publish waits for successful CI on the exact source SHA, preview owns the release-profile jackin archive build that packages jackin-role, source-path filtering is preserved, and the final SHA ancestry check still runs before mutating the rolling preview.

Phase 5 steps 1, 2, and 6 are implemented: manual Buildx GHA cache refs carry ghtoken/repository, the old cache-mount experiment is superseded, construct builds now restore or build the pinned shellfirm binary outside Docker before copying it into the context, x64 construct jobs install shellfirm from the upstream GitHub release via mise instead of compiling it from crates.io, and the Dockerfile's from-source Rust compile is gone.

Phase 6 step 3 is implemented: Codebook installs through the prebuilt mise GitHub backend, the old Rust/Cargo cache work is removed from docs spell-check jobs, and the Docs timing summary captures the warm result.

Phase 7 steps 1-5 are implemented: preview and release archive jobs now share the same composite build/package/sign/upload action, release archive builds run beside the release test workflow, final archives use deterministic tar/gzip metadata settings, all release targets build on Linux through cargo-zigbuild, preview and release archive jobs use the same target-scoped cache-key design, and scheduled hygiene keeps a native macOS smoke job.

Phase 0 steps 1, 2, and 4 are implemented: CI, Docs, Construct Image, Publish Homebrew Preview, and Release now write timing/cache summaries, short-retention Cargo timing artifacts exist for clippy, check-default, nextest prepare, preview builds, release builds, and jackin-dev builds, and the shared summary reports time to first red signal plus each workflow's target completion metric. 
The shared summary also now totals setup/cache/artifact/Docker/Cargo step time, counts and samples cache misses, dependency downloads, third-party dependency compiles, source-tool compiles, sccache issues, and prepared-workspace artifact restores so every run exposes the markers that decide whether another cache/routing iteration is required; its dependency marker scanner now tolerates ANSI-colored Cargo output. The timeout guardrail is implemented for CI, Docs, construct, preview, release, reusable nextest jobs, Renovate, scheduled hygiene jobs, and jackin-dev. Phase 8 lane-awareness has started: construct-e2e-image, Construct Image build and rehearsal jobs, preview archive builds, release test/archive builds, and jackin-dev archive builds now run through the same runner-lane matrix as the Rust gates; construct, preview, release, and jackin-dev artifact handoffs are lane-scoped so publish jobs consume GitHub-hosted artifacts while Velnor proves parity; native construct arm64 remains GitHub-hosted until Velnor has an arm path; local-disk sccache is enabled only on the opt-in Velnor lane. The optional Velnor persistent target/ store is now scoped by trust scope, repository, workflow, and job class upstream, and Velnor job containers now receive daemon-level CPU and memory caps, so the warm lane can be optimized without becoming the default trust boundary. Renovate no longer runs on every push to main; it stays scheduled and available through workflow_dispatch.
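
The ANSI-tolerant dependency marker scan mentioned above can be sketched in Python. This is a minimal illustration, not the shared summary script itself: the function names, the marker labels, and the regular expressions are assumptions chosen to match the usual shape of Cargo's "Updating crates.io index", "Downloading crates", and "Downloaded <crate> v<version>" status lines.

```python
import re
from collections import Counter

# CSI escape sequences such as "\x1b[1m" or "\x1b[0m". Colored Cargo output
# places them between the status word and the rest of the line, which is
# what defeats a scanner that matches on the raw text.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

# Hypothetical marker labels; patterns are applied after ANSI stripping and
# are unanchored so log timestamp prefixes do not matter.
MARKERS = {
    "index_update": re.compile(r"\bUpdating crates\.io index\b"),
    "download_header": re.compile(r"\bDownloading crates\b"),
    "crate_download": re.compile(r"\bDownloaded \S+ v\d"),
}


def scan_markers(log_text: str) -> Counter:
    """Count dependency-fetch markers in a job log, ignoring ANSI color codes."""
    counts: Counter = Counter()
    for raw_line in log_text.splitlines():
        line = ANSI_CSI.sub("", raw_line)
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def summarize(counts: Counter) -> str:
    """Return NO_MARKERS for a clean log, else a compact per-marker tally."""
    if not counts:
        return "NO_MARKERS"
    return ", ".join(f"{name}={n}" for name, n in sorted(counts.items()))
```

Stripping escapes before matching, rather than widening every pattern to allow escape codes, keeps the marker patterns readable and makes the colored and plain cases share one code path.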

Latest measurement: Runs 27984408161 and 27985175824 rejected hosted-runner GHA sccache: both stayed at 0% hits with cache write errors in check-default and nextest prepare. Run 27986139438 then proved that simply dropping RUSTC_WRAPPER from GitHub-hosted jobs changes the default rust-cache environment hash and cold-starts the target cache (check-default rose to 1m42s and nextest prepare to 4m31s). Runs 27986624424 and 27987160427 proved that pinning rust-cache env-vars while leaving GitHub compile-heavy jobs unwrapped is still not fastest: caches hit and dependency downloads disappeared, but check-default stayed at 1m55s and 1m47s across the two runs, and nextest prepare stayed at 2m30s and 2m07s. Run 27987545345 restored the wrapper while disabling the hosted GHA compiler-cache backend and moved the hot GitHub path back into the fast band: check-default 57s, clippy 1m01s, construct E2E image 1m55s, and nextest prepare 2m19s. Run 27987976766 kept that shape and stayed in the fast band after the archive-cache alignment work: ci-required finished in about 7m42s, check-default in 42s, clippy in 1m13s, nextest prepare in 2m15s, and Docker E2E in 3m14s. Run 27990579448 attempt 2 proved the same warm SHA path after merging main: no crates.io index update, crate download, third-party compile, cache-miss, or prepared-workspace artifact markers appeared; check-default took 42s, clippy 50s, nextest prepare 2m14s, and ci-required finished in about 7m24s. Its step timings exposed apparent nextest fan-out setup waste, but run 27991767807 rejected the attempted removal of the separate Cargo registry restore: package lanes and Docker E2E failed under --offline after restoring only the shared rust-cache archive. 
Run 27992734300 verifies the restored baseline: ci-required completed successfully, no dependency download, third-party compile, prepared-workspace, or sccache issue markers appeared, check-default took 47s, clippy 1m04s, construct E2E image 1m31s, nextest prepare 2m18s, Docker E2E 3m46s, and the longest real steps were Docker E2E test execution, nextest binary build, package tests, and construct image build. The explicit Cargo registry cache is therefore required for fan-out correctness until another measured design proves otherwise. This also confirms that the old prepared-workspace artifact handoff, not the explicit registry restore, was the real large regression: run 27969864046 spent 6m04s in nextest prepare, then package/Docker fan-out jobs spent up to 1m50s downloading plus 39s restoring the prepared workspace before tests could run. Run 27993146785 kept the same green baseline with no dependency download, third-party compile, or prepared-workspace markers. Run 27993742007 proved that policy/audit lanes remain green without target-cache restores, but also exposed ANSI-colored crates.io index update markers from cargo audit and cargo fuzz that the old scanner missed; the follow-up fixes audit to skip yanked index checks, makes the fuzz lane prove offline registry availability before running, and strengthens the marker scanner. Rerun 27994748751 then proved the hot path after fuzz-registry caching: the full log scan returned NO_MARKERS for crates.io index updates, crate downloads, plain third-party dependency compiles, prepared-workspace restores, source-tool compiles, and sccache issues; check-default took 47s, clippy 56s, audit 23s, fuzz 1m39s, nextest prepare 2m20s, package lanes 59s-1m46s, Docker E2E 2m51s, and ci-required finished about 7m15s after the run started. 
Run 27996372045 kept the hot path clean after extending timing summaries to the remaining workflows: full log scan again returned NO_MARKERS, check-default took 53s, clippy 57s, construct E2E image 1m41s, nextest prepare 2m27s, package lanes 1m08s-1m53s, Docker E2E 2m57s, and ci-required finished about 7m26s after the run started. The same run evaluated the baked jackin-ci image idea against current hot-run step costs: cache restore consumed 414s aggregate across 54 steps, Cargo/test work 389s across 15 steps, Docker work 172s across 3 steps, while tool setup was only 102s aggregate across 40 steps with a 5s per-job high-water mark in the nextest fan-out. Run 27996815419 then compared the corrected branch against latest main run 27972283053: main took about 14m31s and still logged crates.io index updates plus crate downloads in check, validator, nextest, and Docker lanes, while the branch took about 7m42s and the full marker scan returned NO_MARKERS. The remaining avoidable hot-path cost was cargo nextest prepare: it spent 95s building all nextest binaries even though the exact shared rust-cache key was already warm and the fan-out jobs restored the same cache before running their real tests. The nextest prepare job therefore now skips that seeding build on exact cache hits and only builds when the semantic cache is cold. Run 27997272508 proved the hot-cache skip: nextest prepare fell to 53s, the Build nextest binaries step was skipped, the full log scan again returned NO_MARKERS, Docker E2E took 2m51s, and ci-required finished about 5m39s after the run started. A lanes=both parity run, 27997911210, then exposed a cold-cache correctness bug instead of a Velnor-only issue: cargo clippy (GitHub) missed the shared Cargo registry cache, then the cache-population step inherited CARGO_NET_OFFLINE=true and could not fetch anyhow; the shared registry cache action now forces its population step online while leaving downstream build/test commands offline. 
Run 27998091712 proved that the registry fix works and prepare skips its build on a hot exact key (38s), but it also exposed cache-key over-sharing: Docker E2E restored the non-Docker all-features cache, then still compiled jackin-capsule for 7s and the docker-e2e test graph for 32s before running the real Docker tests. Run 27998435131 rejected the attempted Docker-specific key: the cold seed took prepare to 4m44s and the Docker job still restored a full exact key before compiling the same local workspace crates (jackin-capsule 6.94s, docker-e2e test graph 31.64s). This matches Swatinem/rust-cache behavior: it is useful for dependency artifacts, but workspace crate outputs are not a dependable cross-job cache target. CI therefore keeps the shared ci-all-features-dev-workspace-v2 key, because it gives the fastest proven hot path and avoids a second cold cache family that does not remove the Docker local relink/compile. A baked image might still help cold-starts, but it no longer addresses the measured hot-run wall-clock bottleneck; the only remaining candidate for eliminating Docker local compile without a large target restore is a measured cargo nextest archive --archive-file handoff plus a small jackin-capsule binary artifact, and it must beat the current 7s+32s local compile cost after upload/download time. The corrected GitHub-hosted baseline keeps RUSTC_WRAPPER=sccache for Cargo fingerprint and target-cache compatibility, disables the failing GHA compiler-cache backend with SCCACHE_GHA_ENABLED=off, and relies on rust-cache, the shared Cargo registry cache, and Buildx caches only where those caches warm real build work; preview, release, and jackin-dev archive jobs now use the same early-wrapper cache shape. Velnor keeps local-disk sccache as an opt-in accelerator.

Run 27999009032 rejected the remaining serialized nextest prepare gate. The run was green and had no dependency-download markers, but the shared target cache missed in cargo nextest prepare, so prepare compiled third-party and workspace crates for 5m05s, saved the cache, and only then let package lanes (52s-2m03s) and Docker E2E (2m54s) start; ci-required completed about 10m41s after the run began. The objective is fastest wall clock, not zero compilation at any cost, so CI now removes nextest prepare, lets package and Docker shards restore the shared ci-all-features-dev-workspace-v2 cache directly, keeps Cargo offline through the explicit registry cache, and lets only the jackin shard save the shared target cache because it pulls the widest package graph.

Run 27999729168 accepted that correction. CI was green, ci-required completed about 5m14s after run start, and the full log scan returned NO_MARKERS for crates.io index updates, crate downloads, third-party dependency compiles, source-tool compiles, prepared-workspace restores, and sccache issues. Package lanes ran directly after construct-image availability and finished in 52s-1m58s; Docker E2E finished in 2m51s. The Docker lane restored the exact shared target cache and did not download dependencies, but still rebuilt local jackin' workspace crates for jackin-capsule (6.65s) and the docker-e2e test graph (30.79s), which is now the remaining measured Docker-side compile cost. The same SHA also kept related workflows green: Docs reached docs-required in about 1m16s, Construct Image reached construct-required in about 2m18s, jackin-dev finished its archive builds in about 1m35s, and Renovate Validate finished in about 17s. The current hosted GitHub default therefore remains cache-only direct fan-out; further Docker compile removal needs a timed archive or binary-artifact experiment that beats roughly 37s of local rebuild plus any upload/download time.
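
The break-even rule for a future archive or binary-artifact handoff can be written as simple arithmetic. The sketch below is illustrative only: the function name is hypothetical, and the transfer times passed in the usage note are placeholders or borrowed from the older prepared-workspace measurement, not results from an archive experiment.

```python
def handoff_net_saving(local_rebuild_s, upload_s, download_s, restore_s=0.0):
    """Seconds saved per run by replacing a local rebuild with an artifact handoff.

    A negative result means the handoff is slower than simply rebuilding.
    """
    return sum(local_rebuild_s) - (upload_s + download_s + restore_s)


# Local rebuild cost measured in run 27999729168:
# jackin-capsule (6.65s) plus the docker-e2e test graph (30.79s).
DOCKER_LOCAL_REBUILD_S = [6.65, 30.79]
```

For scale, plugging in the transfer cost of the rejected prepared-workspace artifact (up to 1m50s download plus 39s restore) gives `handoff_net_saving(DOCKER_LOCAL_REBUILD_S, upload_s=0, download_s=110, restore_s=39)`, which is about -111.6s. Any new handoff has to keep its total transfer time well under roughly 37s to come out positive.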

Run 28000359236 rejected treating Velnor as a speed win before proving the runner itself was on the current tool/image baseline. The GitHub lane stayed fast, but the optional Velnor lane became the tail: native mise install jobs for cargo-audit and dependency policy stalled on untrusted /__w/mise.toml, the Sentry runner still used velnor/job-ubuntu:24.04 while Velnor source had moved to a 26.04 image with Rust 1.96.0, and the Velnor MSRV job correctly exposed that sysinfo 0.39.x no longer supports jackin's Rust 1.94 MSRV. The response was to fix Velnor first, not make it default: Velnor commit 080679d sets MISE_TRUSTED_CONFIG_PATHS=/__w in the native mise adapter, moves the default job image to velnor/job-ubuntu:26.04, and was deployed to Sentry as velnor-runner 0.1.29+trustscope.20260623.080679d with the 26.04 job image built locally. jackin' now pins diagnostics sysinfo to the 0.38 line, and cargo +1.94 check --workspace --all-targets --locked passes locally again. The next parity proof must rerun lanes=both after these fixes and compare step timing against the GitHub-required lane before counting Velnor as a performance improvement.

Runs 28001389977, 28001814662, and 28002123889 continued the dual-runner proof and found runner-parity bugs instead of reasons to make Velnor the default. Velnor commit 0eccb8c added cargo-backend tool bin discovery to the native mise adapter; commit c9155c8 then removed poisoned empty mise version dirs before install and exported direct install roots so GitHub-release tools like cargo-deny, cargo-shear, and cargo-audit are visible to later steps. Both were deployed to Sentry, ending at velnor-runner 0.1.29+trustscope.20260623.210000.c9155c8 with four velnor-jackin slots ready on velnor/job-ubuntu:26.04. Run 28002123889 proved the important Velnor cache/tool fixes in real CI: Velnor cargo dependency policy, cargo audit, cargo msrv check, cargo check default, cargo clippy, cargo bench build, actionlint, schema-check, construct image build, and several nextest package shards all passed quickly without the previous cargo-subcommand failures. The same run also exposed a workflow portability bug in Docker E2E: the job built jackin-capsule under Velnor absolute CARGO_TARGET_DIR, then hard-coded target/debug/jackin-capsule. The Docker E2E handoff now resolves ${CARGO_TARGET_DIR:-target}/debug/jackin-capsule, preserving GitHub's default relative target path and Velnor per-job warm target store. The next proof run must verify Docker E2E on both lanes before the branch can claim full dual-runner parity.
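
The portable target-path lookup above is a shell parameter expansion; the same rule can be expressed in Python for helper scripts. This is a sketch under assumptions: the function name and its parameters are hypothetical, and only the fallback behavior of ${CARGO_TARGET_DIR:-target} is taken from the workflow fix.

```python
import os
from pathlib import Path


def capsule_binary_path(env=None, profile="debug", binary="jackin-capsule"):
    """Mirror ${CARGO_TARGET_DIR:-target}/<profile>/<binary>.

    The shell ':-' form falls back when the variable is unset OR empty,
    so `or` is used here instead of relying on dict.get's default alone.
    """
    env = os.environ if env is None else env
    target_dir = env.get("CARGO_TARGET_DIR") or "target"
    return Path(target_dir) / profile / binary
```

With an empty environment this returns the relative target/debug/jackin-capsule that GitHub-hosted jobs expect; with an absolute CARGO_TARGET_DIR it follows the runner's warm target store instead.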

Run 28002415026 kept the GitHub-hosted lane green, including package shards and Docker E2E, but found the remaining Velnor issues were runner/environment parity rather than cache-design failures. Velnor ran the jackin' tests as root inside the job container, so chmod-000 auth fixture tests could still read files and could not produce EACCES; the minimized Ubuntu job image also let man exit 0 while printing the system-minimized notice instead of topic text, which means the help test should assert the documented contract of successful non-empty output rather than exact auth prose. Docker E2E timed out after the nested docker run reported Docker unavailable: Velnor mounted the host Docker socket, but test temp directories created under container /tmp were not visible at the same absolute path to the host Docker daemon. Velnor commit 28a7ef7 fixes that by exposing same-absolute host-visible temp/workspace mounts and VELNOR_DOCKER_HOST_TEMP; it was deployed to Sentry and published to the apt repository as velnor-runner 0.1.29+trustscope.20260623.220000.28a7ef7, with velnor/job-ubuntu:26.04 slots ready. The jackin' side now skips chmod-EACCES assertions when the current environment can still read the fixture, keeps the help test on the file-level zero-exit/non-empty-output contract, and sets TMPDIR=$VELNOR_DOCKER_HOST_TEMP for Docker E2E when Velnor exposes it. The next proof run must rerun lanes=both, verify Velnor Docker E2E passes, and re-scan all jobs for dependency download or compile markers before claiming Velnor parity or speed wins.
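
The two jackin'-side guards described above can be sketched as follows. This is not the project's test code (which is Rust); it is a Python illustration of the same decisions, and every function name here is hypothetical. Only VELNOR_DOCKER_HOST_TEMP and TMPDIR come from the text.

```python
import os
import tempfile
from typing import Mapping


def fixture_is_unreadable(path: str) -> bool:
    """True when opening `path` fails with a permission error (EACCES)."""
    try:
        with open(path, "rb"):
            return False
    except PermissionError:
        return True


def chmod_fixture_check() -> str:
    """Return 'asserted' if a chmod-000 file is unreadable, else 'skipped'.

    Under root (or any environment that bypasses file modes) the file stays
    readable, so the EACCES assertion is skipped rather than failed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        fixture = os.path.join(tmp, "auth-fixture")
        with open(fixture, "w") as handle:
            handle.write("secret")
        os.chmod(fixture, 0o000)
        try:
            return "asserted" if fixture_is_unreadable(fixture) else "skipped"
        finally:
            os.chmod(fixture, 0o600)  # restore so directory cleanup succeeds


def docker_visible_tmpdir(env: Mapping[str, str]) -> str:
    """Prefer the host-visible temp dir when the runner exposes one."""
    return env.get("VELNOR_DOCKER_HOST_TEMP") or env.get("TMPDIR") or tempfile.gettempdir()
```

Probing the actual read result, instead of checking for uid 0, also covers containers where a non-root user still has mode-bypassing capabilities.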

Run 28006419716 proved those test-environment fixes: the Velnor cargo nextest shards for jackin, jackin-runtime, jackin-tui, jackin-capsule, and small-crates all passed, so the chmod/root and minimized-manpage mismatches are resolved. Velnor compile/check jobs were also faster than GitHub on the same run (clippy 30s vs 51s, check-default 27s vs 48s, construct E2E image 47s vs 1m53s), but Docker E2E still failed before executing tests because actions/download-artifact reported no match for construct-trixie-image-Velnor even though the artifact was uploaded and later visible through the run artifacts API. The nextest reusable workflow now grants its token the actions: read permission and downloads construct artifacts with explicit github-token, repository, and run-id, matching the documented cross-run/repository lookup path and avoiding runner-runtime artifact visibility differences. The next proof run must verify that Velnor can download its lane-scoped construct artifact and then reach the actual nested-Docker E2E test.

Run 28006975146 proved full dual-runner parity after the explicit artifact lookup fix: ci-required and timing-summary both passed, Velnor Docker E2E downloaded the lane-scoped construct artifact and completed in 2m20s, and GitHub Docker E2E completed in 3m33s. Velnor also beat the GitHub-hosted lane on the compile/check/image jobs in this run (Velnor vs GitHub: check-default 27s vs 42s, clippy 30s vs 49s, construct E2E image 47s vs 2m26s) while GitHub remained the default required lane. A combined scan of GitHub logs plus Velnor job-log artifacts found no crates.io index update markers, no crate download markers, no external-crate Compiling ... v... markers, and no prepared-workspace artifact restores. The scan did find Velnor rust-cache miss lines, and it showed sccache using the per-container home cache instead of the host-mounted /var/cache/sccache; that is a runner issue to fix, not something to ignore just because the workflow design held up. Velnor commit d603e1e now defaults job containers to SCCACHE_DIR=/var/cache/sccache, still lets workflow env override it, moves Velnor-owned cargo-nextest and cargo-zigbuild installs to the GitHub release backend, and keeps cargo-deb on the cargo backend because the upstream GitHub release only publishes an amd64 .deb asset. That commit was deployed to Sentry and published to the apt repository as velnor-runner 0.1.29+trustscope.20260623.230000.d603e1e, with the public amd64 index and the installed jackin daemon both verified. The next proof run must rerun lanes=both on d603e1e and verify that Velnor sccache stats now report the host-mounted cache location before claiming local compiler-cache speedup.

Run 28022714137 exercised that lanes=both proof against Velnor d603e1e and found a cold-cache construct-image regression before Docker E2E could run. Both lanes missed the shellfirm prebuilt cache. GitHub then fell back to cargo install shellfirm --version 0.3.10 and failed because the current CI registry/offline shape does not guarantee that source-install path; Velnor found shellfirm through mise but staged the mise shim into the Docker context, so the Dockerfile copied a shim whose target did not exist in the image. The fix keeps the GitHub release backend as the default install path for x64 construct E2E by installing shellfirm through mise in that job, and changes the construct helper to stage the real binary that mise which shellfirm resolves, ignoring mise shim paths. The next proof run must rerun lanes=both, verify construct E2E image succeeds cold and warm on both lanes, verify Velnor sccache stats use /var/cache/sccache, and repeat the dependency/cache marker scan before claiming the local compiler-cache speedup.
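
The shim-versus-real-binary decision in that helper can be sketched as a pure path check. This is a hedged illustration, not the xtask implementation: the function names are hypothetical, and the detection rule assumes that mise shims live in a directory named shims directly under a directory named mise, which matches the default mise data layout but is not confirmed by this document.

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def looks_like_mise_shim(path: str) -> bool:
    """Heuristic: the path contains a 'mise/shims' directory pair.

    Assumption: shims are stored under <mise data dir>/shims; a custom
    layout would need a different rule.
    """
    parts = PurePosixPath(path).parts
    return any(a == "mise" and b == "shims" for a, b in zip(parts, parts[1:]))


def pick_real_binary(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate that is not a shim, or None if all are."""
    for candidate in candidates:
        if candidate and not looks_like_mise_shim(candidate):
            return candidate
    return None
```

Staging a shim into a Docker build context fails late, inside the image, because the shim only redirects to an install that exists on the runner. Returning None when every candidate is a shim lets the caller fail early with a clear message instead.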

Run 28023116265 proved the real-binary staging fix on Velnor, where the construct E2E image (Velnor) job passed in 1m43s, but exposed two follow-up issues before Docker E2E could run. The workflow edit had installed shellfirm in the cargo check default job instead of the construct E2E image job, so GitHub construct still missed the prebuilt cache and fell back to the rejected cargo install shellfirm --version 0.3.10 source path. cargo clippy also failed on the new construct helper because one mise which probe needed the same xtask CLI justification used by the existing Docker and git probes, and the mise-shim detector used a redundant closure/collection. The follow-up fix moves shellfirm installation to the construct job only, keeps other Rust jobs on the smaller rust sccache toolset, and makes the helper clippy-clean. The next proof run must rerun lanes=both, verify GitHub construct uses the mise-provided shellfirm binary instead of any cargo-install fallback, verify Velnor remains green, and then continue the full marker scan.

Runs 28023491185 (lanes=both) and 28023487940 (default PR lane) proved that follow-up fix. The dual-lane run was fully green, including ci-required, timing-summary, both construct image jobs, and both Docker E2E smoke jobs. GitHub construct installed shellfirm through mise, staged the real 0.3.10 binary, copied it into the image, and did not execute any cargo install shellfirm fallback; the first GitHub construct run missed the prebuilt shellfirm cache and saved it, while Velnor restored the same key. Velnor sccache stats now report Local disk: "/var/cache/sccache", proving the runner-side cache mount fix is active. The combined scan of GitHub logs and Velnor job-log artifacts for 28023491185 found zero crates.io index updates, zero crate download markers, zero external-crate Compiling ... v... or Checking ... v... markers, zero prepared-workspace downloads, and zero shellfirm source installs. The remaining marker noise was rustup/toolchain misses on Velnor and GitHub rust-cache target misses on several compile/test jobs; those target misses did not cause dependency downloads or third-party crate recompile markers, but they still belong in the next performance pass because the goal is fastest wall clock, not merely clean dependency markers. The immediately adjacent default GitHub PR run 28023487940 was also green and its log scan found zero dependency/download/compile/cache-miss markers, showing the sequential hosted path is warm after the proof run.

Runs 28010020969 and 28010956214 reran lanes=both after merging latest main into this branch. Both were fully green, including ci-required and timing-summary. The first merged-head run proved parity on the new head: GitHub Docker E2E finished in 5m02s, Velnor Docker E2E in 4m40s, and the combined GitHub/Velnor log scan found zero crates.io index updates, zero crate downloads, zero external-crate Compiling ... v... lines, and zero prepared-workspace artifact restores. The sequential branch run 28010956214 confirmed the same marker result on the next run: GitHub logs and Velnor job-log artifacts again had zero crates.io index updates, crate downloads, external-crate compile markers, failed restores, or prepared-workspace downloads. The remaining GitHub-hosted slowness is therefore not dependency download churn; it is compile-heavy job cost on ephemeral hosted runners with SCCACHE_GHA_ENABLED=off. Velnor is faster because the deployed runner now gives job containers a persistent local SCCACHE_DIR=/var/cache/sccache; hosted GitHub runners intentionally do not have that local disk. The cache API also showed the new workspace caches scoped to refs/pull/632/merge, with no matching refs/heads/main or feature-branch cache for the new keys at the start of the proof. This matches GitHub's documented cache scoping: PR merge-ref caches are reusable by that PR, while branch runs can restore current-branch and default-branch caches. The next optimization must therefore be a measured hosted compiler-cache backend or artifact handoff experiment with sccache --show-stats and full marker scans; do not reintroduce the rejected GHA sccache backend or another serialized target-cache seed job without proving it beats the current green baseline.

Run 28013253814 proved a real GitHub-hosted fan-out race after the merged-head proof. The run was green, but the log scan found four crates.io index updates, four Downloading crates headers, and 1286 Downloaded ... lines, all on the GitHub lane; Velnor had zero cargo download or compile markers. The root cause was not a missing lockfile or changed dependency graph. Parallel GitHub fan-out jobs started before a shared Cargo registry cache existed for that key, so one job fetched the registry and crates while other jobs were already restoring the same absent key. Velnor hid the bug with persistent runner state. The fix is a per-lane cargo-registry-warmup job that runs immediately after routing, restores/populates the shared Cargo registry/index/git DB cache once, and gates every Rust/Cargo job before fan-out. GitHub remains the default required lane; Velnor remains the opt-in parity and speed lane.

Run 28013935155 accepted that fix on the next lanes=both proof. ci-required and timing-summary passed. A combined scan of GitHub job logs plus Velnor job-log artifacts found zero crates.io index update markers, zero crate download markers, zero external-crate Compiling ... v..., Checking ... v..., or Building ... v... markers, zero failed restores, and zero prepared-workspace downloads. The remaining cache-miss text was tool/cache housekeeping, mainly GitHub mise cache misses, a first-key cargo-audit cache miss, and Velnor rustup cache misses; none caused Cargo dependency fetch or external-crate compilation. The warmup cost was small (17s on GitHub, 15s on Velnor) and paid back immediately: GitHub clippy dropped from 2m31s to 49s, check-default from 2m25s to 43s, bench-build from 3m02s to 53s, construct E2E image from 3m03s to 1m34s, and nextest package lanes moved from 36s-2m58s to 41s-1m38s. Velnor stayed faster on most Rust/image jobs (clippy 27s vs GitHub 49s, check-default 25s vs 43s, construct E2E image 1m14s vs 1m34s, nextest package lanes 25s-56s vs 41s-1m38s), while GitHub was slightly faster on Docker E2E in this run (3m36s vs Velnor 3m48s), so Docker remains a per-run comparison point instead of an assumed Velnor win. Velnor sccache stats for check-default now show the host-mounted local disk cache at /var/cache/sccache, 86 compile requests, 17 executed compiles, 12 hits, 5 misses, and a 70.59% hit rate, improving from the prior 1.08% hit rate. The next iteration must keep this report shape for every dual-lane run, including failures, and keep watching whether cache restore time or GitHub's finite cache budget becomes more expensive than the work it avoids.
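
The reported sccache hit rate follows from the hit and miss counts rather than from the total request count. A minimal worked check, assuming the rate is hits divided by executed compiles (hits plus misses); the function name is hypothetical:

```python
from typing import Optional


def sccache_hit_rate_percent(hits: int, misses: int) -> Optional[float]:
    """Hit rate over executed compiles, as a percentage rounded to 2 places.

    Returns None when nothing was compiled, so an idle job is not
    reported as a 0% hit rate.
    """
    executed = hits + misses
    if executed == 0:
        return None
    return round(100.0 * hits / executed, 2)
```

For the check-default numbers above, `sccache_hit_rate_percent(12, 5)` gives 70.59, matching the reported figure: 12 hits out of 17 executed compiles. The other 69 of the 86 compile requests did not reach the cache lookup, so they do not enter the ratio.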

Run 28016655015 proved the stronger Phase 0 reporting contract after adding lane totals, third-party compile/check/build scanning, and GitHub cache-budget reporting to the shared summary script. The manual lanes=both run was green from 2026-06-23T09:33:23Z to 2026-06-23T09:41:38Z (8m15s wall clock), with ci-required and timing-summary both passing. Aggregate job runtime was 1927s across 18 GitHub-lane jobs, 868s across 18 Velnor jobs, and 32s across 4 shared jobs, so Velnor was about 2.2x faster by summed lane job time while GitHub stayed the default required lane. Long poles were Docker E2E on both lanes (4m40s GitHub, 4m12s Velnor). Key Rust/image comparisons stayed in favor of Velnor: check-default 2m04s vs 25s, clippy 2m21s vs 27s, bench-build 2m52s vs 27s, msrv-check 2m14s vs 29s, construct E2E image 2m26s vs 2m09s, and nextest package lanes 47s-3m07s vs 24s-59s. A combined scan of 22 GitHub API logs and 18 Velnor job-log artifacts found zero crates.io index updates, zero Downloading crates headers, zero Downloaded ... v... crate lines, zero external-crate Compiling ... v..., Checking ... v..., or Building ... v... lines, zero source-tool compiles, zero failed restores, and zero prepared-workspace artifact downloads. The remaining cache-miss text is now the next optimization target rather than a dependency-fetch bug: GitHub still reported mise cache misses, one cold Cargo registry warmup key, and No cache found lines in target-cache restores; Velnor still reported rustup cache misses and Rust cache miss for shared key ... lines even though persistent disk prevented dependency downloads or external-crate rebuilds. Velnor sccache for check-default still used /var/cache/sccache with 86 compile requests, 12 Rust hits, 5 Rust misses, no cache errors, and a 70.59% hit rate. 
The GitHub Actions cache usage API reported 107 active caches using 14.54GB against the 10GB budget reference, so the next cache iteration must reduce duplicate tool/target cache families or prove the repository has a higher effective quota; otherwise eviction pressure can recreate the parallel fan-out download race that the registry warmup fixed.
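
The headline figures in this report can be rechecked from the raw numbers. The sketch below is only a worked-arithmetic aid, not part of the shared summary script; the helper names are hypothetical, and it assumes the hit rate is hits divided by hits plus misses, which is consistent with the reported 70.59%.

```python
from datetime import datetime


def seconds_between(start_iso: str, end_iso: str) -> int:
    """Whole seconds between two UTC timestamps such as 2026-06-23T09:33:23Z."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    delta = datetime.strptime(end_iso, fmt) - datetime.strptime(start_iso, fmt)
    return int(delta.total_seconds())


def hit_rate(hits: int, misses: int) -> float:
    """Hit rate in percent, assuming rate = hits / (hits + misses)."""
    return 100.0 * hits / (hits + misses)


# Values copied from the run 28016655015 report above.
wall_s = seconds_between("2026-06-23T09:33:23Z", "2026-06-23T09:41:38Z")
lane_ratio = 1927 / 868        # summed GitHub-lane job seconds / summed Velnor job seconds
sccache_pct = hit_rate(12, 5)  # Rust hits and misses for check-default on Velnor
over_budget_gb = 14.54 - 10.0  # reported cache usage minus the 10GB budget reference

print(f"wall clock: {wall_s // 60}m{wall_s % 60:02d}s")
print(f"lane ratio: {lane_ratio:.2f}x")
print(f"sccache hit rate: {sccache_pct:.2f}%")
print(f"cache over budget: {over_budget_gb:.2f}GB")
```

Running it reproduces the 8m15s wall clock, a lane ratio of about 2.2x, the 70.59% hit rate, and roughly 4.5GB of cache usage above the 10GB reference.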

Run 28024637026 corrected the nextest target-cache assumption. The run was green and the Cargo registry remained offline, but GitHub-hosted package shards other than jackin still recompiled external crates: jackin-capsule rebuilt crates such as tracing, tokio, futures-util, and ratatui; jackin-runtime rebuilt crates such as serde, tokio, hyper, oci-client, and openidconnect; jackin-tui rebuilt crates such as libc, syn, rustix, serde, and ratatui; small-crates rebuilt crates such as serde_json, reqwest, bollard, criterion, and sigstore; and Docker E2E rebuilt third-party crates before building jackin-capsule. The comparison job, cargo nextest jackin (GitHub), was the desired shape: it restored a target cache that already had its external dependency artifacts and only rebuilt jackin' workspace crates. The root cause is over-sharing plus single-writer cache ownership: every package shard restored the same ci-all-features-dev-workspace-v2 target archive, but only the jackin shard saved it, and the jackin test graph is not a superset of the other package and Docker E2E graphs. The fix keeps one shared rust-cache namespace for fallback behavior, but adds shard-specific target-cache suffixes (package-<group> and docker-e2e) and lets only the GitHub-hosted lane save those archives. The verification rule is now explicit: on a sequential run with the same source, lockfiles, toolchain, features, and env fingerprint, GitHub and Velnor logs must show zero crates.io index updates, zero crate downloads, and zero external-crate Compiling ... v..., Checking ... v..., or Building ... v... markers; jackin' workspace crate rebuilds are acceptable when source fingerprints changed or Cargo needs local test binaries. Any exception must name the changed fingerprint, cache eviction, cache budget pressure, or another measured cause before the run can be considered optimized.
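
The verification rule above can be approximated with a small log scanner. This is a minimal sketch, not the shared summary script: the function name, the counter keys, and the sample versions are hypothetical, and it assumes Cargo's usual status-line shape (`Compiling <crate> v<version>`, with a trailing parenthesized path for workspace crates). Workspace crates are identified by name, so the caller must supply the workspace crate list.

```python
import re
from collections import Counter

# Matches Cargo status lines even when a CI timestamp prefixes them.
STATUS = re.compile(
    r"\b(Compiling|Checking|Building|Downloaded)\s+(\S+)\s+v\d\S*(?:\s+\(.*\))?\s*$"
)
INDEX_UPDATE = re.compile(r"\bUpdating crates\.io index")
DOWNLOAD_HEADER = re.compile(r"\bDownloading crates\b")


def scan_log(lines, workspace_crates):
    """Count dependency-fetch markers and external-crate build markers."""
    counts = Counter()
    for line in lines:
        if INDEX_UPDATE.search(line):
            counts["index_update"] += 1
            continue
        if DOWNLOAD_HEADER.search(line):
            counts["download_header"] += 1
            continue
        match = STATUS.search(line)
        if not match:
            continue
        verb, crate = match.group(1), match.group(2)
        if verb == "Downloaded":
            counts["crate_download"] += 1
        elif crate not in workspace_crates:
            # Workspace rebuilds are acceptable; external ones are not.
            counts["external_" + verb.lower()] += 1
    return counts


# Illustrative log lines; crate versions are made up for the example.
sample = [
    "    Updating crates.io index",
    "  Downloaded serde v1.0.219",
    "   Compiling serde v1.0.219",
    "   Compiling jackin-capsule v0.1.0 (/work/crates/jackin-capsule)",
    "    Finished `dev` profile [unoptimized + debuginfo] target(s) in 12.3s",
]
workspace = {"jackin", "jackin-capsule", "jackin-runtime", "jackin-tui"}
report = scan_log(sample, workspace)
print(dict(report))
```

A run would pass the rule only when every counter is zero; the workspace rebuild of jackin-capsule in the sample is deliberately not counted.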

Problem

PR feedback, main-branch confidence, preview publishing, and tagged releases all need faster wall-clock results without dropping coverage. jackin' already has many good CI primitives -- path filters, PR cancellation, aggregator jobs, Swatinem/rust-cache, cargo-nextest, Docker Buildx layer caches, Bun download caching, pinned mise tools, and split workflows -- but the run triggered by commit 4c8b94bd05f84a62a04f9f235e2d846f14d04366 shows that broad Rust/runtime changes still produce a long post-merge feedback loop: roughly 12.5 minutes to green CI, then roughly 8.5 more minutes before preview artifacts publish.

The target state is not "run less and hope". The target state is staged signal: cheap deterministic checks fail first, expensive checks start as early as their real prerequisites allow, full coverage still runs before merge or before publishing, and every cache has measured hit/miss behavior rather than folklore.

"Almost instant" has two distinct ceilings, and they need different work. A cold GitHub-hosted runner always pays a floor of toolchain install, cache restore, and compilation of whatever the change touched, so the realistic best case for a real Rust change on hosted runners is a few minutes, not seconds. The cases that can become near-instant on hosted runners are the ones where change-aware routing skips the Rust surface entirely: docs-only, workflow-only, or construct-only changes should finish in well under a minute. Rust changes only approach instant on a persistent runner whose target/, cargo registry, and Docker layers stay warm between runs, because nothing the GitHub Actions cache can do is as fast as the build directory already being on disk. The program therefore runs on two tracks: tighten change-aware routing so non-Rust changes are near-instant on hosted runners, and stand up a warm persistent lane so incremental Rust changes recompile only the edited crate.

This roadmap is an iterative optimization program, not a one-shot workflow cleanup. Each implementation PR should pick one bottleneck or change class, capture the baseline, explain why the time is being spent, research candidate speedups, apply the smallest safe change, rerun the same scenario, compare the numbers, and either keep the change with evidence or adjust/revert it. Repeat until the remaining time is mostly irreducible work: the tests, builds, signing, publishing, and verification that are genuinely required for the files that changed.

The end state should be change-aware CI/CD. Docs-only changes should not build Rust binaries. Workflow-only changes should not run Docker E2E unless they affect that workflow. Construct-image changes should rebuild and verify construct paths without forcing unrelated docs deploy work. Rust changes should run the affected Rust test/build surface, plus the cross-cutting gates that can actually be invalidated by those changes. When the dependency graph is uncertain, run the broader safe set and use the measurement from that run to improve the classifier later.

Iteration Loop

Every speedup PR under this roadmap should follow this loop:

1. Measure every workflow/lane from every run before calling the optimization done. Record job duration, long steps, queue time if visible, cache hits/misses, cache keys, sccache hit rates, and the exact changed-file class that caused the work.
2. Explain the number: compilation, dependency download, tool install, Docker layer build, image push/pull, test execution, artifact upload, signing, or publish gate.
3. Research available speedups for that specific cost center: cache backend, cache key, test partitioning, nextest archive reuse, sccache, BuildKit cache mounts, prebuilt tool install, runner choice, or path-filter precision.
4. Apply one focused change or one tightly related group of changes.
5. Rerun equivalent scenarios: at minimum one PR-style run and one main/preview-style run when the change affects both.
6. Compare before/after wall time, first-failure time, required-check completion time, cache hit rate, and coverage surface.
7. Inspect logs aggressively for dependency download and rebuild markers: No cache found, Updating crates.io index, Downloading, large third-party Compiling blocks, BuildKit cache misses, source-compiled tools, and low sccache hit rates. If any marker appears after the first cache-populating run for the same dependency/tool inputs, explain why the cache was dirty or patch the workflow so the next sequential run reuses the cache.
8. Record the result in the PR and, if the decision changes the roadmap, update this item with the measured outcome.
9. Repeat until further speedups would require dropping coverage, weakening publish safety, or spending disproportionate engineering effort for negligible wall-clock gain. A run is not accepted as optimized while an avoidable dependency download, source tool compile, third-party dependency compile, or cache-key fragmentation remains.
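
The compare-and-decide part of the loop can be sketched as a small function. This is an illustration under stated assumptions, not an existing tool: the `RunMeasurement` fields, the `decide` name, and the three outcomes are hypothetical, and the marker count stands in for whatever the log scan reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMeasurement:
    wall_s: int             # wall clock to the required gate, in seconds
    avoidable_markers: int  # dependency downloads, source-tool or third-party compiles
    coverage: frozenset     # names of the checks that actually ran


def decide(before: RunMeasurement, after: RunMeasurement) -> str:
    """Return 'keep', 'iterate', or 'revert' for one focused change."""
    if not after.coverage >= before.coverage:
        return "revert"   # speed bought by dropping coverage does not count
    if after.wall_s >= before.wall_s:
        return "revert"   # no wall-clock win, whatever the cache hit rate says
    if after.avoidable_markers > 0:
        return "iterate"  # faster, but not yet accepted as optimized
    return "keep"


checks = frozenset({"clippy", "check-default", "nextest"})
# 12m33s and 10m16s are the gate times quoted in this roadmap; marker counts are made up.
old_run = RunMeasurement(wall_s=753, avoidable_markers=3, coverage=checks)
new_run = RunMeasurement(wall_s=616, avoidable_markers=0, coverage=checks)
print(decide(old_run, new_run))
```

Keeping wall clock ahead of cache metrics in the ordering reflects the rule that a warmer cache is not a win unless the run actually finishes sooner.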

Change-impact Routing Goal

The pipeline should eventually derive a compact run plan from changed paths and workflow inputs:

Change class	Required work
Docs/prose only	repo links, docs build/link check when published docs changed, Codebook docs/prose checks, deploy/live-link checks only on main docs deploy paths
GitHub workflow/tooling only	`actionlint`/`shellcheck`, affected workflow dry-run or targeted job, plus docs checks if docs changed
Rust crate-local change	fmt, schema when config/schema inputs changed, cargo check/clippy, affected package tests, dependency/audit policy when lock/tooling changed
Runtime/launch/capsule handoff change	Rust gates plus nextest Docker E2E lane because `dind_e2e` covers real Docker/runtime/capsule behavior
Construct image change	construct image build/publish path plus Docker E2E using the rebuilt construct artifact
Preview/release build logic change	preview/release archive build rehearsal, signing/SBOM/attestation checks, publish mutation only after CI gates
Unknown or cross-cutting change	broader safe set, then refine classifiers once the run shows which work was actually needed
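
A changed-path classifier for the table above could look like the following sketch. The glob patterns and class names are hypothetical placeholders; the real filters live in the workflow files and are more precise. The point is the fallback: any path no rule recognizes yields the unknown class, which maps to the broader safe set.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for illustration only.
RULES = [
    ("docs", ["docs/*", "*.md"]),
    ("workflow", [".github/workflows/*"]),
    ("construct", ["docker/construct/*"]),
    ("rust", ["crates/*", "Cargo.toml", "Cargo.lock"]),
]


def classify(changed_paths):
    """Map changed paths to change classes; unmatched paths become 'unknown'."""
    classes = set()
    for path in changed_paths:
        matched = [
            name
            for name, patterns in RULES
            if any(fnmatchcase(path, pattern) for pattern in patterns)
        ]
        classes.update(matched or ["unknown"])
    return classes


def needs_broad_run(classes):
    """Uncertain changes run the broader safe set instead of a narrow plan."""
    return "unknown" in classes


print(sorted(classify(["docs/guide/intro.md"])))
print(sorted(classify(["crates/jackin/src/main.rs", "docker/construct/Dockerfile"])))
print(needs_broad_run(classify(["Justfile"])))
```

Note that `fnmatchcase` lets `*` cross directory separators, so `docs/*` covers nested files; a production classifier would likely want stricter matching.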

Evidence From `4c8b94bd`

The commit merged a large instant-launch change set: workflow rewrites, Rust runtime/image/launch code, docs, and docker/construct inputs. That breadth intentionally fired every mainline path filter. All runs below were successful.

Workflow	Run	Trigger	Wall time	Long pole
`CI`	`27937532691`	`push` to `main`	12m 34s	`cargo nextest prepare` 4m 33s, `cargo build validator` 6m 00s, then Docker E2E waited for the full package matrix and ran 3m 07s
`Docs`	`27937532567`	`push` to `main`	5m 06s	cold `codebook-lsp` installs in `spell-check-docs` and `spell-check-source` at about 2.5m each, docs link/build path 2m 26s, deploy live-link verification 1m 21s
`Construct Image`	`27937532647`	`push` to `main`	4m 12s	arm64 image publish 2m 59s, amd64 image publish 1m 53s, manifest publish 39s
`Publish Homebrew Preview`	`27938150807`	`workflow_run` after `CI`	8m 31s	four release-profile `cargo zigbuild` jobs at 6m 47s to 7m 26s, then publish 47s
`Renovate`	`27937532547`	`push` to `main`	3m 40s	self-hosted Renovate 3m 30s; not part of branch protection but consumes Actions capacity after every main push
`Renovate Validate`	`27937532557`	`push` to `main`	11s	no meaningful speed issue

Within CI, the fastest checks already return early: changes 5s, actionlint 13s, fmt 24s, schema-check 33s. The slow path is structural. The old docker-e2e path depended on the reusable test workflow as a whole, so it started only after every package test job completed, even though it could run in parallel with most package tests once the required construct image was available. This branch moved Docker E2E into the reusable nextest workflow and then removed the serialized nextest prepare gate after measurement showed cold cache misses made it the new critical path.

The inherited CI matrix split item also called out the real Docker boundary directly: crates/jackin/tests/dind_e2e.rs is now 1323 LOC and exercises real docker run, PTY, runtime launch, and jackin-capsule handoff behavior. That should stay a named docker-e2e failure surface instead of being buried inside a general package-test lane. It should still belong to the nextest test system, though: the current command is already cargo nextest run -p jackin --features e2e --profile docker-e2e, so the better architecture is a dedicated nextest Docker lane inside the reusable nextest workflow, not an unrelated top-level CI job.

Current Speedups Already Present

Path filters in .github/workflows/ci.yml, .github/workflows/docs.yml, .github/workflows/construct.yml, and .github/workflows/preview.yml prevent unrelated workflows from doing full work.
PR workflow concurrency cancels stale PR runs while preserving non-cancelled release serialization.
Rust jobs restore ~/.rustup, mise-managed tool caches, and Swatinem/rust-cache target/cargo caches; the reusable nextest workflow centralizes the expensive test-binary build and shares it with package jobs.
Construct builds use Buildx with registry cache for published main builds and GitHub Actions cache scopes for PR/rehearsal builds.
Construct builds stage the pinned shellfirm binary before Docker Buildx runs, so the construct Dockerfile no longer carries a Rust toolchain or from-source shellfirm compile stage.
Docs jobs cache Bun's download cache and lychee's link cache, and they separate repo-link checks from full site build/link checks.
CARGO_INCREMENTAL=0 is already set on the main compile-heavy CI/preview/release paths, which is compatible with compiler-output caching via sccache.

Findings

Two co-critical lanes set the floor, not one

The analyzed CI run was gated by two independent lanes that each took roughly twelve minutes, so speeding up only one of them would have left the workflow's wall clock unchanged. The old lint lane ran check-all-features first and only then clippy and check-default, because both of those jobs declared needs: check-all-features. The old test lane ran nextest prepare, then the per-package matrix, then docker-e2e, which waited for the entire reusable test workflow.

changes
 ├─ check-all-features ──┬─ clippy              (old lint lane, ~12m)
 │                       └─ check-default
 └─ test: prepare ─ packages(19) ─ docker-e2e    (old test lane, ~12m)
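
Why cutting one lane alone does not help can be shown with a longest-path calculation over the job graph. This is a sketch with illustrative round-minute durations chosen so that both lanes sum to twelve; they are not the measured job times, and the `critical_path` helper is hypothetical.

```python
from functools import lru_cache


def critical_path(durations, needs):
    """Length of the longest dependency chain through a job DAG (queue time ignored)."""

    @lru_cache(maxsize=None)
    def finish(job):
        # A job starts when its slowest prerequisite finishes.
        start = max((finish(dep) for dep in needs.get(job, ())), default=0)
        return start + durations[job]

    return max(finish(job) for job in durations)


# Illustrative minutes only.
durations = {
    "check-all-features": 6, "clippy": 6, "check-default": 5,
    "prepare": 5, "packages": 4, "docker-e2e": 3,
}
old_needs = {
    "clippy": ("check-all-features",),
    "check-default": ("check-all-features",),
    "packages": ("prepare",),
    "docker-e2e": ("packages",),
}
baseline = critical_path(durations, old_needs)

# Cut only the lint lane: drop check-all-features and its edges.
without_check = {job: d for job, d in durations.items() if job != "check-all-features"}
test_lane_needs = {"packages": ("prepare",), "docker-e2e": ("packages",)}
after_lint_cut = critical_path(without_check, test_lane_needs)

# Also let docker-e2e start right after prepare, beside the package jobs.
overlapped_needs = {"packages": ("prepare",), "docker-e2e": ("prepare",)}
after_both_cuts = critical_path(without_check, overlapped_needs)

print(baseline, after_lint_cut, after_both_cuts)
```

With these numbers the lint-only cut leaves the critical path at twelve, and the wall clock only drops once the test lane is reshaped as well.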

This branch cuts both lanes: the lint lane no longer runs the redundant check-all-features job, Docker E2E starts after prepare, and the package matrix is bucketed into named heavy crates plus one small-crate bucket. The measured GitHub-hosted PR run 27961783994 completed ci-required in about 10m16s, down from about 12m33s on the analyzed main run 27937532691.

A later artifact-handoff experiment barely improved full workflow wall time and made individual fan-out jobs worse: run 27975505075 still took about 13m53s overall and spent up to 4m09s downloading the prepared nextest workspace. Removing that handoff and relying on shared cache restore produced run 27976780904, which completed ci-required in about 7m51s.

A cold-ish follow-up on the rewritten branch, run 27981536710, exposed the remaining cache truth: the new Cargo registry key was cold, many parallel jobs fetched the same crates and then raced to save the same cache, and cargo nextest prepare recorded sccache 0% hits with 934 misses. The immediate sequential run 27982280463 proved the registry/target caches do help once warm -- cargo check default fell from 2m44s to 42s, clippy from 2m57s to 57s, MSRV from 2m28s to 42s, and nextest prepare from 5m04s to 2m16s -- but the hot log still showed sccache 0% hits in check-default and nextest prepare because the compiler-cache namespaces were split by job.

Runs 27982918582 and 27983590600 proved that simply sharing the namespace is still not enough: the same hosted-runner commit stayed in the same band or got worse (nextest prepare 2m16s then 2m36s), sccache remained 0% hits, and the logs showed cache write errors on every cacheable Rust compile. Run 27984408161 proved the first run-unique write-key fix was still too broad: ci-required stayed green, dependency downloads were still gone, but nextest prepare worsened to 2m52s and check-default/nextest prepare still showed 0% hits plus write errors because multiple jobs wrote the same run key. Run 27985175824 proved per-job write keys still did not make hosted-runner GHA sccache useful: dependency downloads stayed gone, but sccache remained at 0% hits with write errors in check-default and nextest prepare. The useful win in that sequence was Docker layer reuse, where construct E2E image build fell from 3m59s to 1m34s on the hot shared-cache run and stayed near 2m00s after the key change.

Runs 27986624424 and 27987160427 then proved that removing the wrapper entirely was slower even with cache hits; run 27987545345 restored the wrapper with SCCACHE_GHA_ENABLED=off and returned check-default to 57s and nextest prepare to 2m19s. The branch now rejects the hosted-runner GHA compiler-cache backend but keeps wrapper-compatible Cargo target fingerprints on GitHub-hosted jobs; Velnor keeps local-disk sccache because that backend can stack with warm target/.

The lint lane itself is no longer the long pole: in the old run, cargo check all features started at 07:47:20Z and the dependent clippy/check-default pair finished at 07:51:16Z, roughly 3m56s later; in run 27961783994, clippy and check-default both started at 14:53:44Z and the slower one finished at 14:54:35Z, roughly 51s later. The remaining measured critical path is now construct E2E image build, nextest prepare, and Docker E2E.

In the old .github/workflows/ci.yml shape, clippy and check-default both declared needs: check-all-features, but each of the three jobs set its Swatinem/rust-cache shared-key to its own github.job, so they never shared a target/ directory. The dependency bought nothing on the green path: clippy waited for check-all-features to finish and then recompiled the workspace from its own cold cache, adding a full compile to the lint lane for no reuse. Because cargo clippy --workspace --all-targets --all-features performs the full type and borrow check before linting, clippy is a strict superset of check-all-features: anything that compiles under clippy also compiles under check, so check cannot catch a compile error that clippy misses. This branch removes the redundant check-all-features job entirely, leaving clippy as the all-features compile gate and check-default as the default-feature compile gate.

Critical-path sequencing beats more matrix fan-out

The prior "split the monolithic check job" work is partly implemented now: fmt, schema, check, clippy, dependency policy, audit, nextest package tests, Docker E2E, fuzz, bench build, and MSRV are separate jobs. The remaining opportunity is dependency shape. docker-e2e used to wait for all package tests because the caller saw the reusable test workflow as one dependency. This branch moved Docker E2E into .github/workflows/rust-nextest.yml as a first-class nextest lane, and then removed the prepare dependency when run 27999009032 proved the serial cache-seeding job was slower than direct parallel shard restore on cache misses.

The remaining matrix work should preserve the original attribution goal as well as speed: lint/check, deterministic package tests, integration-heavy runtime tests, Docker E2E, capsule, MSRV, validator, dependency policy, and audit should stay distinguishable in GitHub checks even if the implementation moves to nextest archives or partitions. A local archive experiment on this branch built a jackin-capsule nextest archive successfully, but running from that archive failed an existing test that expects checkout-local context. A same-run prepared-target artifact experiment was also rejected after measurement: artifact download and extraction were slower than restoring the semantic cache and letting the small amount of remaining local work run. Archive/partition work should resume only after checkout-local tests are archive-safe and after an equivalent run proves lower wall clock than the cache-only matrix.

There was a second, larger opportunity in the same lane: the 19-package matrix was mostly fixed overhead. The branch first used prepare to build every test binary with cargo nextest run --workspace --no-run --all-features, but later measurement showed the same job becomes a wall-clock regression when its target cache misses. This branch adopts the middle ground from Phase 2: named jobs for the heavy crates plus one bucket job for the small crates, all restoring the same shared cache directly. Docker E2E rebuilds jackin-capsule from the warm restored cache before setting JACKIN_CAPSULE_BIN; this cost about 6s in run 27976780904, far cheaper than the rejected multi-minute prepared-workspace handoff.
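
The middle-ground shape can be expressed as a simple bucketing rule: crates whose tests run long get a named job, everything else shares one bucket. The sketch below assumes a per-crate test duration is available and uses a hypothetical threshold; the durations and the two `example-*` crate names are invented for illustration.

```python
def bucket_packages(test_seconds, heavy_threshold_s):
    """Give each heavy crate its own job and pool the rest into one bucket."""
    heavy = sorted(p for p, s in test_seconds.items() if s >= heavy_threshold_s)
    small = sorted(p for p, s in test_seconds.items() if s < heavy_threshold_s)
    jobs = {name: [name] for name in heavy}
    if small:
        jobs["small-crates"] = small
    return jobs


# Made-up durations in seconds.
durations = {
    "jackin": 170,
    "jackin-capsule": 95,
    "jackin-runtime": 120,
    "jackin-tui": 80,
    "example-util": 6,
    "example-schema": 9,
}
jobs = bucket_packages(durations, heavy_threshold_s=60)
for job_name, packages in jobs.items():
    print(job_name, packages)
```

Named jobs keep failure attribution for the heavy crates, while the bucket removes the fixed per-job setup that dominated the small ones.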

Main-to-preview is serialized too late

Preview builds used to wait until the whole CI workflow completed because .github/workflows/preview.yml triggered from workflow_run. On the analyzed commit, that meant about 12.5 minutes of idle time before a 7.5 minute build matrix began. This branch changes that shape to "build early, publish after CI": preview artifact builds start on the push event in parallel with CI, while the release mutation and Homebrew tap update wait for the matching CI conclusion and source SHA.
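
A rough estimate of what "build early, publish after CI" saves can be derived from the durations in the evidence table. This is worked arithmetic under simplifying assumptions: it ignores workflow_run event latency, queue time, and job setup overhead, so the overlapped figure is a lower bound rather than a measurement. The variable names are hypothetical.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


def fmt(total_seconds: int) -> str:
    return f"{total_seconds // 60}m{total_seconds % 60:02d}s"


# Durations from the 4c8b94bd evidence table.
ci_wall = to_seconds(12, 34)       # CI workflow wall time
preview_wall = to_seconds(8, 31)   # Publish Homebrew Preview wall time
slowest_build = to_seconds(7, 26)  # slowest cargo zigbuild job
publish = 47                       # publish step after the builds

# Old shape: preview starts only after CI completes.
serialized = ci_wall + preview_wall

# New shape: builds overlap CI; publish waits for whichever finishes last.
overlapped = max(ci_wall, slowest_build) + publish

print("serialized:", fmt(serialized))
print("overlapped:", fmt(overlapped))
print("saved:", fmt(serialized - overlapped))
```

Under these assumptions the push-to-publish time falls from about 21 minutes to a little over 13, with CI itself becoming the long pole.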

The stronger structural option is to remove the cross-workflow handoff entirely by building the preview archives inside the CI DAG and gating a final publish job on the full needs set plus push-to-main. That eliminates the inter-workflow event latency, shares one checkout and one warm cache, and produces a single required check. This branch removes the duplicate CI build-validator job, so preview now owns the release-profile jackin archive build that packages jackin-role; the remaining consolidation question is whether that preview build should move into the CI DAG.

Preview builds all targets from Linux with cargo-zigbuild plus a cached macOS SDK. The release workflow used to keep macOS runners for macOS targets and a separate install sequence, which made cache behavior harder to reason about and made preview a weaker rehearsal. This branch moves release archive builds onto the same Linux cargo-zigbuild shape and shared target-scoped archive cache keys; the remaining durable cleanup is one composite/reusable "build signed archive" path used by preview and release, with release adding only tag/version and publishing gates.

The concrete direction was to adopt preview's cargo-zigbuild-from-Linux path for every target, including macOS, and to drop the macos-latest runners from the release critical path; preview already proves the cross-compile works with a cached SDK, and macOS runners carry roughly a tenfold cost multiplier and slower start-up. This branch implements that path and leaves a single native-macOS build-and-test in the scheduled hygiene lane for parity, so a macOS-specific regression is still caught off the critical path. The preview and release archive jobs also now share Swatinem/rust-cache keys by archive target instead of splitting warm caches by workflow job name (Phase 7).

Tool installation cache misses matter on cold revisions

The inspected logs show warm rustup caches, but cold mise.toml changes caused cargo-installed tools to build from source: cargo-audit, cargo-deny, cargo-shear, and codebook-lsp each paid cold-install cost in at least one job. The mise-action internal cache missed too, because workflows pinned the action SHA but did not pin the mise binary version input; some manual caches only covered ~/.local/share/mise/installs/cargo-*, so non-cargo tools such as zig, cosign, and syft relied on the action's internal cache. This was functionally correct, but noisy for a speed-critical pipeline.

Two concrete reductions follow. First, every cargo: entry in mise.toml is compiled from source by mise's cargo backend; the CI tool entries now use mise's GitHub release backend where upstream publishes reliable binaries: cargo-nextest, cargo-deny, cargo-audit, cargo-hack, cargo-zigbuild, cargo-shear, cargo-fuzz, and codebook-lsp. Second, the deeper dedup is a baked CI image: roughly fifteen CI jobs each repeat the mise install and rustup restore, so building a jackin-ci image (Debian plus the pinned toolchain and tools from the same mise.toml, reusing the construct-image machinery) and running Rust jobs in container: removes per-job tool setup wholesale (Phase 1).

Docker layer caching is good, and cache-mount risk is lower now

Docker Buildx registry and GHA caches are already used. Docker's docs note that cache-to mode=max exports more layers than mode=min, which matches the current construct cache choice. Docker's GitHub Actions cache docs also note that BuildKit cache mounts are not preserved in the GHA cache by default. That mattered when docker/construct/Dockerfile compiled shellfirm from source with cargo registry/git and /sccache-build cache mounts; this branch removes that stage by staging shellfirm before Buildx, so the remaining Docker speed work is regular layer-cache behavior, not cache-mount preservation for a Rust compile.

`sccache` should be kept only where stats prove it wins

Mozilla sccache caches compiler outputs through RUSTC_WRAPPER and supports local plus remote/GHA-backed storage. Its Rust guidance requires incremental compilation to be disabled for cacheability, which already matches the project's compile-heavy jobs. This branch measured hosted-runner GHA sccache and rejected it after runs 27984408161 and 27985175824 showed 0% hits plus write errors in check-default and nextest prepare. Follow-up runs 27986624424 and 27987160427 showed the opposite trap: removing RUSTC_WRAPPER=sccache from GitHub-hosted compile jobs removed backend errors but made Cargo target reuse slower even when every registry/tool/target cache restored successfully. GitHub-hosted jobs therefore keep the wrapper for target-cache/fingerprint compatibility while forcing SCCACHE_GHA_ENABLED=off; Velnor keeps local sccache because persistent local disk is the backend shape that can actually win.

The GitHub Actions cache backend for sccache is not part of the hosted baseline anymore: narrow pilots showed 0% hits and write errors, and the broader fan-out risk remains the same throttling class as Docker GHA cache. Prefer a backend that does not rate-limit -- local disk on a persistent runner, or S3/Redis -- which is also why sccache pairs best with the warm-runner lane, where an on-disk target/ already outperforms a remote compiler cache.

Incremental compilation is a targeted experiment, not a universal switch

Cargo's profile docs say incremental compilation stores reusable state in target, only applies to workspace/path dependencies, and can be overridden with CARGO_INCREMENTAL; dev/test defaults enable it, release defaults disable it. In this repo the CI jobs force it off to keep caches deterministic and sccache-compatible. Re-enabling it may help same-branch PR reruns if target caches are retained, but it increases cache size and conflicts with sccache. Treat it as an experiment for a narrow nextest lane, not for final release artifacts.

Persistent warm runners are the only instant path for Rust changes

A cold GitHub-hosted runner starts with an empty target/ and recompiles the changed crate's dependency closure every run; the GitHub Actions cache restores a snapshot but still pays the download and extract of a multi-gigabyte archive plus any post-restore recompilation. A persistent runner that keeps target/, ~/.cargo, and Docker layers warm between runs compiles only the crate that actually changed -- the difference between minutes and seconds, and a gap no hosted-runner cache strategy can close. The existing velnor lane (selectable through the lanes workflow_dispatch input) is the seed of this, but it must remain an explicit opt-in accelerator. GitHub-hosted runners stay the default and required path for jackin' because they provide the stable trust boundary and cold-run parity; Phase 8 is therefore about making Velnor complete enough for optional lanes: both verification, not about making it the default.

Dual-runner parity is a hard constraint; Velnor speedups ride on top

Every lane must be runnable on both GitHub-hosted runners and the self-hosted Velnor lane (tailrocks/velnor) when a maintainer explicitly selects lanes: both, and a GitHub-hosted run must stay the default and required parity gate. A green warm-lane run has to imply a green cold-lane run, which is the same PR/main parity rule the repo already enforces. Velnor may carry heavy optimizations that hosted runners cannot (a warm target/, warm ~/.cargo, warm Docker layers, a local-disk sccache backend), but only as an opt-in accelerator on top of a baseline that still passes on hosted runners. The rule for every speedup is therefore: when Velnor lacks a capability, add it to Velnor itself, then verify the change still runs on both lanes before it lands. Never fork the pipeline into Velnor-only behavior that hosted runners cannot reproduce, and never drop a job to hosted-only just to avoid teaching Velnor the capability; both leave the two lanes out of parity.

The scaffolding already exists: matrix-setup emits a configs array and most compile and test jobs fan out with runs-on: ${{ fromJSON(matrix.config.runner) }}, so lanes: both already runs them on both. construct-e2e-image, preview archive builds, and release test/archive builds now use that same runner-lane matrix. Construct image artifacts are lane-scoped so Docker E2E consumes the image built by the same lane; preview and release archive artifacts are also lane-scoped so manual lanes: both rehearsals can require both lanes while mutation jobs download only the GitHub-hosted artifacts. The current gaps to close before Velnor is useful as an optional parity lane are concrete: docker-e2e and the construct build assume a working Docker daemon, so Velnor must provide one; the construct workflow builds arm64 natively on ubuntu-24.04-arm, so Velnor needs an arm path or that leg stays hosted-only; and the persistent state on Velnor -- the very warm directories that make it fast -- must be protected from cross-run poisoning. Each gap is a fix to make Velnor compatible, not a reason to make it default.

Non-critical workflows compete for the runner pool on every push

Every push to main fires CI, Construct Image, Docs, and Renovate concurrently, then Publish Homebrew Preview after CI. Renovate took 3m30s on the analyzed run and is not a branch-protection check, yet it consumes Actions concurrency on every push; when the account's concurrent-runner pool is saturated, critical-path jobs queue behind work that does not gate anything. Moving Renovate to a cron schedule instead of a push trigger frees that capacity for the jobs that actually gate merge and publish. Queue time does not show up in per-job durations but is real in wall clock.

Path filters over-trigger on shared tool config

The old rust path filter in .github/workflows/ci.yml included every mise.toml change, so bumping a docs-only tool version fired the entire Rust CI surface. This branch removed the stale per-tool cargo-install cache keys and now routes mise.toml through small classifiers: Rust CI turns on only when Rust-relevant mise entries (zig or the cargo tooling aliases) change, and preview turns on only when release-build tooling entries (zig, cargo-zigbuild, cosign, or syft) change.
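
The per-entry routing can be sketched as a diff over the tool pins before and after a change. This is an illustration, not the classifier in the workflows: it assumes the tool entries have already been parsed into name-to-version mappings, that cargo tooling aliases are recognizable by a `cargo-` prefix, and the sample versions are invented.

```python
# Release-build tooling entries named in this roadmap.
PREVIEW_TOOLS = {"zig", "cargo-zigbuild", "cosign", "syft"}


def is_rust_tool(name: str) -> bool:
    # Assumption: cargo tooling aliases start with "cargo-".
    return name == "zig" or name.startswith("cargo-")


def changed_tools(old: dict, new: dict) -> set:
    """Tool names whose pin was added, removed, or edited."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


def route(old: dict, new: dict) -> dict:
    changed = changed_tools(old, new)
    return {
        "rust_ci": any(is_rust_tool(tool) for tool in changed),
        "preview": bool(changed & PREVIEW_TOOLS),
    }


# Made-up pins for illustration.
before = {"bun": "1.2.0", "zig": "0.13.0", "cargo-nextest": "0.9.100", "cosign": "2.4.0"}
docs_tool_bump = dict(before, bun="1.2.1")
nextest_bump = dict(before, **{"cargo-nextest": "0.9.101"})
cosign_bump = dict(before, cosign="2.4.1")

print(route(before, docs_tool_bump))
print(route(before, nextest_bump))
print(route(before, cosign_bump))
```

A docs-only tool bump routes to neither lane, a nextest bump wakes Rust CI only, and a cosign bump wakes preview only, which is the behavior the paragraph describes.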

Implementation Phases

The phases below are grouped by change class and risk; the numbers are identifiers, not a priority order. For wall-clock impact, work the highest-leverage cuts first regardless of phase number. The two co-critical ~12-minute lanes (lint and test) and the cross-workflow preview idle dominate the analyzed run, so they lead the recommended order:

Rank	Work	Phase	Estimated wall-clock effect	Risk
1	Done in this branch: delete the redundant `check-all-features` job and let `clippy` own the all-features compile gate	Phase 9	removes ~one full workspace compile (~6m) from the lint lane	low -- clippy is a strict superset of check
2	Collapse or bucket the 19-package nextest matrix	Phase 2	removes ~14-18 redundant cache restores and shrinks the tail that gates `docker-e2e`	medium -- benchmark attribution
3	Overlap the preview build with CI, or fold it into the CI DAG	Phase 4	removes ~12.5m of cross-workflow idle before publish	medium -- publish gating must stay exact
4	Warm persistent-runner fast lane for Rust changes	Phase 8	minutes to sub-minute for incremental Rust changes	high -- fork/secret isolation
5	Prebuilt cargo tools and a baked CI image	Phase 1	removes cold tool compiles and per-job setup	low
6	zigbuild every target, evict macOS runners from release	Phase 7	drops slow, costly macOS runners off the release path	medium
7	Prebuild `shellfirm` as a per-version artifact	Phase 5	removes a from-source compile from every construct build	low-medium

Measurement (Phase 0) still comes first in practice: capture a lane's baseline before and after each change so the estimates above are replaced with evidence.

Phase 0 -- Measurement and cache truth

Done in this branch: add workflow timing/cache summaries to CI, Docs, Construct Image, Publish Homebrew Preview, Release, jackin-dev, Hygiene, Renovate, and Renovate Validate. The shared summary script writes workflow wall time, target-gate time, longest jobs, longest steps, lane aggregate job time, step category totals, GitHub Actions cache budget usage, cache hit/miss/restore markers from job logs, dependency-download markers, third-party compile/check/build markers, source-tool compile markers, sccache issue markers, prepared-workspace artifact markers, and links to the GitHub job records into the GitHub step summary.
Done in this branch: add short-retention cargo --timings artifacts for clippy, check-default, nextest prepare, preview builds, and release builds.
When testing sccache, always emit sccache --show-stats and fail the experiment only on a compile failure, not on a low hit rate.
Done in this branch: track the target metric explicitly. The shared timing summary reports time to first red signal and maps each workflow to its target completion point: CI required gate, Docs required gate, Construct required gate, preview publish, GitHub release publish, and final release pipeline completion.
For every CI/CD proof run, scan both normal GitHub logs and Velnor job-log artifacts for crates.io index updates, crate downloads, external-crate compile/check/build markers, source-tool compiles, cache misses, failed restores, and prepared-workspace downloads before claiming the run is optimized. The report is mandatory whether the run succeeds or fails, and it must compare GitHub and Velnor per runner, per job, and per important step: wall time from workflow start, job runtime, cache restore/save time, dependency download markers, dependency compile markers, source-tool compile markers, sccache stats, artifact upload/download time, Docker time, and the long pole. Any marker must be acted on in the same iteration when a cache, routing, or runner change can remove it; otherwise record why it is an unavoidable first-run, changed-input, or cache-budget miss. The target is fastest wall clock, so a cache change that removes compilation but adds more restore/download time is a failed optimization until measurements prove otherwise. Treat GitHub's finite Actions cache budget as part of the design: prefer shared dependency keys that can restore from main into PR branches, avoid multiple caches containing the same registry/target data, and record cache usage/eviction pressure whenever a run shows unexpected downloads or cold restores. Keep a running ledger of techniques tried, accepted, and rejected so future iterations prove that Velnor is actually faster where claimed and that the GitHub-hosted default has been pushed to the best practical performance.
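
The log scan described above can be sketched as a small marker counter. This is a minimal illustration, not the shared summary script itself: the function name scan_log and the regular expressions are assumptions based on the usual wording of Cargo's progress output, and the workspace crate set is supplied by the caller.

```python
import re
from collections import Counter

# Assumption: these patterns follow the usual wording of Cargo's progress
# lines; the real summary script may match different strings.
ANSI = re.compile(r"\x1b\[[0-9;]*m")
INDEX_UPDATE = re.compile(r"Updating crates\.io index")
CRATE_DOWNLOAD = re.compile(r"Downloaded \S+ v\d")
COMPILE = re.compile(r"Compiling (\S+) v\d")


def scan_log(text: str, workspace_crates: set[str]) -> Counter:
    """Count dependency-related markers in a job log.

    Compile markers are split into workspace crates (expected on a changed
    input) and third-party crates (a sign the target cache was cold).
    """
    counts: Counter = Counter()
    for raw in text.splitlines():
        line = ANSI.sub("", raw)  # logs produced with --color=always carry escapes
        if INDEX_UPDATE.search(line):
            counts["index_update"] += 1
        elif CRATE_DOWNLOAD.search(line):
            counts["crate_download"] += 1
        else:
            match = COMPILE.search(line)
            if match:
                own = match.group(1) in workspace_crates
                counts["workspace_compile" if own else "third_party_compile"] += 1
    return counts
```

A run would only be reported as warm when index_update, crate_download, and third_party_compile are all zero; a non-zero workspace_compile count on its own points at workspace build cost rather than a cache miss.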

Phase 1 -- No-risk cache hygiene

Pin the version: input for jdx/mise-action so the action's internal mise cache is reproducible instead of fetching the latest mise release on every cold cache.
Either rely fully on jdx/mise-action's cache with a stable cache_key, or extend the manual cache paths beyond cargo-* so zig, cosign, syft, bun, node, and lychee are warmed consistently.
Audit every cargo-installed tool for a maintained prebuilt install path that still flows through mise. If a tool has a reliable prebuilt binary, prefer that over cargo install; otherwise keep the cargo-tool cache and document cold-install cost.
Switch docs jobs from bun install --frozen-lockfile to bun ci for the same locked install semantics with clearer CI intent; keep the existing Bun download-cache path unless timing shows node_modules caching beats reinstalling from Bun's cache.
Done in this branch: keep GitHub cache keys intentionally broad enough to restore from main, matching GitHub's documented branch/default-branch search order, and add scheduled cache-size review because GitHub cache storage can become read-only when repository cache budgets are exhausted.
Done in this branch: move the CI cargo tools that ship reliable release binaries off mise's source-compiling cargo backend and onto mise's GitHub release backend: cargo-nextest, cargo-deny, cargo-audit, cargo-hack, cargo-zigbuild, cargo-shear, cargo-fuzz, and codebook-lsp.
Done in this branch: narrow shared mise.toml routing so docs-only tool bumps do not fire the Rust surface, release-tool bumps do fire preview, and jdx/mise-action's stable cache key replaces per-tool source-install caches.
Done in this branch: remove rust-cache target restores from the policy/audit lanes because cargo shear, cargo deny, and cargo audit inspect metadata/indexes and do not reuse workspace build artifacts. Keep the shared Cargo registry cache for those lanes so lockfile/index data still restores from the same dependency key, keep only ~/.cargo/advisory-db in the audit-specific cache to avoid restoring the registry twice, run cargo audit with --no-yanked plus --no-fetch --stale on advisory-cache hits so hot PR runs avoid both crates.io index updates and RustSec refetches, and make the shared registry cache own fuzz lockfiles so the fuzz lane can run with CARGO_NET_OFFLINE=true. The registry cache population step explicitly disables Cargo offline mode, because cold/missing cache population is the one accepted path that must download dependencies before later jobs can prove offline operation.
Evaluated in this branch and deferred for the hot GitHub-hosted path: run 27996372045 showed tool setup at 102s aggregate across 40 steps, with the largest per-job setup step at 5s, while cache restore, Cargo/test work, and Docker work were much larger. A baked jackin-ci image could still be revisited for cold-start or self-hosted fleet ergonomics, but the current hot-path bottleneck is cache restore plus real build/test/Docker execution, not mise/rustup setup. Do not add a containerized CI path until a focused benchmark proves it beats the current GitHub-hosted baseline without weakening lane parity.

Phase 2 -- Make Docker E2E a nextest-owned lane

Done in this branch: move the top-level docker-e2e job from .github/workflows/ci.yml into .github/workflows/rust-nextest.yml as a dedicated docker-e2e job that runs beside the package shards instead of waiting for every package job.
Done in this branch: keep Docker E2E out of the generic package matrix. It is a named nextest-owned Docker E2E smoke job, uses the docker-e2e profile, keeps Docker daemon access, downloads the construct-image artifact when needed, rebuilds jackin-capsule from the warm cache, and points JACKIN_CAPSULE_BIN at that binary.
Done in this branch: pass construct-image state into the reusable nextest workflow so the Docker lane downloads and loads construct-trixie-image only when the construct image changed; otherwise it uses the published image path the tests already expect.
Rejected after measurement: splitting prepare into the real first dependency for both package tests and Docker E2E made cache ownership simple, but run 27999009032 showed a missed target cache turns it into a 5m05s serial gate. Deterministic package tests and real Docker E2E now start directly from the same semantic workspace cache so target-cache misses compile in parallel instead of blocking fan-out.
Evaluated in this implementation branch: cargo nextest archive plus archive-mode package filtering is not yet a safe replacement for the package matrix. cargo nextest archive --timings --archive-file target/nextest-archives-local/jackin-capsule.tar.zst -p jackin-capsule --all-features --color=always --locked succeeded locally, proving the build/archive path. Running that archive with cargo nextest run --archive-file target/nextest-archives-local/jackin-capsule.tar.zst -E 'package(=jackin-capsule)' --no-tests=pass --color=always failed the test jackin-capsule::daemon::tests::command_stdout_trimmed_returns_trimmed_stdout, because archive extraction does not provide the same checkout-local context as the normal package job.
Done in this branch: preserve the current middle-ground package matrix instead of replacing it with archive/partition fan-out. The current shape already keeps heavy-crate attribution, avoids most of the old 19-way cache restore overhead, and preserves the checkout/workspace assumptions used by the existing test suite. Revisit archive/partition only after tests that depend on repository-local context are made archive-safe.
Done in this branch for the adopted middle-ground matrix: keep jackin-capsule visible as its own package lane while grouping low-cost crates into small-crates. If the package matrix is later replaced by partitions, keep a dedicated capsule partition or named report entry so capsule regressions remain obvious.
Done in this branch: carry the old path-filter intent forward for Docker E2E. The docker_e2e route now triggers the real Docker lane for docker/**, Docker runtime assets, launch/runtime code, capsule handoff code, and crates/jackin/tests/dind_e2e.rs, while unrelated Rust package changes still receive deterministic tests without forcing Docker E2E.
Done in this branch: collapse the 19-package matrix to the middle-ground shape from this roadmap -- named jobs for jackin, jackin-capsule, jackin-tui, and jackin-runtime, plus one small-crates bucket. This keeps per-crate attribution for the heavy crates while removing most redundant target-cache restores.
Reverted after measurement in this implementation branch: uploading and downloading the prepared workspace made the fan-out slower. Docker E2E now rebuilds jackin-capsule from the restored warm cache, marks it executable, and exports JACKIN_CAPSULE_BIN; run 27976780904 measured that rebuild at about 6s while the rejected handoff path spent up to 4m09s downloading the prepared workspace.
Rejected after measurement: removing the explicit Cargo registry cache restore from nextest package and Docker E2E fan-out jobs made run 27991767807 fail under --offline immediately after the shared rust-cache restore. The restore costs a few seconds per fan-out job on warm runs, but it is currently the correctness guard that keeps package tests and Docker E2E from downloading dependencies. Any future attempt to remove it needs an equivalent offline proof in every fan-out lane before landing.
Rejected after measurement: making nextest prepare a cold-cache seeding job instead of a mandatory hot-path workspace rebuild helped hot exact-cache runs, but it still serialized fan-out on cache misses. Run 27997272508 confirmed the hot-cache skip (prepare 53s, build step skipped), run 27998091712 found Docker E2E local workspace compiles after a generic cache hit, run 27998435131 rejected a Docker-specific key, and run 27999009032 rejected the remaining serialized prepare gate because a missed shared key made prepare spend 5m05s compiling before package and Docker lanes could start. The branch keeps the shared ci-all-features-dev-workspace-v2 key, removes the prepare job, lets shards restore that cache directly, and lets only the jackin shard save it. Any future attempt to remove the Docker local compile should be a timed nextest-archive experiment, not another target-cache key split or serial seed job.
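
The docker_e2e path routing carried forward in this phase amounts to a changed-path match. The sketch below is illustrative only: needs_docker_e2e is a hypothetical helper, and the pattern list contains just the two paths this roadmap names explicitly. The real route also covers Docker runtime assets, launch/runtime code, and capsule handoff code, whose paths are not listed here.

```python
from fnmatch import fnmatchcase

# Only the paths named in the roadmap. fnmatch's "*" also matches "/",
# so "docker/*" behaves like the workflow filter "docker/**".
DOCKER_E2E_PATTERNS = (
    "docker/*",
    "crates/jackin/tests/dind_e2e.rs",
)


def needs_docker_e2e(changed_paths: list[str]) -> bool:
    """Return True when any changed path should trigger the real Docker lane."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in DOCKER_E2E_PATTERNS
    )
```

Changes that match nothing in the list still receive the deterministic package tests; only the Docker lane is skipped.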

Phase 3 -- Host Rust compiler cache adoption

Done in this branch: add pinned sccache installation through mise's GitHub release backend, avoiding cargo install sccache in CI because compiling the cache tool on cold runners defeats the purpose.
Rejected after measurement: the GitHub Actions cache backend for sccache produced 0% hits and write errors in repeated hosted-runner runs, so hosted jobs no longer enable it. Keep CARGO_INCREMENTAL=0 in compile-heavy lanes for deterministic target caches and for Velnor local sccache.
Done in this branch: keep local sccache available only on the opt-in Velnor lane, where the backend is persistent disk instead of the hosted GHA cache service.
Done in this branch for Velnor: emit sccache --show-stats to the step summary and upload the raw stats as a short-retention artifact. Treat low hit rate as data, not a failure; fail only on build correctness.
Corrected after measurement: keep RUSTC_WRAPPER=sccache in GitHub-hosted compile-heavy CI and nextest jobs for target-cache compatibility, but disable the hosted-runner GHA compiler-cache backend with SCCACHE_GHA_ENABLED=off. The wrapper is part of the Cargo fingerprint shape; the rejected piece is the hosted GHA backend, not the wrapper itself.
Rejected after measurement: broad read namespaces, job-specific run-unique write keys, and SCCACHE_BASEDIRS still produced 0% hits and write errors on hosted GitHub runs. The durable rule is now narrower: no hosted-runner GHA sccache backend; keep wrapper-compatible target caches on GitHub and use local-disk sccache as the opt-in Velnor accelerator.
Treat hosted-runner GHA sccache as rejected unless a future upstream/backend change is proven faster than rust-cache plus registry caching. Pilot any future compiler-cache work on Velnor local disk, S3, or Redis first.

Phase 4 -- Main push and preview pipeline overlap

Done in this branch: start preview archive builds on push to main in parallel with CI, keyed by source SHA.
Done in this branch: gate only the mutation steps -- rolling preview release update and Homebrew tap update -- on a positive CI conclusion for that exact SHA.
Done in this branch: remove the duplicate Linux jackin-role validator builds from CI. Preview already builds and packages jackin-role in the signed jackin archives for every release target, and publish-preview still waits for both those archive jobs and the matching successful CI run before mutating the rolling preview.
Done in this branch: keep preview source-path filtering; docs-only pushes should not publish preview binaries.
Done in this branch: add a final SHA ancestry check before publishing, as the preview workflow already does, so a stale or superseded build cannot update the rolling preview.
Consider folding the preview build into the CI DAG instead of keeping it as a separate push-triggered workflow with a CI polling gate. A single workflow that builds the signed archives as CI jobs and gates a final publish-preview job on the full needs set plus push-to-main removes the cross-workflow event latency, shares one checkout and one warm cache, and yields a single required check. The duplicate build-validator job is now gone, so this is no longer needed to avoid building jackin-role twice; it remains a possible cleanup if cross-workflow polling is still slower or harder to reason about than one DAG.
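
The publish gate in this phase reduces to a conjunction of independent checks, which holds whether the gate lives in a separate workflow or in one DAG. A minimal sketch, assuming a hypothetical PreviewState record; the roadmap does not spell out the exact git comparison behind the ancestry check, so it is modeled here as a precomputed boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviewState:
    """Hypothetical snapshot of what the publish job knows before mutating."""

    build_sha: str               # SHA the preview archives were built from
    builds_succeeded: bool       # every archive job finished green
    ci_sha: str                  # SHA of the CI run being consulted
    ci_conclusion: str           # e.g. "success" or "failure"
    ancestry_check_passed: bool  # result of the final SHA ancestry check


def may_publish_preview(state: PreviewState) -> bool:
    """Allow the rolling release and tap update only when every gate holds."""
    return (
        state.builds_succeeded
        and state.ci_conclusion == "success"
        and state.ci_sha == state.build_sha  # CI result must be for this exact SHA
        and state.ancestry_check_passed
    )
```

Building early never weakens the gate in this shape: a finished archive with a missing or failed CI conclusion for the same SHA simply does not publish.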

Phase 5 -- Docker and construct-image speedups

Done in this branch: add the BuildKit GHA cache ghtoken/repository parameters to manual Buildx GHA cache refs in CI Docker E2E and construct PR builds, reducing GHA cache API throttling risk without switching to another action.
Done in this branch: the cache-mount experiment for the construct Dockerfile's old cargo registry/git and /sccache-build mounts is superseded. shellfirm is now staged before Docker Buildx, so the Dockerfile no longer has a Rust toolchain, security-tools stage, or cargo cache mounts to preserve.
Keep the registry cache as the primary main-branch warm source and GHA cache as the PR iteration cache. Docker's registry cache backend is the better long-lived multi-stage cache; GHA cache is convenient but eviction- and rate-limit-prone.
Keep per-platform cache refs/scopes. Multi-platform cache writes to one mutable ref are easy to race; the current buildcache-amd64 and buildcache-arm64 shape is the right default.
Track the shellfirm prebuilt-binary TODO for arm64. Avoiding the compile inside the construct image is still the strongest possible speedup for that stage.
Done in this branch: until a prebuilt shellfirm exists for every target architecture, construct CI restores or builds the pinned SHELLFIRM_VERSION once per runner architecture through the GitHub Actions cache, stages it at docker/construct/prebuilt/shellfirm, and lets docker/construct/Dockerfile COPY it in. This removes the Dockerfile's cargo install shellfirm stage and sidesteps the BuildKit cache-mount-persistence question in step 2 for this stage. The upstream arm64 prebuilt TODO in step 5 remains open for eventually replacing the CI-built binary with a direct release-asset download.

Phase 6 -- Docs and prose checks

Keep repo-link-check on every non-schedule event; it is cheap and catches source renames that path filters would otherwise miss.
Keep full docs build plus lychee for docs changes. The analyzed docs run spent 39s building and 87s checking built links, which is acceptable for the coverage it provides.
Done in this branch: make Codebook warm through the prebuilt codebook-lsp mise GitHub backend, remove the old Rust toolchain/Cargo registry cache restores from docs spell-check jobs, and rely on the Docs timing summary to measure the warm result. If warm Codebook remains expensive, split into a fast changed-file PR pass plus a full scheduled/main pass, with docs-required still requiring the full pass when docs/prose actually changed.
Avoid caching node_modules unless measured. Bun's docs recommend bun ci/--frozen-lockfile for reproducible CI; the current Bun download-cache path keeps installs deterministic and small.

Phase 7 -- Release workflow speedups

Done in this branch: share the preview build implementation with tagged release builds through .github/actions/build-release-archive/action.yml, so preview and release use the same cargo-zigbuild, package, SHA256, signing/SBOM/attestation, timings, and artifact-upload contract while keeping their separate version names and publish gates.
Done in this branch: start release artifact builds in parallel with the release test workflow, then gate gh release create, signing publication, and Homebrew stable formula mutation on tests plus all build jobs passing. This preserves release safety while avoiding idle build machines.
Done in this branch: keep final release artifacts on deterministic settings. Archive build jobs keep CARGO_INCREMENTAL=0, pinned toolchain, pinned SDK/tool versions, and unchanged SBOM/signing/attestation, while the shared archive action now normalizes tar entry order, mtime, owner/group metadata, numeric owners, and gzip header names.
Done in this branch: extend sccache to release archive builds using the same release-profile archive action and short-retention stats artifacts as preview, while keeping deterministic release settings and Swatinem/rust-cache in place.
Done in this branch: standardize on cargo-zigbuild-from-Linux for every release target, including macOS, and drop the macos-latest runners from the release critical path. Preview already cross-compiles macOS this way with a cached SDK, so this also collapses preview and release onto one build implementation. Keep a single native-macOS build-and-test in the scheduled hygiene lane for parity, so a macOS-specific link or runtime regression is still caught -- just not on the release critical path. Unify the Swatinem/rust-cache shared-key across the preview and release build jobs so one target triple warms a single cache instead of build-preview-<target> and build-<target> never sharing.
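
The archive normalization listed in this phase (entry order, mtime, owner/group metadata, numeric owners, gzip header names) can be illustrated with the Python standard library. This is a sketch of the technique, not the shared archive action; build_archive, the fixed timestamp, and the file mode are assumptions.

```python
import gzip
import io
import tarfile

FIXED_MTIME = 0  # assumption: any fixed timestamp works as long as it never varies


def build_archive(files: dict[str, bytes]) -> bytes:
    """Build a tar.gz whose bytes depend only on file names and contents."""
    raw = io.BytesIO()
    # filename="" keeps a name out of the gzip header; mtime pins its timestamp.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=FIXED_MTIME) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for name in sorted(files):  # stable entry order
                data = files[name]
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                info.mtime = FIXED_MTIME
                info.mode = 0o755
                info.uid = info.gid = 0        # numeric owners only
                info.uname = info.gname = ""   # no user/group names
                tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()
```

Two calls with the same contents in a different insertion order should produce byte-identical output on a given Python version, which is the property the SHA256 and signing steps rely on.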

Phase 8 -- Warm persistent-runner fast lane (highest-leverage Rust speedup)

Keep GitHub-hosted runners as the trust boundary for public/fork PRs and as the required parity lane.
Keep the existing velnor lane opt-in through workflow_dispatch and lanes: both; never make it the default for jackin' CI. The win is still structural: a warm target/, warm ~/.cargo, and warm Docker layers mean an incremental Rust change recompiles only the edited crate, which no hosted-runner cache strategy can match.
Done upstream in Velnor: opt-in persistent target/ stores are now scoped under _velnor_targets/<trust-scope>/<repo>/<workflow>/<job-bucket>, so warm build state cannot cross trust scope, repository, workflow, or job-class boundaries. Velnor job containers now receive daemon-level CPU and memory caps, and commit 2013110 adds runtime trust-scope enforcement: trusted daemons keep the full warm-runner capability set, while non-trusted scopes reject jobs that carry user/repository secrets such as secrets.*, and job/action containers in those scopes do not receive the shared host Docker socket. Operators still keep distinct VELNOR_TRUST_SCOPE values plus runner labels/groups for trusted and untrusted lanes, but the runner now enforces the boundary instead of relying only on deployment discipline.
Keep a required GitHub-hosted parity job so a green warm-lane run still proves a green cold-lane run, satisfying the PR/main parity rule.
Done in this branch: pair sccache with the lane backend. GitHub-hosted jobs do not enable a compiler-cache backend because the measured GHA backend stayed at 0% hits with write errors; Velnor jobs enable local sccache so the injected SCCACHE_DIR=/var/cache/sccache can use the host-persistent cache and stack with warm target/.
Treat this as the largest single wall-clock lever in the roadmap, gated entirely on the security model in step 3; sequence it after the cheap host-CI cuts (Phase 9, Phase 2) so the hosted lanes are already fast while the persistent lane is hardened.
Hold every change in this roadmap to dual-runner parity capability: it must pass on a GitHub-hosted run by default and on a velnor run when explicitly exercised with lanes: both, with the GitHub-hosted run remaining the required check. Velnor-only turbo (warm target/, local sccache, warm Docker layers) is allowed only as an opt-in accelerator over that shared baseline.
Done in this branch for construct lane-awareness: .github/workflows/construct.yml now has the same lanes dispatch selector, GitHub remains the default and the only manifest-publish source, Velnor can rehearse the amd64 build path, and the native construct arm64 leg remains GitHub-hosted until Velnor has an arm path. Continue improving velnor itself when a job needs a missing capability: give it a hardened Docker daemon for docker-e2e and construct builds, add an arm path if native arm parity is required, and keep extending the warm-state hygiene model that started upstream with repository/workflow/job-class scoping before trusting optional Velnor evidence.

Phase 9 -- Host lint/check critical-path de-serialization

Done in this branch: remove the needs: check-all-features edge from clippy and check-default in .github/workflows/ci.yml so compile-heavy jobs are not serialized behind a cache they do not share.
Done in this branch: delete check-all-features entirely. cargo clippy --workspace --all-targets --all-features runs the full type and borrow check before linting, so it is a strict superset of cargo check --workspace --all-targets --all-features; keeping clippy as the all-features gate and check-default as the default-feature gate preserves coverage while removing a redundant compile-heavy job and its cache.
If the fail-fast-on-broken-compile behavior of the old edge is still wanted to save runner minutes on red PRs, recover it cheaply: let fmt (about 24s) and the direct nextest package shards carry early red signal, or add a single quick cargo check on one core crate rather than gating the whole lint lane behind a full-workspace check.
Done in this branch: capture the lint-lane wall clock before and after per Phase 0. The old serialized lint lane took about 3m56s from cargo check all features start to check-default completion on run 27937532691; the de-serialized lane took about 51s from clippy/check-default start to the slower completion on run 27961783994. The full CI required gate moved from about 12m33s to about 10m16s, and the remaining critical path is construct E2E image build plus the direct nextest package/Docker fan-out.
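
The before/after figures above can be turned into savings with simple duration arithmetic. The helpers below are illustrative (parse_duration, format_duration, and saving are hypothetical names), and the inputs are the approximate durations quoted in this phase.

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '12m33s' or '51s' to whole seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def format_duration(total_seconds: int) -> str:
    """Render seconds back into the roadmap's 'XmYYs' style."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m{seconds:02d}s" if minutes else f"{seconds}s"


def saving(before: str, after: str) -> tuple[str, float]:
    """Return the absolute saving and the percentage of the old duration."""
    old, new = parse_duration(before), parse_duration(after)
    return format_duration(old - new), round(100 * (old - new) / old, 1)


print(saving("3m56s", "51s"))      # lint lane: ('3m05s', 78.4)
print(saving("12m33s", "10m16s"))  # CI required gate: ('2m17s', 18.2)
```

The gap between the two percentages is the point of the measurement: the lint lane shrank by more than three quarters, but the required gate improved by under a fifth because the critical path moved to the construct E2E image build and the nextest fan-out.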

Guardrails

Keep one stable aggregator per workflow for branch protection, but allow separate early-signal jobs to finish before the full aggregator.
Do not remove a check unless it is moved to an equal or stronger gate. "Faster" must not mean "main learns later that release cannot build" without an intentional publish gate.
Keep preview and release publish steps hard-gated to main or tag/manual-release rules.
Keep tool installation through mise or a first-party wrapper that is documented in mise.toml; do not add ad hoc language setup actions to workflow files.
Keep cache keys observable. Every new cache must have a documented owner, invalidation input, and expected fallback behavior.
Set timeout-minutes on every job. This branch now applies explicit caps across CI, Docs, construct, preview, release, reusable nextest, Renovate, and scheduled hygiene jobs so a hung network call or wedged process cannot burn to the multi-hour default. A per-job timeout is a cost and safety backstop, not a speed change.
Do not gate the cheap deterministic jobs in front of the heavy ones to "fail first". fmt, actionlint, and schema-check already run in parallel and return in seconds; serializing the compile-heavy jobs behind them would add their latency to the green path for no green-path benefit. Fast-fail is a red-path optimization -- keep it off the happy path.
Keep non-critical workflows off the per-push runner pool. Schedule Renovate rather than triggering it on every push so it cannot queue ahead of the jobs that gate merge and publish.
Dual-runner parity capability is mandatory. Every job must stay runnable on both GitHub-hosted runners and the self-hosted velnor lane when a maintainer explicitly selects lanes: both, with the GitHub-hosted run as the default and required parity gate. Velnor may carry heavier optimizations than hosted runners can (warm caches, a local compiler cache), but only as an opt-in accelerator over a baseline that still passes on hosted runners. When Velnor cannot do something a job needs, improve Velnor itself and re-verify both lanes -- do not fork the pipeline or quietly drop the job to hosted-only.
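
The timeout-minutes guardrail is easy to check mechanically. A minimal sketch, assuming the workflow file has already been parsed into a mapping (for example by a YAML loader, which is outside this snippet); jobs_missing_timeout is a hypothetical helper. Jobs that call a reusable workflow through uses are skipped, because GitHub does not accept timeout-minutes on the caller and the cap has to live on the jobs inside the called workflow.

```python
def jobs_missing_timeout(workflow: dict) -> list[str]:
    """Return the ids of jobs that define steps but no timeout-minutes."""
    missing = []
    for job_id, job in workflow.get("jobs", {}).items():
        if "uses" in job:
            # Reusable-workflow caller: the timeout belongs in the callee.
            continue
        if "timeout-minutes" not in job:
            missing.append(job_id)
    return sorted(missing)
```

A hygiene job could fail when the returned list is non-empty, which keeps the guardrail from eroding as new jobs are added.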

Upstream Notes

GitHub Actions cache lookup tries the exact key and then the restore-key prefixes on the current branch, and only then repeats the search on the default branch (for pull requests, the base branch as well), which is why broad restore keys can deliberately warm PR branches from main; cache storage can also become read-only when budgets/limits are exhausted. GitHub dependency caching docs
Docker Buildx supports cache-to mode=max to export more layers than mode=min, and Docker documents both registry and GitHub Actions cache backends. Docker also documents that BuildKit cache mounts are not preserved in GHA cache by default. Docker cache backends, Docker GHA cache backend, Docker cache mounts in Actions
Mozilla sccache is a compiler wrapper cache with local and cloud/GHA-style storage backends; Rust usage is through RUSTC_WRAPPER, and Rust compiler caching requires incremental compilation to be disabled. sccache, sccache action Rust notes
Bun documents bun ci as equivalent to bun install --frozen-lockfile for reproducible CI installs from committed bun.lock. Bun install docs
mise's CI docs recommend pinned tool versions for reproducible CI environments, and jdx/mise-action supports install arguments and caching. mise CI docs, jdx/mise-action
mise's GitHub backend installs prebuilt release binaries, so CI tools that ship reliable GitHub releases need not be cargo install-compiled. mise backends
Swatinem/rust-cache supports shared-key, cache-workspace-crates, and rust-environment hashing, which match the current direct nextest package/Docker fan-out design. rust-cache README
cargo-nextest supports build archives and partitioning so a build can be reused while test execution is split across workers. nextest archiving, nextest partitioning
Cargo's profile docs describe incremental compilation, CARGO_INCREMENTAL, and default codegen-unit differences; use those as the basis for any incremental-compilation experiment. Cargo profiles
The self-hosted fast lane is powered by the velnor project; missing runner capabilities are added there so both lanes stay at parity rather than forking pipeline behavior. tailrocks/velnor

Cross-references

/reference/roadmap/rust-ci-tooling/ -- dependency hygiene, Codebook, coverage, and release-time tooling.
/reference/roadmap/workspace-registry-cache/ -- local pull-through Docker registry ideas for runtime workloads.
