Test Infrastructure & Behavioral Specs

Status: Open — design proposal

Problem

Three testing gaps block the codebase readability program and increase the risk of silent regressions:

Test file duplication mirrors source duplication. tests/manager_flow.rs is 3694 lines in one file. There are 4 byte-for-byte-identical config_with_agents helpers across 4 test files, 3 independent FakeRunner definitions (plus a 4th ScriptedRunner), 10 independent mount-helper definitions in 7 files, 45+ copy-pasted role-seeding blocks across 10 files, and 20 inline ResolvedWorkspace constructions across 5 files. This blocks the Phase 2 source-file splits because splitting a 7000-line source file means rewriting the 3000-line test file that depends on it.
No behavioral specs for load-bearing components. The codebase readability roadmap plans a behavioral spec for runtime/launch.rs, but not for jackin-capsule's daemon (crates/jackin-capsule/src/daemon.rs — 8457 lines, the single biggest file in the project, runs as PID 1 in every container). Behavioral specs should be a general practice for all major components.
No property-based or fuzz testing for parser surfaces. crates/jackin-manifest/src/validate.rs (1244 LOC), crates/jackin-env/src/env_resolver.rs, and the migration chain (crates/jackin-config/src/migrations.rs, crates/jackin-manifest/src/migrations.rs) are parsers of operator/role-author-supplied input, tested only by example-based unit tests. An input the developer didn't think of isn't covered. The migration framework's fixture chain is one of the project's strongest correctness surfaces, but it's still a closed-world test.

Proposal

1. Test file split + mock consolidation

First, research the established Rust patterns for test-support crates (tokio, bevy, cargo, and other prominent projects all maintain dedicated test-support crates). Then proceed in three phases:

Phase 1 — Extract jackin-test-support dev-dep crate. Promote FakeRunner, ScriptedRunner, seed_minimal_role_repo, simple_mount, config_with_agents, test_workspace, fake_runner_with_running into one crate under crates/jackin-test-support/. Update every test to import from it. Estimated net delete: 500–1000 LOC.
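As a rough illustration of what the consolidated crate could export, the sketch below shows one shared fake runner and one mount helper. The Runner trait, the Mount struct, and every signature here are assumptions made for the example; the real FakeRunner and simple_mount in the codebase may look different.

```rust
// Hypothetical shape of a shared test-support module. The trait, struct
// fields, and signatures are illustrative and not taken from the codebase.
use std::cell::RefCell;
use std::path::PathBuf;

/// Minimal stand-in for the command-runner abstraction that tests fake out.
pub trait Runner {
    fn run(&self, program: &str, args: &[&str]) -> Result<String, String>;
}

/// Records every invocation and replays canned responses in order.
pub struct FakeRunner {
    pub calls: RefCell<Vec<String>>,
    responses: RefCell<Vec<Result<String, String>>>,
}

impl FakeRunner {
    pub fn with_responses(responses: Vec<Result<String, String>>) -> Self {
        FakeRunner {
            calls: RefCell::new(Vec::new()),
            responses: RefCell::new(responses),
        }
    }
}

impl Runner for FakeRunner {
    fn run(&self, program: &str, args: &[&str]) -> Result<String, String> {
        // Store the full command line so tests can assert on it later.
        let mut line = program.to_string();
        for arg in args {
            line.push(' ');
            line.push_str(arg);
        }
        self.calls.borrow_mut().push(line);

        // Replay the next canned response; default to empty success.
        let mut responses = self.responses.borrow_mut();
        if responses.is_empty() {
            Ok(String::new())
        } else {
            responses.remove(0)
        }
    }
}

/// Hypothetical mount description used by the helper below.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Mount {
    pub src: PathBuf,
    pub dst: PathBuf,
    pub readonly: bool,
}

/// One shared definition replacing the per-file mount helpers.
pub fn simple_mount(src: &str, dst: &str) -> Mount {
    Mount {
        src: PathBuf::from(src),
        dst: PathBuf::from(dst),
        readonly: false,
    }
}
```

A test file would then import these helpers from the crate instead of redefining them, so a change to the runner abstraction touches a single definition.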

Phase 2 — Split mega-test-files. tests/manager_flow.rs → 8–10 files by scenario family. tests/dind_e2e.rs → smaller end-to-end groups. Companion to each source-file Phase 2 split.

Phase 3 — Snapshot/golden-file infrastructure. Wire cargo insta (from the CI tooling roadmap item) into the test-support crate so every JSON output, derived Dockerfile, and friendly error block can be snapshotted with one helper, and host the shared determinism harness (fixed size/theme/clock, redaction). The styled-SVG visual goldens for TUI screens and CLI output build on this crate but are designed in Visual snapshot testing (CLI & TUI); that item owns the artifact format, this item owns the shared jackin-test-support crate it lives in.
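The redaction part of the determinism harness could be as small as the sketch below: it normalizes line endings, swaps known volatile values for stable placeholders, and trims trailing whitespace before the text is handed to a snapshot assertion. The Redaction type and normalize_for_snapshot name are hypothetical, and the sketch uses only the standard library, leaving the actual snapshot call to whichever tool is wired in.

```rust
/// A volatile value seen in test output, paired with its stable placeholder.
/// Both the type and its field names are illustrative.
pub struct Redaction<'a> {
    pub needle: &'a str,
    pub placeholder: &'a str,
}

/// Make raw output deterministic enough to compare against a golden file.
/// Note: the result has no trailing newline, because `lines()` drops it.
pub fn normalize_for_snapshot(raw: &str, redactions: &[Redaction<'_>]) -> String {
    // 1. Normalize Windows line endings.
    let mut out = raw.replace("\r\n", "\n");

    // 2. Replace each known volatile value (temp dirs, timestamps, ids).
    for r in redactions {
        if !r.needle.is_empty() {
            out = out.replace(r.needle, r.placeholder);
        }
    }

    // 3. Trim trailing whitespace on every line.
    out.lines()
        .map(|line| line.trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Keeping redaction as a pure string-to-string function means it can be unit-tested on its own and reused by both the text snapshots and any later visual goldens.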

2. Behavioral specs for major components

Write behavioral specification documents for every major component, not just runtime/launch.rs. Priority order:

Capsule daemon (crates/jackin-capsule/src/daemon.rs) — PID 1 contract, single-attach-client invariant, control channel dispatch, session lifecycle, PTY mutex poison recovery, attach framing, OSC passthrough, mode-state restore, sessions persistence and reattach.
Operator console — the TUI state machine: workspace selection → role selection → agent selection → instance lifecycle → session management, keybinding dispatch, dialog stack.
Launch pipeline (crates/jackin-runtime/src/runtime/launch.rs) — already on the roadmap as Behavioral spec for runtime launch; confirm that its coverage is complete.
Auth forwarding (crates/jackin-runtime/src/instance/auth.rs) — credential provisioning, symlink rejection, refresh flows.

Each spec should: list invariants, give the failure mode for each violation, and link to the source line where the invariant is enforced.
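A spec entry becomes easier to keep honest when each invariant has an executable counterpart. The sketch below models the single-attach-client invariant with a toy AttachSlot type; the type, its methods, and the chosen failure mode (the second attach is rejected and the current holder is reported) are assumptions for illustration, not a description of how daemon.rs behaves.

```rust
/// Failure mode for a violated invariant. Illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub enum AttachError {
    AlreadyAttached { holder: u32 },
}

/// Toy model of the invariant "at most one attached client at a time".
#[derive(Default)]
pub struct AttachSlot {
    holder: Option<u32>,
}

impl AttachSlot {
    /// Invariant: succeeds only when no client currently holds the slot.
    pub fn attach(&mut self, client_id: u32) -> Result<(), AttachError> {
        match self.holder {
            Some(holder) => Err(AttachError::AlreadyAttached { holder }),
            None => {
                self.holder = Some(client_id);
                Ok(())
            }
        }
    }

    /// Only the current holder can release the slot; returns whether it did.
    pub fn detach(&mut self, client_id: u32) -> bool {
        if self.holder == Some(client_id) {
            self.holder = None;
            true
        } else {
            false
        }
    }
}
```

In a written spec, this would correspond to one row: the invariant, the error returned on violation, and a link to the line that enforces it.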

3. Property and fuzz tests for parser surfaces

Add two testing layers:

proptest for invariant testing:

env_resolver: ${env.VAR} interpolation is associative, idempotent on no-op substitutions, escapes correctly for every Unicode input.
manifest::validate: a manifest that parses successfully always round-trips through serialize → parse unchanged.
Migration chain: any input that parses at version N migrates successfully to N+1 and the resulting output parses at N+1.
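To make the shape of such a property concrete, here is a standard-library-only sketch of the "idempotent on no-op substitutions" check. The resolve function is a simplified stand-in for env_resolver (it assumes unknown or unterminated placeholders are left verbatim), and the small xorshift generator stands in for the input generation and shrinking that proptest would provide in the real test.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical resolver: replaces `${env.NAME}` with its value.
/// Assumption: unknown or unterminated placeholders are kept verbatim.
pub fn resolve(input: &str, env: &HashMap<String, String>) -> String {
    const OPEN: &str = "${env.";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]);
        let after = &rest[start + OPEN.len()..];
        match after.find('}') {
            Some(end) => {
                let name = &after[..end];
                match env.get(name) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push_str(OPEN);
                        out.push_str(name);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: copy the remainder unchanged.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Tiny deterministic generator standing in for proptest strategies.
pub struct XorShift(pub u64);

impl XorShift {
    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Short strings mixing placeholder syntax characters with non-ASCII text.
pub fn arbitrary_string(rng: &mut XorShift) -> String {
    const ALPHABET: [char; 10] = ['a', 'e', 'n', 'v', '.', '$', '{', '}', 'é', '日'];
    let len = (rng.next_u64() % 12) as usize;
    (0..len)
        .map(|_| ALPHABET[(rng.next_u64() % ALPHABET.len() as u64) as usize])
        .collect()
}

/// Property: input without any placeholder is returned unchanged.
/// Returns how many generated cases were actually checked.
pub fn check_noop_property(cases: usize) -> usize {
    let env = HashMap::from([("HOME".to_string(), "/root".to_string())]);
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    let mut checked = 0;
    for _ in 0..cases {
        let s = arbitrary_string(&mut rng);
        if s.contains("${env.") {
            continue; // Not a no-op input; skip, like a precondition filter.
        }
        assert_eq!(resolve(&s, &env), s, "no-op input changed: {:?}", s);
        checked += 1;
    }
    checked
}
```

The round-trip and migration properties follow the same pattern: generate an input, filter to the precondition, then assert the relation between input and output.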

cargo-fuzz targets:

manifest::validate (TOML input, expect no panic).
env_resolver::resolve (any Unicode env string).
Each migration step (any input that parses at version N).

Schedule cargo fuzz run <target> -- -max_total_time=300 as a nightly CI job (not on every PR — too slow).
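The body of a migration-step fuzz target might look like the sketch below. The two config formats, the parser functions, and fuzz_migration_step are all hypothetical toys; only the harness shape is the point: reject inputs that do not parse at version N, migrate the rest, and assert that the output parses at N+1.

```rust
// In a cargo-fuzz project this function would be called from a target file
// under fuzz/fuzz_targets/, roughly:
//
//     #![no_main]
//     use libfuzzer_sys::fuzz_target;
//     fuzz_target!(|data: &[u8]| { fuzz_migration_step(data); });
//
// Everything below is a toy stand-in for the real migration chain.

#[derive(Debug, PartialEq)]
pub struct ConfigV1 {
    pub name: String,
}

#[derive(Debug, PartialEq)]
pub struct ConfigV2 {
    pub name: String,
    pub roles: Vec<String>,
}

/// Toy version-1 parser: exactly two `\n`-separated lines.
pub fn parse_v1(text: &str) -> Option<ConfigV1> {
    let mut lines = text.split('\n');
    if lines.next()? != "version = 1" {
        return None;
    }
    let name = lines.next()?.strip_prefix("name = ")?;
    if name.is_empty() || lines.next().is_some() {
        return None;
    }
    Some(ConfigV1 { name: name.to_string() })
}

/// Toy migration: bump the version and add an empty `roles` field.
pub fn migrate_v1_to_v2(old: ConfigV1) -> String {
    format!("version = 2\nname = {}\nroles =", old.name)
}

/// Toy version-2 parser: exactly three `\n`-separated lines.
pub fn parse_v2(text: &str) -> Option<ConfigV2> {
    let mut lines = text.split('\n');
    if lines.next()? != "version = 2" {
        return None;
    }
    let name = lines.next()?.strip_prefix("name = ")?;
    let roles = lines.next()?.strip_prefix("roles =")?;
    if name.is_empty() || lines.next().is_some() {
        return None;
    }
    Some(ConfigV2 {
        name: name.to_string(),
        roles: roles.split_whitespace().map(String::from).collect(),
    })
}

/// Harness body: must never panic on input that fails to parse at version 1,
/// and must panic if a valid version-1 input migrates to invalid version 2.
pub fn fuzz_migration_step(data: &[u8]) {
    let text = match std::str::from_utf8(data) {
        Ok(text) => text,
        Err(_) => return,
    };
    let old = match parse_v1(text) {
        Some(old) => old,
        None => return,
    };
    let expected_name = old.name.clone();
    let migrated = migrate_v1_to_v2(old);
    let new = parse_v2(&migrated).expect("migrated output must parse at version 2");
    assert_eq!(new.name, expected_name);
}
```

Keeping the harness body in a plain function lets the same check run as an ordinary unit test over the existing fixture chain, with the fuzzer supplying the inputs nobody wrote down.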

Non-goals

Do not block the readability program's Phase 1 work on the test-support crate. Phase 1 splits don't need it yet.
Do not add mutation testing yet. Re-evaluate once cargo-llvm-cov (from the CI tooling roadmap item) shows where coverage is weakest.
Do not fuzz the TUI rendering layer or the Docker client — those surfaces change too fast for fuzzing to be cost-effective.

Implementation Phases

Phase 1 — Research + test-support crate

Audit: walk every tests/*.rs and inventory helpers + mock structs.
Research how prominent Rust projects structure their test-support crates.
Create crates/jackin-test-support/ with publish = false in its Cargo.toml; consuming crates depend on it only under [dev-dependencies].
Promote one helper family at a time (mounts first — smallest, used everywhere).
Split tests/manager_flow.rs last, after helpers are extracted.

Phase 2 — Behavioral specs

Write capsule daemon behavioral spec.
Write console behavioral spec.
Confirm launch pipeline spec coverage.

Phase 3 — Property and fuzz tests

Add proptest to dev-dependencies.
Add fuzz/ directory with cargo-fuzz setup; one target per parser.
Schedule nightly CI fuzz run.

Open Questions

Which exact Rust projects should we study for test-support crate patterns? (tokio, bevy, cargo, and clap are candidates.)
Should the test-support crate live in crates/ or alongside the test files?
Should behavioral specs live in the roadmap or in contributor-facing reference docs?
What is the right nightly CI schedule for fuzz runs?

Related files

crates/jackin/tests/manager_flow.rs — 3694-line test file
crates/jackin/tests/dind_e2e.rs — 1094-line E2E test
crates/jackin-capsule/src/daemon.rs — 8457-line capsule daemon
crates/jackin-manifest/src/validate.rs — 1244-line manifest validation
crates/jackin-env/src/env_resolver.rs — env var interpolation
crates/jackin-config/src/migrations.rs — config migration chain
crates/jackin-manifest/src/migrations.rs — manifest migration chain

Cross-references

Codebase map — workspace crate structure that the behavioral specs will cover
Rust CI tooling & dependency hygiene — cargo insta for snapshot tests, cargo-llvm-cov for coverage
Behavioral spec for runtime launch — existing spec item that this extends to all components
AgentRuntime and Provider registry — adapter registration reduces the per-agent test matrix
