
Visual snapshot testing (CLI & TUI)

Status: Open — research and design proposal (partially implemented: a thin text-only insta net for four console views plus the component-level SVG lookbook have shipped; the styled, full-surface, SVG-based, PR-diffable program below is unbuilt)

Problem

Every surface jackin' renders is something an operator sees: the operator console, the in-container jackin-capsule multiplexer, the launch-progress TUI, and the plain CLI stdout/stderr of commands like jackin status, jackin doctor, jackin --help, and the friendly E0xx error blocks. All of it is visual, and a one-character change to a layout, a wrong colour token, a dropped BOLD, or a misaligned table column is a real regression an operator will notice. Today the regression net for this is weak and, where it exists, style-blind.

There are two concrete blind spots:

  • The console snapshot helper captures glyphs only. It dumps each cell's .symbol() and discards colour and every text attribute (see crates/jackin-console/src/tui/view/tests.rs). A regression that turns a Blocked badge from red to grey, drops the underline on a link, or removes REVERSED from a selected row passes every existing snapshot.
  • The component SVG encoder captures colour but not attributes. buffer_to_svg in crates/jackin-tui-lookbook/src/svg.rs encodes fg/bg but not the Modifier bitset (BOLD, DIM, ITALIC, UNDERLINED, SLOW_BLINK, RAPID_BLINK, REVERSED, HIDDEN, CROSSED_OUT). The theme defines BOLD_WHITE, BOLD_GREEN, and DANGER (bold) in crates/jackin-tui/src/theme.rs, and the bold-ness of all of them is currently untested.

The goal is a single, deterministic, PR-diffable snapshot of what is on screen — for both TUI screens and CLI output — so that every visual change is surfaced as a reviewable before/after in the pull request, and an unintended change fails CI.

This page is the canonical home for visual / styled-snapshot regression testing of rendered output. It absorbs the former Snapshot tests for TUI render item; the shared test-harness crate is owned by Test infrastructure & behavioral specs, the CI wiring and coverage are owned by Rust CI tooling & dependency hygiene, and live PTY read/wait/send automation is owned by Terminal observation and automation. The boundaries between those items and this one are spelled out under Related work below.

Goals and non-goals

Goals. Capture the complete styled cell (glyph + foreground + background + underline colour + all modifiers) for every rendered surface; store goldens as deterministic SVG; surface every visual change as a before/after diff in the pull request; gate drift in CI; keep the whole pipeline Rust-only.

Non-goals. This is not live terminal automation (driving a running agent session, waiting for visible text, injecting input) — that is Terminal observation and automation. It is not functional CLI testing (exit codes, behaviour, "must contain X") — that stays with the existing assert_cmd / predicates tests and the golden stdout/stderr work tracked under CI tooling. It is not a hosted visual-review SaaS; the review surface is the SVG diff GitHub already renders in the PR.

Why SVG, not PNG or any raster format

SVG is the right golden format for this work, and the decision is deliberate:

  • Deterministic and text-based. The same render produces byte-identical SVG every time, so a committed golden is stable and a mismatch means the render changed, not the rasteriser, font hinting, or anti-aliasing.
  • Semantic diffs. A change shows up as a changed attribute — a fill="#ff0000", a font-weight="bold", a text-decoration="underline", a moved x/y — which is human-readable in the diff. A raster diff only tells you "some pixels changed" and cannot say which property or why.
  • GitHub renders SVG inline. A committed .svg shows in the PR file view, so reviewers see the actual before/after image and the underlying attribute diff in the same place. This is precisely the "compare in the pull request" experience we want, with no external service.
  • Scalable and small. SVG is vector, so it is crisp at any zoom and the files stay diff-friendly, unlike binary PNGs that bloat the repo and force Git LFS.

PNG / raster goldens are explicitly rejected: every pixel can shift with the smallest rendering difference, diffs are opaque, and there is no semantic signal about what changed. SVG is the format for the whole program.

SVG goldens must still be made deterministic at the source: a fixed render size, a fixed theme, a frozen clock for any time-derived content (timers, durations, "started 3m ago"), and redaction of paths, versions, and run IDs. A deterministic input is what makes the byte-level SVG comparison trustworthy.
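A minimal sketch of two of those determinism knobs, the frozen clock and redaction, using only the standard library. The `Clock` trait, `FrozenClock`, `started_label`, and `redact` are hypothetical names for illustration, not existing jackin' APIs; a real harness would more likely lean on insta's filters and redactions.

```rust
use std::time::Duration;

/// Hypothetical clock seam: production code would read the real clock,
/// while golden tests inject a fixed value.
pub trait Clock {
    fn elapsed_since_start(&self) -> Duration;
}

/// A clock that always reports the same elapsed time.
pub struct FrozenClock(pub Duration);

impl Clock for FrozenClock {
    fn elapsed_since_start(&self) -> Duration {
        self.0
    }
}

/// Formats a time-derived label from the injected clock, so the rendered
/// text is identical on every run.
pub fn started_label(clock: &dyn Clock) -> String {
    format!("started {}m ago", clock.elapsed_since_start().as_secs() / 60)
}

/// Replaces host-specific substrings with stable placeholders.
/// Empty patterns are skipped, because `str::replace("")` would insert
/// the placeholder between every character.
pub fn redact(text: &str, root: &str, version: &str) -> String {
    let mut out = text.to_string();
    if !root.is_empty() {
        out = out.replace(root, "[ROOT]");
    }
    if !version.is_empty() {
        out = out.replace(version, "[VERSION]");
    }
    out
}
```

With a `FrozenClock` of 185 seconds the label always renders as "started 3m ago", and a path under the CI workspace collapses to `[ROOT]/...` before the byte-level comparison.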

What "visual" means here — capture the full styled cell

The single source of truth for "what the user sees" in a ratatui surface is the Buffer of Cells, where each cell carries symbol, fg, bg, underline_color, and a Modifier bitset. The live in-container terminal has the equivalent in jackin-term's GridSnapshot (SnapCell already records fg, bg, bold, italic, underline, inverse, dim — see crates/jackin-term/src/snapshot.rs). The canonical artifact for this program must encode all of it, and the SVG encoder must map every modifier to an SVG attribute:

| Modifier | SVG encoding |
| --- | --- |
| BOLD | font-weight="bold" |
| ITALIC | font-style="italic" |
| UNDERLINED (+ underline_color) | text-decoration: underline + text-decoration-color |
| CROSSED_OUT | text-decoration: line-through |
| DIM | opacity="0.5" (or blend foreground toward background) |
| REVERSED | swap fg/bg before emitting |
| HIDDEN | foreground = background |
| SLOW_BLINK / RAPID_BLINK | static marker class / data- attribute so the diff still flips |
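The mapping in the table could be sketched roughly as follows. This is an illustration under stated assumptions, not the existing buffer_to_svg: `StyledCell`, `Rgb`, the flag constants, and `svg_attrs` are stand-ins defined here so the example is self-contained, and a real encoder would read ratatui's `Cell` and `Modifier` directly.

```rust
// Stand-in modifier bits for illustration; a real encoder would use
// ratatui's `Modifier` bitflags instead of these constants.
pub const BOLD: u16 = 1 << 0;
pub const DIM: u16 = 1 << 1;
pub const ITALIC: u16 = 1 << 2;
pub const UNDERLINED: u16 = 1 << 3;
pub const SLOW_BLINK: u16 = 1 << 4;
pub const RAPID_BLINK: u16 = 1 << 5;
pub const REVERSED: u16 = 1 << 6;
pub const HIDDEN: u16 = 1 << 7;
pub const CROSSED_OUT: u16 = 1 << 8;

#[derive(Clone, Copy)]
pub struct Rgb(pub u8, pub u8, pub u8);

/// Hypothetical styled cell: glyph, resolved colours, and modifier bits.
pub struct StyledCell {
    pub symbol: char,
    pub fg: Rgb,
    pub bg: Rgb,
    pub underline: Option<Rgb>,
    pub modifiers: u16,
}

fn hex(c: Rgb) -> String {
    format!("#{:02x}{:02x}{:02x}", c.0, c.1, c.2)
}

/// Returns (background fill, attribute string for the `<text>` element).
pub fn svg_attrs(cell: &StyledCell) -> (String, String) {
    let has = |flag: u16| (cell.modifiers & flag) != 0;

    // REVERSED swaps the colours before anything is emitted.
    let (mut fg, bg) = if has(REVERSED) { (cell.bg, cell.fg) } else { (cell.fg, cell.bg) };
    // HIDDEN paints the glyph in the background colour.
    if has(HIDDEN) {
        fg = bg;
    }

    let mut attrs = vec![format!("fill=\"{}\"", hex(fg))];
    if has(BOLD) {
        attrs.push("font-weight=\"bold\"".to_string());
    }
    if has(ITALIC) {
        attrs.push("font-style=\"italic\"".to_string());
    }
    let mut deco = Vec::new();
    if has(UNDERLINED) {
        deco.push("underline");
    }
    if has(CROSSED_OUT) {
        deco.push("line-through");
    }
    if !deco.is_empty() {
        attrs.push(format!("text-decoration=\"{}\"", deco.join(" ")));
    }
    if has(UNDERLINED) {
        if let Some(c) = cell.underline {
            attrs.push(format!("style=\"text-decoration-color:{}\"", hex(c)));
        }
    }
    if has(DIM) {
        attrs.push("opacity=\"0.5\"".to_string());
    }
    // Blink cannot be shown in a static image, so it becomes a marker
    // attribute that still changes the diff.
    if has(SLOW_BLINK) {
        attrs.push("data-blink=\"slow\"".to_string());
    } else if has(RAPID_BLINK) {
        attrs.push("data-blink=\"rapid\"".to_string());
    }
    (hex(bg), attrs.join(" "))
}
```

Because every modifier contributes its own attribute, dropping BOLD from a cell changes the emitted string and therefore the golden, which is the property the glyph-only and colour-only encoders lack.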

Two further classes of test pin intent independently of pixel output, so a "wrong token used" regression fails even when two tokens look similar:

  • Palette golden. A single snapshot of every pub const colour/style token in crates/jackin-tui/src/theme.rs. Changing STATUS_BLOCKED_RED_RGB then shows as one intentional, reviewable diff instead of rippling silently through dozens of screen goldens.
  • Semantic style assertions. At known coordinates, assert the resolved style of high-value cells (the Blocked badge resolves to STATUS_BLOCKED_RED + BOLD; a link has UNDERLINED), separating "the token's value changed" from "the wrong token was applied".
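A rough sketch of both intent tests, assuming a simplified `Style` type and a flat `Grid` of resolved styles. The token values below are placeholders chosen for the example, not the real values in theme.rs, and `palette_golden` and `Grid` are hypothetical helpers.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Style {
    pub fg: (u8, u8, u8),
    pub bold: bool,
    pub underlined: bool,
}

// Placeholder token values for illustration only.
pub const STATUS_BLOCKED_RED: Style = Style { fg: (0xd0, 0x30, 0x30), bold: true, underlined: false };
pub const LINK: Style = Style { fg: (0x5f, 0x87, 0xff), bold: false, underlined: true };

/// Palette golden: one sorted line per token, so a changed token value
/// shows up as a single-line diff.
pub fn palette_golden(tokens: &[(&str, Style)]) -> String {
    let mut rows: Vec<String> = tokens
        .iter()
        .map(|(name, s)| {
            format!(
                "{} #{:02x}{:02x}{:02x} bold={} underlined={}",
                name, s.fg.0, s.fg.1, s.fg.2, s.bold, s.underlined
            )
        })
        .collect();
    rows.sort();
    rows.join("\n")
}

/// Hypothetical grid of resolved styles, indexed by (x, y).
pub struct Grid {
    pub width: usize,
    pub cells: Vec<Style>,
}

impl Grid {
    pub fn filled(width: usize, height: usize, style: Style) -> Self {
        Grid { width, cells: vec![style; width * height] }
    }

    pub fn set(&mut self, x: usize, y: usize, style: Style) {
        self.cells[y * self.width + x] = style;
    }

    /// Semantic assertion helper: the resolved style at a known coordinate.
    pub fn style_at(&self, x: usize, y: usize) -> Style {
        self.cells[y * self.width + x]
    }
}
```

A test would then compare `grid.style_at(x, y)` against the token constant rather than a literal colour: the assertion keeps passing when the token's value is intentionally changed (the palette golden catches that) and fails only when a different token is applied.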

Surfaces to cover

| Surface | Source | Status today |
| --- | --- | --- |
| Shared components (jackin-tui) | buffer_to_svg over a TestBackend buffer | SVG lookbook with a --check drift gate shipped (colour only) |
| Composite TUI screens (console list/editor/settings, capsule chrome/pane/branch bar, launch TUI) | ratatui Buffer | text-only insta for four console views; no full-screen SVG |
| Live in-container session screens | jackin-term GridSnapshot | snapshot model exists; few committed fixtures |
| CLI stdout / stderr (help, status, doctor, --format json, E0xx errors) | captured process output (with ANSI) | functional assert_cmd checks only; no visual goldens |

Tool research — Rust-only options

The hard constraint is Rust-native tooling: the whole application is Rust, and we do not want to pull a Node or Go toolchain into the test path. The candidates — SVG/screenshot emitters, snapshot stores, and CLI test harnesses:

| Tool | Language / licence | What it does | Fit for jackin' |
| --- | --- | --- | --- |
| Own buffer_to_svg (in the lookbook) | Rust (this repo) | Renders a ratatui Buffer directly to SVG | Best for TUI. Works on the styled buffer, so cursor-addressed full-screen renders are exact. Needs to become modifier-complete and be extracted into a reusable crate. |
| term-transcript | Rust, Apache-2.0 / MIT | Captures CLI/REPL output + ANSI colour, writes and parses SVG, tests that a parsed transcript matches output | Good for CLI line output. Purpose-built for exactly the stdout/stderr case and licence-compatible. Limitation: only SGR ANSI is kept; CSI cursor and OSC sequences are dropped (the optional portable-pty feature does not change this, as confirmed in its v0.4.0 docs), so it is not a fit for full-screen TUI. Adopt-or-borrow for the CLI surface. |
| snapbox / trycmd | Rust, MIT / Apache-2.0 | CLI snapshot harness (Ed Page / assert-rs): stdout/stderr/exit-code with redactions | Good for unstyled CLI assertions (--help, --format json, exit codes) where a styled image adds nothing. Actively maintained (≈8.78M all-time downloads, latest v1.2.2 on 2026-05-26). Captures text, not the styled cell, so it complements the SVG path rather than replacing it. |
| termframe | Rust, MIT | Runs a command and exports its output as an SVG screenshot with ANSI styling (bold/italic/underline, 16/256/24-bit), themes, light/dark | A screenshot generator, not a test library; whether it preserves full-screen CSI cursor-addressed output (vs SGR line runs) is unverified. Useful reference for styled-SVG rendering and docs/demo capture; not a regression harness on its own. |
| cellshot | Rust, MIT | PTY capture with a structured cell-frame model; emits text/JSON/ANSI/SVG/PNG | Already the research trigger for Terminal observation and automation; its frame model and block-element SVG rendering are worth borrowing, but it is a PTY daemon, not a render-regression library. |
| agg | Rust, Apache-2.0 | Renders asciinema .cast recordings to animated GIF | Animated, raster output: useful for session recordings, not deterministic static goldens. |
| ratatui TestBackend | Rust (dependency) | In-memory styled-cell buffer + assert_buffer | Already used widely; it is the source the SVG encoder renders from, and its styled assert_buffer is the right primitive for tight unit-level style checks. |
| insta (+ insta-cmd) | Rust, Apache-2.0 / MIT | Snapshot storage + cargo insta review accept workflow; stores any Display/string verbatim in plain-text .snap files; regex filters, redaction selectors, binary snapshots | The golden store for every surface. The dominant Rust snapshot crate by a wide margin (≈70M all-time / ≈17.7M recent downloads, ≈2.9k GitHub stars, latest v1.47.2 on 2026-03-30). An SVG string is stored verbatim, so all four surfaces reuse one review loop; its filters and redactions are the determinism knobs for paths, versions, and durations. |
| expect-test | Rust, MIT / Apache-2.0 | rust-analyzer's in-house inline + file snapshot lib (expect!, expect_file!, UPDATE_EXPECT) | The main alternative to insta, deliberately lighter (no external review tool). Second by usage but well behind (≈1.8M/mo vs insta's ≈6.6M/mo) and without SVG/redaction tooling, so insta is the better store here. |

Excluded from the test path because they are not Rust: Charmbracelet VHS and freeze (Go), Microsoft tui-test (Node/xterm.js), termtosvg (Python, archived). VHS is doubly unfit for this work — it is a ttyd + ffmpeg-driven recorder that emits only raster/recording formats (GIF/MP4/WebM/PNG), never SVG. They remain useful as references and, at most, as optional docs/demo tooling.

Multi-source, adversarially-verified external research (primary sources: crates.io, GitHub, lib.rs, docs.rs; captured 2026-06) confirms this build-vs-buy split. insta is the dominant Rust snapshot store and holds an SVG string verbatim. No off-the-shelf Rust crate does full-screen styled TUI → SVG render-regression: the one tool that pairs SVG output with a snapshot harness, term-transcript, drops CSI by design, and the Go generators (VHS, freeze) cannot sit in a Rust-only test path. The smallest correct path is therefore a first-party Buffer / GridSnapshot → SVG encoder feeding insta. Absolute download and star counts drift; the ranking (insta ahead of expect-test, with snapbox / trycmd the active choice for unstyled CLI output) is durable.

Decision

  • TUI, composite, and live surfaces: render the styled Buffer / GridSnapshot to SVG with our own encoder. This is the correct layer because ANSI-stream tools drop cursor/CSI movement and cannot faithfully capture a full-screen cursor-addressed TUI. ratatui's own default TestBackend text dump is itself style-blind — it drops colour and modifiers (ratatui issue #1402) — so the encoder must read the Buffer cells directly rather than reuse that path. Make buffer_to_svg modifier-complete and extract it into a small reusable crate (the natural home is alongside the jackin-test-support crate from Test infrastructure & behavioral specs, or a focused jackin-term-svg crate) so the console, capsule, launch TUI, and lookbook all emit one identical artifact format.
  • CLI line output: capture stdout/stderr (with ANSI) and render to the same SVG. Evaluate term-transcript as adopt-or-borrow for this surface; if its SGR-only limitation or its rendering shape does not fit, feed the captured bytes through a small ANSI→cell parser into the same SVG encoder, so every surface in the project shares one golden format. For purely functional CLI checks — exit codes, --format json, must-contain assertions — where style is not what is under test, prefer snapbox / trycmd over hand-rolled assertions and skip the SVG entirely; reserve the styled-SVG golden for output whose appearance is the regression target (the E0xx error blocks, the coloured status and doctor summaries).
  • Build our own crate where needed. No off-the-shelf Rust tool covers full-screen TUI render regression; term-transcript is CLI-only and termframe/cellshot are screenshot/PTY tools, not render-regression libraries. The smallest correct path is to harden and extract the encoder we already have, reusing term-transcript for the CLI surface where it fits and borrowing SVG-rendering ideas from termframe/cellshot. The result is mostly first-party, fully Rust.
  • Review and gate in the PR. Commit .svg goldens; gate drift with the existing lookbook --check pattern extended across all surfaces (and/or insta review for the same SVG strings). GitHub renders the committed SVG inline, so the PR shows the before/after image and the attribute-level diff together. An optional static gallery page (the lookbook already exports one) gives a Storybook-style browse surface.
  • Determinism harness first. Fixed render sizes, a fixed resolved theme (resolve Color::Reset/named colours to a reference palette so goldens reflect what the user sees, not the CI terminal), a frozen clock, and redaction of paths/versions/durations/run IDs. This harness is shared with the CLI golden work and lives in jackin-test-support. The encoder's cell geometry must be deterministic and independent of any host font: bundle a fixed monospace advance-width metric, or pin a vendored font parsed with a crate such as ttf-parser / swash, so the emitted x / y coordinates are byte-identical on every CI machine — a system-font lookup would make goldens host-dependent.
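The fixed-geometry and resolved-theme points above can be sketched as follows. The 8×16 cell metric, the `Color` enum, and the reference palette values are assumptions made for the example; the real metric and palette are still open design decisions.

```rust
// Assumed fixed cell metric in SVG user units. The point is only that it
// is a constant, never looked up from a host font.
pub const CELL_W: u32 = 8;
pub const CELL_H: u32 = 16;

/// Stand-in for a terminal colour that may be unresolved.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Color {
    Reset,
    Red,
    Green,
    Rgb(u8, u8, u8),
}

// Hypothetical reference background used wherever the cell says `Reset`.
pub const REFERENCE_BG: (u8, u8, u8) = (0x10, 0x10, 0x10);

/// Resolves named and default colours against the reference palette, so a
/// golden never depends on the terminal the tests ran in.
pub fn resolve_bg(color: Color) -> (u8, u8, u8) {
    match color {
        Color::Reset => REFERENCE_BG,
        Color::Red => (0xcc, 0x00, 0x00),
        Color::Green => (0x00, 0xcc, 0x00),
        Color::Rgb(r, g, b) => (r, g, b),
    }
}

/// Top-left corner of a cell, computed with integer arithmetic only.
pub fn cell_origin(col: u16, row: u16) -> (u32, u32) {
    (u32::from(col) * CELL_W, u32::from(row) * CELL_H)
}

/// Emits the background rectangle for one cell.
pub fn cell_rect(col: u16, row: u16, bg: Color) -> String {
    let (x, y) = cell_origin(col, row);
    let (r, g, b) = resolve_bg(bg);
    format!(
        "<rect x=\"{}\" y=\"{}\" width=\"{}\" height=\"{}\" fill=\"#{:02x}{:02x}{:02x}\"/>",
        x, y, CELL_W, CELL_H, r, g, b
    )
}
```

Keeping coordinates as integers avoids float-formatting differences between platforms, which is one less source of byte-level drift.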

Current state (absorbed from prior items)

  • Component SVG lookbook: jackin-tui-lookbook renders every shared component to SVG via buffer_to_svg, with --check drift detection and an exported gallery under docs/public/tui-lookbook. Colour is captured; modifiers are not yet.
  • Text-only console snapshots: four insta snapshots (list, settings, editor-general, editor-mounts) in crates/jackin-console/src/tui/view/tests.rs with committed .snap files, generated with INSTA_UPDATE=new. These capture glyphs only.
  • Capsule render tests: insta and buffer assertions exist for capsule chrome and the branch-context bar.
  • Live screen model: jackin-term's GridSnapshot::dump() already serialises a complete styled screen, ready to be a golden source for in-container session fixtures.

Phases

  1. Style-complete the encoder. Extend buffer_to_svg to emit every modifier (table above) and switch the console snapshot helper off .symbol() onto a style-complete encoding. Prove it by deliberately dropping a BOLD/colour and showing the golden fails.
  2. Extract the shared crate + intent tests. Move the encoder into a reusable crate; add the palette golden and semantic style assertions; build the determinism harness (fixed size/theme/clock, redaction).
  3. Composite full-screen SVG goldens. Render the console list/editor/settings, capsule chrome/pane/branch bar, and the launch TUI at fixed sizes (e.g. 80×24, 110×30, 120×40), both light and dark themes, gated by --check.
  4. CLI output → SVG goldens. Capture and render --help, every subcommand help, status, doctor, --format json, and each E0xx error to SVG via the chosen CLI path (adopt-or-own). Keep functional assert_cmd checks for behaviour.
  5. Live-session goldens and review polish. Expand GridSnapshot runtime fixtures; optionally export asciinema .cast for full-session replay; refine the PR before/after experience and the gallery.
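The Phase 1 proof ("drop a BOLD and show the golden fails") amounts to a byte-level comparison that reports where the render drifted. A minimal sketch, assuming a hypothetical `check_golden` helper rather than the lookbook's actual --check implementation or insta's assertion macros:

```rust
/// Compares a committed golden with a fresh render and, on mismatch,
/// reports the first differing line so the failure is readable in CI logs.
pub fn check_golden(name: &str, golden: &str, rendered: &str) -> Result<(), String> {
    if golden == rendered {
        return Ok(());
    }
    let mut expected = golden.lines();
    let mut actual = rendered.lines();
    let mut line_no = 1;
    loop {
        match (expected.next(), actual.next()) {
            // Lines agree so far: keep scanning.
            (Some(a), Some(b)) if a == b => line_no += 1,
            // First difference, or one side ended early. If both sides end
            // here, the strings differ only in trailing newlines.
            (a, b) => {
                return Err(format!(
                    "{}: drift at line {}\n- {}\n+ {}",
                    name,
                    line_no,
                    a.unwrap_or("<missing>"),
                    b.unwrap_or("<missing>")
                ));
            }
        }
    }
}
```

Feeding it a golden that contains `font-weight="bold"` and a render without it yields an error pointing at the changed line, which is the failure the style-blind snapshots cannot produce today.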

Tradeoffs and risks

  • Golden churn. Style-complete SVG goldens change whenever the render legitimately changes; the insta/--check review workflow keeps that a one-keystroke accept, but reviewers must actually look at the diff, not rubber-stamp it.
  • Determinism is load-bearing. Any unpinned size, theme, clock, or path makes a golden flaky. The harness must land before broad adoption (Phase 2 gates Phases 3–5).
  • Encoder fidelity. The SVG encoder must apply the same colour resolution and dim/scaling transform the real renderer uses (see the scale() path in crates/jackin-tui/src/theme.rs), or goldens will diverge from on-screen reality.
  • Scope discipline. This item owns the visual render artifact and its regression net. It must not absorb live-automation or functional-CLI concerns; cross-reference the owning items instead.

Open design questions

These are unresolved and should be settled in Phase 2 (the harness), before broad golden adoption:

  • Deterministic cell geometry. The encoder emits per-cell x / y coordinates, so glyph advance width must come from a fixed source — a bundled monospace metrics constant, or a vendored font parsed with ttf-parser / swash / fontdue — never a system-font lookup, or goldens diverge across CI hosts. Pick the mechanism before Phase 3.
  • Encoder input contract. The encoder should take a styled cell grid, not raw bytes: the console and launch TUI hand it a ratatui Buffer; the live container surface hands it a jackin-term GridSnapshot. Any raw-ANSI input (captured CLI bytes) must first be parsed into that grid by a VT crate (vt100 / vte / avt) so all four surfaces converge on one encoder.
  • Borrow vs reference for termframe / cellshot. Both are Rust and MIT, but neither is verified as a full-screen-CSI-faithful regression harness; decide per-tool whether to lift SVG-rendering code or keep them as design references only.
  • One harness home. Confirm the determinism bundle (fixed size, frozen clock, resolved theme, insta filters/redactions) lives once in the shared test crate so the console, capsule, launch TUI, and CLI goldens cannot drift apart.
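To illustrate the encoder input contract for captured CLI bytes, here is a deliberately tiny SGR-only parser that turns one line of ANSI text into styled cells. It is a sketch of the shape of the conversion, not a substitute for a VT crate: `Pen`, `Cell`, and `parse_sgr_line` are hypothetical names, only a handful of SGR codes are handled, and every non-SGR CSI sequence is simply discarded.

```rust
/// Current drawing state while scanning the stream.
#[derive(Clone, Copy, Debug, PartialEq, Default)]
pub struct Pen {
    /// ANSI colour index 0..=7, or None for the default foreground.
    pub fg: Option<u8>,
    pub bold: bool,
    pub underlined: bool,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Cell {
    pub ch: char,
    pub pen: Pen,
}

/// Parses one line of text with SGR escapes into styled cells.
pub fn parse_sgr_line(input: &str) -> Vec<Cell> {
    let mut cells = Vec::new();
    let mut pen = Pen::default();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\u{1b}' && chars.peek() == Some(&'[') {
            chars.next(); // consume '['
            let mut params = String::new();
            let mut final_byte = None;
            // A CSI sequence ends at the first byte in 0x40..=0x7e.
            for p in chars.by_ref() {
                if ('\u{40}'..='\u{7e}').contains(&p) {
                    final_byte = Some(p);
                    break;
                }
                params.push(p);
            }
            // Only SGR ('m') changes the pen; cursor movement is dropped.
            if final_byte == Some('m') {
                apply_sgr(&mut pen, &params);
            }
            continue;
        }
        cells.push(Cell { ch: c, pen });
    }
    cells
}

fn apply_sgr(pen: &mut Pen, params: &str) {
    for part in params.split(';') {
        // An empty parameter means 0 (reset); unparsable ones are skipped.
        let code = if part.is_empty() { Some(0) } else { part.parse::<u16>().ok() };
        match code {
            Some(0) => *pen = Pen::default(),
            Some(1) => pen.bold = true,
            Some(4) => pen.underlined = true,
            Some(22) => pen.bold = false,
            Some(24) => pen.underlined = false,
            Some(n @ 30..=37) => pen.fg = Some((n - 30) as u8),
            Some(39) => pen.fg = None,
            // Extended colours carry sub-parameters this sketch does not
            // understand, so stop rather than misread them.
            Some(38) | Some(48) => break,
            _ => {}
        }
    }
}
```

Once CLI output is in this cell form, it can flow through the same encoder as a ratatui Buffer or a GridSnapshot, which is what lets all four surfaces share one golden format.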

Related work

  • Test infrastructure & behavioral specs: owns the shared jackin-test-support crate and the determinism harness this item builds on; this item supplies the styled SVG artifact and the visual assertions.
  • Rust CI tooling & dependency hygiene: owns CI wiring, coverage, and the insta dev-dependency; the golden-comparison steps slot into its aggregators.
  • Terminal observation and automation: owns live PTY read/wait/send automation of running sessions. The two items are complementary, not overlapping: that item captures live sessions for orchestration, this one asserts deterministic render regressions. The two should share the styled-cell frame schema.
  • Brand identity system: defines the colour and wordmark contract these visual tests defend.
  • Launch progress TUI: its renderer snapshots should use this item's harness.
