
Session Snapshot and Rollback

Status: Open — design proposal (Containment track, Agent Orchestrator Research Program)

Per-mount isolation and worktree cleanup protect many source-control flows, but they do not provide a complete recovery path for every agent session:

  • a workspace may be dirty before launch
  • a mount may not be a Git repository
  • generated files and caches may be expensive or unsafe to roll back manually
  • agent state, auth state, and tool config can be poisoned across sessions
  • future autonomous queues may run unattended long enough that “just inspect the diff” is not enough

Jackin needs a recovery story that is explicit, scoped, and opt-in. It should not become a silent host backup system.

Hazmat takes a pre-session snapshot and exposes restore/diff workflows. Its verification docs also model backup/restore ordering and reversibility invariants. That is stronger than jackin needs for a first pass, but the principle is useful: recovery behavior should be visible before launch and should not rely on the agent behaving.

Docker Sandboxes provides a different reference point: sandbox state persists, and sbx rm removes the sandbox and associated worktrees. That is useful for VM cleanup, but it does not replace project-level rollback when workspace changes are live on the host.

Add a scoped snapshot layer for sessions where rollback matters.

Suggested config shape:

[workspaces.my-project.recovery]
snapshot = "off" # off | metadata | project | project-and-state
exclude = ["target/", "node_modules/", ".next/"]

Launch output must print:

  • whether a snapshot will be created
  • which paths are covered
  • which excludes apply
  • where metadata is stored
  • whether restore is available
  • which host paths would be overwritten by restore

  • Record resolved mounts, isolated worktree paths, base commits, dirty status, and cleanup policy in the persistent storage layer.
  • Add jackin recovery show <instance> for post-session inspection.
  • Do not copy file contents yet.
  • Add opt-in snapshots for non-git mounts and dirty working trees.
  • Use integration-provided excludes from stack integration contracts.
  • Store snapshots under jackin’s data directory, not inside the project.
  • Add diff/list commands before restore.
  • Snapshot selected jackin-managed state such as agent config, plugin state, and auth delivery artifacts when the selected auth mode makes that safe.
  • Do not snapshot secret values into a broader-readable archive.
  • Integrate with container credential exposure so secret handling stays explicit.
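The metadata-only step in the list above (resolved mounts, isolated worktree paths, base commits, dirty status, cleanup policy, and no file contents) could be recorded along these lines. This is a sketch only: the type names, field names, and JSON layout are assumptions for illustration, not jackin's storage schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MountRecord:
    """One resolved mount as seen at launch (hypothetical shape)."""
    host_path: str
    worktree_path: Optional[str]  # None when no isolated worktree is used
    base_commit: Optional[str]    # None for mounts that are not Git repositories
    dirty: bool                   # True if the tree had uncommitted changes


@dataclass
class SessionRecord:
    """Metadata for one session. No file contents are stored here."""
    instance: str
    cleanup_policy: str
    mounts: List[MountRecord] = field(default_factory=list)


def to_json(record: SessionRecord) -> str:
    """Serialize the record deterministically for the storage layer."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = SessionRecord(
        instance="demo",
        cleanup_policy="keep",  # placeholder value
        mounts=[MountRecord("/srv/ws", None, None, dirty=True)],
    )
    print(to_json(record))
```

A record like this is enough to back an inspection command such as `jackin recovery show <instance>` before any content snapshots exist.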

Phase 4 — Remote and Kubernetes recovery

  • Map the recovery contract onto SSH remote directories and Kubernetes volumes.
  • Prefer platform-native snapshots where available, but keep the same operator-facing contract.

Restore must make destructive changes only after explicit operator confirmation.

Minimum restore flow:

jackin recovery show <instance>
jackin recovery diff <instance>
jackin recovery restore <instance>

The restore command should refuse to run while the instance is active. It should print the exact host paths it will mutate and require an explicit flag or interactive confirmation.
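The guard described above can be sketched as a small function: refuse while the instance is active, print every host path, then require either an explicit flag or an interactive answer. `RestorePlan`, `confirm_restore`, and the `force` parameter are hypothetical names. Typing the instance name as the confirmation is one possible design, not something this proposal specifies.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class RestoreRefused(Exception):
    """Raised when restore must not proceed at all."""


@dataclass(frozen=True)
class RestorePlan:
    instance: str
    active: bool                 # True while the instance is still running
    host_paths: Tuple[str, ...]  # exact host paths restore would mutate


def confirm_restore(
    plan: RestorePlan,
    *,
    force: bool = False,
    ask: Callable[[str], str] = input,
    out: Callable[[str], None] = print,
) -> bool:
    """Return True only if the operator explicitly approved the restore."""
    if plan.active:
        raise RestoreRefused(f"instance {plan.instance!r} is still active")

    # Always show what will be overwritten, even when the flag is set.
    out("Restore will overwrite these host paths:")
    for path in plan.host_paths:
        out(f"  {path}")

    if force:
        return True

    # Assumed confirmation style: the operator types the instance name.
    answer = ask(f"Type the instance name ({plan.instance}) to confirm: ")
    return answer.strip() == plan.instance
```

Passing `ask` and `out` as parameters keeps the guard testable without a terminal.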

Before any destructive restore, jackin should snapshot the current state of the paths it is about to overwrite. That emergency snapshot is not a replacement for a real backup system, but it preserves a last-known state if the operator selected the wrong instance, the restore target changed since launch, or the snapshot excludes were too broad.
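A minimal sketch of that emergency snapshot, assuming it is a plain copy into a timestamped directory under jackin's data directory. The `emergency/<instance>-<timestamp>` layout and the function name are assumptions for illustration. A real implementation would also need to apply excludes and handle permissions, which this sketch omits.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable


def emergency_snapshot(paths: Iterable[Path], data_dir: Path, instance: str) -> Path:
    """Copy the current state of each path before restore overwrites it.

    Returns the directory holding the copies. The layout under data_dir
    is hypothetical.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest_root = data_dir / "emergency" / f"{instance}-{stamp}"
    # Fail rather than merge into an existing snapshot directory.
    dest_root.mkdir(parents=True, exist_ok=False)

    for index, src in enumerate(paths):
        src = Path(src)
        # Prefix with an index so two paths with the same name do not collide.
        dest = dest_root / f"{index}-{src.name}"
        if src.is_dir():
            # symlinks=True copies links as links instead of following them.
            shutil.copytree(src, dest, symlinks=True)
        elif src.exists():
            shutil.copy2(src, dest)
        # Paths that no longer exist are skipped: there is nothing to preserve.
    return dest_root
```

Running this immediately before the restore mutates anything gives the operator a way back if the wrong instance was selected.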

Snapshots write to jackin’s data directory. Restore mutates host workspace paths and must therefore be opt-in, visible in the session contract, and guarded by interactive confirmation or an explicit destructive flag.