Agent Orchestrator Research Program

Status: Open — design program (fleet-operations track plus containment/recovery track)

Goal

Make jackin’ the canonical, terminal-first orchestrator for engineers who run autonomous coding agents in real working environments: local terminals, SSH sessions, disposable servers, Kubernetes debug pods, and long-running isolated workspaces. The product target is not a consumer desktop app. It is the operator surface for experienced engineers who want agents at full speed but do not want that speed pointed directly at their host account, production credentials, or shared Docker daemon.

This page is the research index for adjacent tools that are solving parts of the same problem:

multicode is the strongest reference for fleet operations: many parallel workspaces, live status, GitHub link state, persistent task state, resource telemetry, remote operation, and custom operator tools.
Hazmat is the strongest reference for local containment discipline: explicit session contracts, tiered threat decisions, macOS user isolation, seatbelt policies, pf firewalling, credential-deny rules, stack integrations that cannot widen trust, and rollback-oriented design.
Docker Sandboxes is the strongest commercial benchmark for microVM sandboxing: per-sandbox VM boundary, private Docker daemon, scoped workspace sharing, host-side network policy, and credential proxying.
Conductor, Claude devcontainers, Trail of Bits’ devcontainer, and private internal tools remain useful comparison points, but they are not the center of this program.

The program keeps two ideas separate:

Fleet operations: how the operator coordinates many agents and tasks.
Containment and recovery: what an autonomous agent can reach, how that boundary is explained, and how the operator recovers when the agent does something destructive.

That wider scope is why this page lives at agent-orchestrator-research, not under a multicode-specific route.

Jackin values used for evaluation

Every borrowed idea has to survive these filters:

Value	What it means for roadmap decisions
Terminal-first	The CLI and `jackin console` are primary surfaces. Desktop-only workflows are comparison material, not the center.
Isolation before convenience	No hidden host mutation, host socket exposure, or credential widening just to make an agent feel seamless.
Runtime-neutral agents	Claude, Codex, Amp, OpenCode, Gemini, and future runtimes should plug into the same operator model.
Role repos over ad-hoc setup	Toolchains belong in roles where possible. Project-local hints may improve ergonomics, but they must not become policy escapes.
Explicit contracts	Before launch, the operator should be able to see mounts, credentials, network, Docker access, host-side effects, and recovery posture.
Real engineering environments	The model must eventually work for SSH, servers, Kubernetes, and repos with serious Docker/Compose needs.

Research map

Tool	Strongest idea to borrow	Main reason not to copy it directly
`graemerocher/multicode`	Parallel workspace table, GitHub status links, `multicode-remote`, skill-driven tags, Codex provider support, editor launcher, autonomous queue config, Apple-container experiments	Linux-first `bwrap`/`systemd-run` core; isolation is a convenience, not a strong boundary; some Apple-container workflows still mount the host Docker socket
`dredozubov/hazmat`	Session contract, tier decision flow, native macOS containment, strict integration rules, rollback/proof discipline	macOS-only; one hardcoded `agent` user; no smooth multi-agent fleet surface
Docker Sandboxes	MicroVM plus private Docker daemon, host-side network proxy, credential injection outside the VM, branch worktrees under `.sbx/`	Product is tied to Docker Desktop; not an extensible jackin role ecosystem; credentials/network proxy are not available primitives in plain Docker
Claude/devcontainer patterns	Default-deny firewall and reproducible container setup	Devcontainers are workspace setup, not a multi-agent orchestrator or full operator platform
Conductor-style native worktrees	Low-friction host-native worktrees	No meaningful sandbox boundary; useful only as a UX comparison

User-facing benefit matrix

This table is the “why would an operator care?” pass. A feature only belongs on the roadmap when it improves an engineer’s day-to-day control loop, not just because another tool has it.

User-visible benefit	Seen in	Jackin direction
Know the exact boundary before launch	Hazmat session contract; Docker Sandboxes security model	Session contract and explain mode becomes the common preflight for mounts, auth, Docker, network, ports, persistence, and recovery
Pick the right containment tier for the job	Hazmat tier decision flow; Docker Sandboxes private-daemon path	Add a tier recommendation to `jackin explain` rather than forcing users to infer whether `dind`, rootless DinD, microVM, SSH remote, or Kubernetes is appropriate
Run Docker/Compose without trusting the host daemon	Docker Sandboxes private Docker Engine; Hazmat Tier 3	Keep host socket mounting out of scope; make private-daemon backends the Docker-capable path under selectable sandbox backends
See and govern outbound network behavior	Docker Sandboxes dashboard network panel; Hazmat deny-mode routing	Network egress policy should include connection logs and rule-editing UX, not only static allowlist config
Open services started by the agent	Docker Sandboxes `sbx ports`; multicode host-localhost notes	Track service exposure as a first-class session-contract section with explicit host-side effects and non-persistent port mappings
Work in parallel without clobbering the main checkout	Docker Sandboxes `--branch`; multicode short-lived workspaces	Jackin already has per-mount worktree/clone isolation; improve the operator surface around branch naming, preserved worktrees, and post-session review
Resume a configured environment without rebuilding everything	Docker Sandboxes named/persistent VMs; multicode workspaces	Preserve jackin’s explicit state model, but add disk/resource visibility and cleanup policy so persistence does not become invisible bloat
See which agents are idle, busy, waiting, or expensive	multicode live TUI; Docker Sandboxes dashboard cards	Agent runtime status, console resource panel, and token telemetry should converge into one operator overview
Jump from agent output to issue/PR/repo state	multicode `<multicode:*>` tags	Keep the tag protocol vendor-neutral and optional, then use it to drive GitHub link tracking and custom operator actions
Reuse stack setup safely	Hazmat integrations; Docker Sandboxes kits/templates	Keep roles as the environment unit, but allow non-executable stack hints and future reviewed setup kits where they cannot widen trust
Roll back after a bad unattended run	Hazmat snapshots/restore; Docker Sandboxes disposable VM cleanup	Session snapshot and rollback should separate metadata recovery, project snapshots, and sandbox-state deletion
Run the same operator model over SSH/server/Kubernetes	multicode-remote; Jackin roadmap vision	jackin-remote and Kubernetes support should reuse the same contract/status/policy vocabulary instead of becoming separate products

Ideas to decline or postpone

Some adjacent-tool features are attractive but would cut against jackin’s values if copied directly.

Idea	Why not copy directly	Safer jackin alternative
Host Docker socket passthrough	Any agent with daemon access can create privileged containers or host bind mounts; this collapses the sandbox boundary	Private daemon only: DinD, rootless DinD, microVM-owned Docker, or Kubernetes-controlled pods
Same-absolute-path workspace passthrough everywhere	Elegant in microVMs, but it would break jackin’s explicit `dst` mount model in Docker/container backends	Keep `dst` explicit; only use same-path passthrough inside a backend that natively needs and explains it
TUI-only orchestration	Fast for one product, but jackin’s core users automate from terminals, SSH, and scripts	CLI-first contracts and commands, with `jackin console` as the common day-to-day overview
Repo-controlled integration manifests with arbitrary paths/hooks	Turns untrusted project files into policy authority	Repo may recommend known integration names; operator approval and jackin-owned manifests decide what activates
Broad environment inheritance for convenience	Secret-shaped env vars and auth sockets are easy to leak into agents	Named credential sources, safe env selectors, and bridge/proxy-mediated capabilities
“Sandbox equals safe” messaging	Docker Sandboxes and Hazmat both document residual workspace, hooks, network, and persistence risk	Print backend-specific risk posture and recovery limits in the session contract
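
The "safe env selectors" alternative in the table above can be sketched as an allowlist filter that inherits nothing by default and still refuses names that look like secrets. This is a minimal illustration, not jackin's implementation: `select_env` and the `SECRET_PATTERNS` list are hypothetical names and values chosen for the example.

```python
import fnmatch

# Hypothetical patterns for "secret-shaped" names; not a list taken from jackin.
SECRET_PATTERNS = ("*TOKEN*", "*SECRET*", "*PASSWORD*", "*_KEY", "SSH_AUTH_SOCK")


def select_env(host_env, allowed):
    """Return (env passed to the agent, allowed names refused as secret-shaped).

    Only names in `allowed` are considered at all, so the host environment is
    never inherited wholesale. An allowed name is still refused if it matches
    a secret-shaped pattern; such values should come from a named credential
    source instead.
    """
    passed = {}
    refused = []
    for name in sorted(allowed):
        if name not in host_env:
            continue
        if any(fnmatch.fnmatchcase(name.upper(), p) for p in SECRET_PATTERNS):
            refused.append(name)
            continue
        passed[name] = host_env[name]
    return passed, refused


host = {"LANG": "en_US.UTF-8", "EDITOR": "vim", "GITHUB_TOKEN": "example", "HOME": "/home/op"}
passed, refused = select_env(host, allowed={"LANG", "EDITOR", "GITHUB_TOKEN"})
print(passed)   # LANG and EDITOR only; HOME was never selected
print(refused)  # GITHUB_TOKEN was selected but looks like a secret
```

Refused names are returned rather than silently dropped so that a preflight view can show the operator what was withheld.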

Critical read of Hazmat

Hazmat is not a jackin replacement. It is a containment-first launcher that treats “what can the agent reach?” as the product question. That makes it a high-signal reference for jackin’s security and trust surface.

What Hazmat gets right

The session contract is first-class. A launch is not just “agent started”; it prints the selected mode, read-write project, read-only extensions, service access, snapshot state, and integration-derived behavior. The operator can also preview with `hazmat explain`.
Docker is treated as a boundary change. Hazmat does not punch a hole from native containment into the host Docker daemon. Private-daemon Docker workflows move to a Docker Sandbox/microVM tier; shared-daemon workflows are either code-only or pushed to a full-VM answer.
Stack integrations are constrained. Integrations can add read-only toolchain/cache paths, snapshot excludes, safe env selectors, warnings, and command hints. They cannot widen write scope, inject credentials, change network policy, or execute arbitrary hooks.
Credential delivery is modeled as capability delivery. Credentials live in a host-owned store and are materialized or brokered only for the selected harness/session. Hazmat is explicit about residual MCP/env inheritance risk.
Recovery and proof boundaries are honest. The TLA+ verification page names exactly which setup, seatbelt, backup/restore, and launch invariants are governed. The proof found ordering bugs, which is exactly the kind of failure mode jackin should care about.
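
The "cannot widen trust" rule for stack integrations can be illustrated with a validator that accepts only the capability classes listed above and reports everything else. This is a sketch under assumptions: the field names in `ALLOWED_FIELDS` are hypothetical, since the text describes capability classes rather than a manifest schema.

```python
# Hypothetical field names, one per capability class named in the text:
# read-only toolchain/cache paths, snapshot excludes, safe env selectors,
# warnings, and command hints.
ALLOWED_FIELDS = {
    "read_only_paths",
    "snapshot_excludes",
    "env_selectors",
    "warnings",
    "command_hints",
}


def validate_integration(manifest):
    """Return a list of violations; an empty list means the manifest is acceptable.

    Anything outside the allowlist is rejected, so write scope, credentials,
    network policy, and hooks cannot be introduced by an integration.
    """
    violations = []
    for field in sorted(manifest):
        if field == "name":
            continue  # identifier only, carries no capability
        if field not in ALLOWED_FIELDS:
            violations.append(f"field '{field}' is not an allowed capability")
    return violations


manifest = {
    "name": "node",
    "read_only_paths": ["~/.cache/npm"],
    "hooks": ["./setup.sh"],
    "write_paths": ["/usr/local"],
}
for problem in validate_integration(manifest):
    print(problem)
```

An allowlist is used instead of a denylist so that a new, unreviewed field is rejected by default.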

Where Hazmat does not match jackin

It is macOS-native and intentionally platform-specific. Jackin needs the same operator concepts across macOS, Linux, WSL, servers, and eventually Kubernetes.
It is shaped around a single operator and a single agent user. Jackin’s product value is many roles, many agents, many concurrent instances, and per-instance state.
It does not provide a fleet operations plane comparable to multicode’s live table, task queue, GitHub polling, or remote bridge.
Its strongest no-VM path depends on macOS user isolation and seatbelt. That is useful inspiration, not a portable backend.

Hazmat ideas worth turning into jackin roadmap items

Candidate	Jackin-shaped version	Roadmap item
Session contract / `explain`	Preview a fully resolved launch boundary before side effects; print the same contract at launch	Session contract and explain mode
Stack integrations	Optional, non-executable workspace hints for read-only toolchain/cache mounts, safe env selectors, warnings, and excludes	Stack integration contracts
Docker tier decision	Treat host Docker daemon access as a policy boundary, not a convenience default	Selectable sandbox backends, network egress policy
Snapshot/rollback	Pre-session snapshots for dirty/non-git/long-autonomy work, with opt-in restore and visible host-side effects	Session snapshot and rollback
Proof/governance boundary	Model launch/finalization ordering where host-side effects become complex	Architecture decision records, behavioral runtime/launch spec
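
As an illustration of the session contract row, the sketch below models a resolved launch boundary as an immutable value and renders it through one function, so that a preview and a real launch could print identical text. The class and field names are hypothetical; the sections follow the "Explicit contracts" value (mounts, credentials, network, Docker access, host-side effects, recovery posture).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mount:
    src: str
    dst: str
    read_only: bool


@dataclass(frozen=True)
class SessionContract:
    """Fully resolved launch boundary. Field names are illustrative only."""
    backend: str
    mounts: tuple
    credentials: tuple
    network: str
    docker_access: str
    host_side_effects: tuple
    recovery: str

    def render(self):
        lines = [f"backend: {self.backend}"]
        for m in self.mounts:
            mode = "ro" if m.read_only else "rw"
            lines.append(f"mount: {m.src} -> {m.dst} ({mode})")
        lines.append("credentials: " + (", ".join(self.credentials) or "none"))
        lines.append(f"network: {self.network}")
        lines.append(f"docker: {self.docker_access}")
        for effect in self.host_side_effects:
            lines.append(f"host-side effect: {effect}")
        lines.append(f"recovery: {self.recovery}")
        return "\n".join(lines)


contract = SessionContract(
    backend="dind",
    mounts=(
        Mount("/home/op/src/app", "/workspace/app", read_only=False),
        Mount("/home/op/.cache/pip", "/cache/pip", read_only=True),
    ),
    credentials=(),
    network="open (no egress policy enforced)",
    docker_access="private daemon",
    host_side_effects=("worktree created next to the checkout",),
    recovery="no snapshot",
)
print(contract.render())
```

Keeping the contract frozen and rendering it from one place is one way to ensure the preview cannot drift from what is printed at launch.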

Critical read of Docker Sandboxes

Docker Sandboxes is the benchmark because it solves several hard problems at the same time: VM boundary, private Docker daemon, scoped workspace, network policy, and credential injection. Jackin should use it as a comparison bar and should not assume that every backend can match it immediately.

What Docker Sandboxes gets right

Private Docker is the default inside the boundary. The agent can run Docker without touching the host daemon.
Network is host-mediated. HTTP/HTTPS traffic goes through a host proxy, non-HTTP protocols are blocked, and policy is domain-based.
Credentials do not enter the VM. The host proxy injects auth headers. That is materially stronger than env vars or mounted secret files.
Branch mode is operator-friendly. `--branch` creates worktrees under `.sbx/`, keeping agent changes out of the main working tree while Git still works inside the sandbox.
The dashboard makes state legible. `sbx` shows live sandbox status, CPU/RAM use, port mappings, and network governance rules. The important borrow is not the card UI; it is one operator overview for runtime, ports, policy, and cleanup.
Ports and host services are explicit. Host-to-sandbox services require published port mappings, and sandbox-to-host service access goes through a named host alias plus policy. Jackin needs equivalent explicitness for dev servers, databases, and local model runners.
The security docs are blunt about remaining risk. Workspace changes are live on the host in direct mode; hooks and generated files still matter.
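
The host-mediated network model described above (HTTP/HTTPS only, domain-based policy, everything else blocked) can be sketched as a single decision function. This is an illustration, not Docker Sandboxes' implementation: `decide_egress` is a hypothetical name, and treating subdomains of an allowed domain as allowed is an assumption made for the example.

```python
from urllib.parse import urlsplit


def decide_egress(url, allowed_domains):
    """Return "allow" or a "block: ..." reason for one outbound request."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return "block: non-HTTP protocol"
    host = (parts.hostname or "").lower()
    for domain in allowed_domains:
        # Assumption: an allowed domain also covers its subdomains. The
        # leading dot prevents "evilgithub.com" from matching "github.com".
        if host == domain or host.endswith("." + domain):
            return "allow"
    return "block: domain not in policy"


policy = {"github.com", "pypi.org"}
for target in (
    "https://api.github.com/repos",
    "https://evilgithub.com/",
    "ssh://git@github.com/org/repo.git",
):
    print(target, "->", decide_egress(target, policy))
```

Returning a reason string rather than a boolean makes each decision usable as a connection-log entry.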

Where jackin should differ

Jackin should keep role repositories as the unit of runtime distribution. Docker Sandboxes templates are useful, but they are not a replacement for jackin roles.
Jackin should preserve mount destination control. Docker Sandboxes’ same-absolute-path passthrough is elegant for worktrees, but jackin’s `dst`-based workspace model is more explicit and portable.
Jackin should treat Docker Sandboxes’ credential proxy as a long-term target, not a first-phase promise. The current container credential work already documents why env injection is weaker.
Jackin should make backend differences visible. A dind session, an OrbStack isolated-machine session, and a future Docker Sandboxes-style microVM session should not pretend to have identical risk.

Track A — Fleet operations comparison: jackin’ vs multicode

This is the feature inventory the fleet track is built from. “Have” means present in jackin’ today; “partial” means some of the capability has shipped; “planned” means a roadmap item already exists; “gap” means truly missing. Items in italics are addressed by this program.

Concern	multicode	jackin’ status
Agent isolation	`bwrap` + `systemd-run`; Apple `container`	Have — Docker + DinD; planned — selectable backends
Per-agent working tree	Workspace dir; one repo per workspace	Have — worktree and clone isolation
Mount kinds beyond `shared`/`readonly`	`writable`, `readable`, `isolated`, `tmpfs`	Partial — `shared` + `readonly`; gap on `tmpfs` and ephemeral, see Ephemeral mount modes
Multi-provider agent runtime	OpenCode + Codex (one per session)	Partial — basic built-in runtime launch shipped; parity work remains in multi-runtime
Resource limits per agent	`memory-high`, `memory-max`, `cpu` (cgroups)	Gap — see Declarative resource limits
Live agent status (idle/busy/question)	Yes — derived from opencode SSE events	Gap — see Agent runtime status
Live machine resource panel (CPU/RAM/disk)	Yes — `/proc/stat`, `/proc/meminfo`, sampled per 2s	Gap — see Console resource panel
Per-agent token / cost / OOM tracking	Yes — usage aggregation service	Gap — see Token & cost telemetry
Agent to operator tag protocol	Yes — `<multicode:repo>` / `<multicode:issue>` / `<multicode:pr>`	Gap — see Agent tag protocol
Live GitHub link state polling	Yes — octocrab + per-workspace SQLite cache	Gap — see GitHub link tracking
Persistent storage backend	Per-workspace SQLite + per-workspace JSON	Gap — see Persistent storage layer
Per-workspace operator description	Yes — inline-editable note in TUI	Gap — see Workspace description
External tool/IDE/diff launcher	Yes — `[handler]` block, `[[tool]]` array	Gap — see Operator handler system
Custom operator-defined tools	Yes — `[[tool]]` array, hotkey + exec/prompt type	Gap — see Custom operator tools
Autonomous task queue	Yes — scan cadence, parallel issue dispatch, runtime cleanup knobs	Gap — see Autonomous task queue
PR lifecycle actions	Yes — publish, rebase, fix CI, address review, request review, and merge actions	Gap — model manual gates and PR readiness inside Autonomous task queue
Task source abstraction	No — current multicode hardcodes GitHub issue/repo workflow	Gap — add source-neutral task policy in Task source abstraction
Idle runtime cleanup	Yes — `gradle --stop`, container recycle	Gap — see Idle runtime cleanup
Remote orchestration	Yes — `multicode-remote` SSH bridge + rsync	Gap — see jackin-remote
Workspace skills mount	Yes — `add-skills-from` mounts to provider skill dirs	Gap — see Workspace skills mount
Credential source plurality	Backends such as OS secret stores, env, and command	Partial — `op://` and `${env.VAR}` shipped; see Credential source pattern
GitHub CLI auth passthrough	Read-only mount / token config	Partial — see GitHub CLI authentication strategy
Operator console / TUI	Yes — ratatui, full-time	Have — `jackin console` (less feature-rich than CLI on purpose)
Role repo contract	Implicit via skill/config mounts	Have — `jackin.role.toml` + `Dockerfile`
Sensitive mount warnings	No	Have
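
The resource panel row above mentions sampling `/proc/stat` every two seconds. A common way to turn two such samples into a CPU utilization figure is sketched below. It is not taken from multicode's source; the function names are hypothetical, and the inputs are passed as strings so the example runs on any platform.

```python
def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal.
    # The trailing guest columns are already counted in user/nice, so they
    # are left out of the total to avoid double counting.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    return idle, sum(values)


def cpu_busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two samples (0.0 to 1.0)."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    total_delta = total1 - total0
    if total_delta <= 0:
        return 0.0  # identical samples or a counter reset
    return 1.0 - (idle1 - idle0) / total_delta


sample_a = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_b = "cpu  160 0 70 920 50 0 0 0 0 0"
print(f"{cpu_busy_fraction(sample_a, sample_b):.0%}")  # prints 40%
```

The counters are cumulative since boot, so only the difference between two samples is meaningful.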

Where multicode genuinely shines

Live observability of the agent. multicode derives status and resource columns from runtime state and displays them as the main operator surface.
Per-workspace persistent SQLite. A small store underpins GitHub status, custom links, and telemetry.
The tag protocol. `<multicode:issue>`, `<multicode:pr>`, and `<multicode:repo>` turn agent output into structured operator state without making the agent runtime itself part of the orchestrator.
Resource limits as config. Memory, CPU, and file descriptor limits are declared near isolation config instead of being hidden in launch scripts.
Operator extension points. Editor launchers, review tools, and custom hotkeys make the TUI an operator surface rather than a fixed dashboard.
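
To make the tag protocol concrete, the sketch below extracts tags from agent output. The text names the tags but not their wire format, so the paired open/close form with the value in between is an assumption, and `extract_tags` is a hypothetical helper.

```python
import re

# Assumed syntax: <multicode:KIND>value</multicode:KIND> for the three kinds
# named in the text. The backreference \1 requires a matching closing tag.
TAG_RE = re.compile(r"<multicode:(repo|issue|pr)>(.*?)</multicode:\1>", re.DOTALL)


def extract_tags(output):
    """Return (kind, value) pairs in the order they appear in the output."""
    return [(kind, value.strip()) for kind, value in TAG_RE.findall(output)]


agent_output = (
    "Picked up <multicode:issue>42</multicode:issue> and opened "
    "<multicode:pr> 57 </multicode:pr>. "
    "<multicode:unknown>ignored</multicode:unknown>"
)
print(extract_tags(agent_output))  # [('issue', '42'), ('pr', '57')]
```

Because the orchestrator only scans output text, the agent runtime needs no orchestrator-specific integration, which is the property the text highlights.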

Where jackin is materially ahead — and should stay ahead

Cross-platform Docker substrate instead of Linux-only `bwrap`.
Per-instance multi-runtime instead of one provider assumption.
Role repos as distribution units instead of ad-hoc skill/config mounts.
CLI-first with `jackin console` as the simplified front instead of a TUI-only product.
Toolchain-neutral orchestration instead of Micronaut/Java defaults baked into the orchestrator.
Security boundary honesty. multicode’s README is explicit that its isolation is for safety and convenience, not security. Jackin’s docs should preserve that kind of bluntness for every backend: dind is useful, but it is not a microVM; microVM is stronger, but still has workspace and credential policy caveats.

Track B — Containment and recovery comparison

This track folds Hazmat and Docker Sandboxes into the roadmap without making jackin pretend to be either one.

Concern	Hazmat	Docker Sandboxes	jackin’ direction
Boundary explanation	Session contract plus `hazmat explain`	Security model docs and sandbox policy output	Session contract and explain mode
Strong local isolation	macOS user + seatbelt + `pf`; VM tier for hardest cases	Per-sandbox microVM	Selectable sandbox backends
Docker workflows	Private-daemon tier only; shared daemon rejected in containment	Private Docker Engine inside VM	Keep DinD explicit, add microVM/private-daemon backend, reject silent host socket exposure
Network policy	`pf` plus DNS blocklist; exact-domain caveats documented	Host-side proxy, deny-by-default, non-HTTP blocked	Network egress policy
Service ports	Service access appears in session contract	`sbx ports` publishes host-to-sandbox traffic; host services use policy-approved alias	Add service access and port mappings to session contract + network policy
Credential delivery	Host-owned secret store, materialized/brokered per harness	Host proxy injects credentials; values stay outside VM	Container credential exposure and host bridge
Stack ergonomics	Integration manifests with strict “cannot widen trust” rules	Templates/kits and agent-specific setup	Stack integration contracts plus role repos
Recovery	Pre-session snapshots, restore, formal backup invariants	Persistent VM state; `sbx rm` cleanup	Session snapshot and rollback plus disk/state budgets
Parallel Git work	Not the main product focus	Direct mode plus `.sbx/` branch worktrees	Per-mount isolation with jackin-owned worktree/clone modes
Proof / governance	TLA+ for setup, policy, backup, launch invariants	Product security docs	ADRs and behavioral specs for host-side effects

Track A phases — Fleet operations

These phases preserve the fleet-operations work that started from multicode research. They can be implemented independently of most containment work, but they benefit from the same session contract and persistent storage decisions.

Phase 1 — Foundation gaps

Item	Inspiration in multicode	Depends on
Workspace description	Inline-editable note in TUI overview	—
Operator handler system	`[handler]` block (review/web)	—
Declarative resource limits	`[isolation]` `memory-high`/`memory-max`/`cpu`	—
Ephemeral mount modes	`[isolation] isolated = [...]` and `tmpfs = [...]`	—
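
For the declarative resource limits item, the sketch below validates an already-parsed `[isolation]` table before launch. The key names `memory-high` and `memory-max` come from the table above; the value format (an integer with an optional K/M/G/T suffix in powers of 1024) and the function names are assumptions for illustration.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert "512M" or "8G" to bytes; a bare integer is taken as bytes."""
    text = text.strip().upper()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    return int(text)


def load_limits(isolation):
    """Return memory limits in bytes and reject an inconsistent pair.

    `isolation` is the parsed [isolation] table as a dict. Other keys, such
    as `cpu`, are left untouched because their format is not specified here.
    """
    limits = {
        key: parse_size(isolation[key])
        for key in ("memory-high", "memory-max")
        if key in isolation
    }
    high = limits.get("memory-high")
    hard = limits.get("memory-max")
    if high is not None and hard is not None and high > hard:
        raise ValueError("memory-high must not exceed memory-max")
    return limits


print(load_limits({"memory-high": "6G", "memory-max": "8G", "cpu": "2"}))
```

Checking the soft limit against the hard limit at config-load time surfaces a mistake before any container or sandbox is started.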

Phase 2 — Live operator surface

Item	Inspiration in multicode	Depends on
Agent runtime status	Runtime-derived session status	Multi-runtime adapter seam
Console agent session control	TUI as the active workspace/session control plane	Unique container identity; agent runtime status
Console resource panel	CPU/RAM/disk polling	Resource limits (Phase 1)
Agent tag protocol	`<multicode:*>` skill-driven tags	Agent runtime status
GitHub link tracking	GitHub polling + SQLite cache	Tag protocol; persistent storage (Phase 3)
Custom operator tools	`[[tool]]` array (hotkey + exec/prompt)	Operator handler system (Phase 1)

Phase 3 — Persistence and telemetry

Item	Inspiration in multicode	Depends on
Persistent storage layer	Per-workspace SQLite	—
Token & cost telemetry	Usage aggregation	Persistent storage; agent runtime status
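
A small store of the kind described here might look like the following. The schema is hypothetical and is not multicode's or jackin's; it only shows how one SQLite file could back both link state and usage aggregation. An in-memory database is used so the example leaves nothing on disk.

```python
import sqlite3

# Hypothetical schema: one table for tracked links, one for usage samples.
SCHEMA = """
CREATE TABLE IF NOT EXISTS links (
    kind  TEXT NOT NULL,
    ref   TEXT NOT NULL,
    state TEXT,
    PRIMARY KEY (kind, ref)
);
CREATE TABLE IF NOT EXISTS usage (
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL
);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_link(conn, kind, ref, state):
    # The primary key makes a repeated poll overwrite the cached state.
    conn.execute(
        "INSERT OR REPLACE INTO links (kind, ref, state) VALUES (?, ?, ?)",
        (kind, ref, state),
    )
    conn.commit()


def record_usage(conn, input_tokens, output_tokens):
    conn.execute(
        "INSERT INTO usage (input_tokens, output_tokens) VALUES (?, ?)",
        (input_tokens, output_tokens),
    )
    conn.commit()


def total_tokens(conn):
    row = conn.execute(
        "SELECT COALESCE(SUM(input_tokens + output_tokens), 0) FROM usage"
    ).fetchone()
    return row[0]


store = open_store()
upsert_link(store, "pr", "57", "open")
upsert_link(store, "pr", "57", "merged")
record_usage(store, 1200, 300)
record_usage(store, 800, 200)
print(store.execute("SELECT state FROM links").fetchall(), total_tokens(store))
```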

Phase 4 — Fleet operations

Item	Inspiration in multicode	Depends on
Task source abstraction	Issue/task queue substrate	—
Autonomous task queue	Parallel issue scanning and dispatch	Task source; persistent storage; agent runtime status
Idle runtime cleanup	Runtime cleanup toggle	Agent runtime status

Phase 5 — Distributed operation and extensibility

Item	Inspiration in multicode	Depends on
jackin-remote	SSH bridge + rsync	All of Phases 1-3; multi-runtime
Credential source pattern	Env/command/keychain token backends	— (cross-cutting refactor)
Workspace skills mount	Provider-skill bind mounts	Multi-runtime

Track B phases — Containment, Docker, and recovery

This is the Hazmat/Docker Sandboxes research track. It is security-shaped, but still operator-product work: the operator must understand what is being launched, what can be reached, and how to recover.

Phase B1 — Explain the boundary

Item	Primary inspiration	Depends on
Session contract and explain mode	Hazmat session contract / `hazmat explain`; Docker Sandboxes security model and `sbx ports`/lifecycle output	Workspace resolution; auth strategy
Stack integration contracts	Hazmat session integrations	Session contract

Phase B2 — Control egress and credentials

Item	Primary inspiration	Depends on
Network egress policy	Docker Sandboxes host proxy and network panel; Hazmat `pf`/DNS hardening; Claude devcontainer allowlist	Session contract; selectable sandbox backends
Container credential exposure	Docker Sandboxes credential proxy; Hazmat secret store	Host bridge; credential source pattern
Host bridge	Brokered host capabilities	jackin daemon

Phase B3 — Make Docker mode choices explicit

Item	Primary inspiration	Depends on
Selectable sandbox backends	Docker Sandboxes microVM + private daemon; Hazmat Tier 3 decision	Runtime backend refactor
Rootless DinD	Harden current Docker substrate before/alongside microVMs	DinD TLS; runtime backend refactor
Devcontainer parity	Claude/devcontainer firewall and reproducibility pattern	Network egress policy

Phase B4 — Recover and govern host-side effects

Item	Primary inspiration	Depends on
Session snapshot and rollback	Hazmat pre-session Kopia snapshots and restore	Persistent storage; stack integration contracts
Architecture decision records	Hazmat design assumptions and verification boundary	Codebase health track
Behavioral spec: runtime/launch	Hazmat TLA+ setup/launch ordering discipline	Codebase health track

Execution order

This is a guide, not a constraint. The tracks are intentionally separable.

Fleet quick wins: workspace description, handler system, resource limits, ephemeral mounts, and the lightweight parts of service visibility can land as independent PRs.
Session contract first for containment work: do not build network, snapshot, port-forwarding, or integration behavior until jackin can preview the fully resolved launch boundary and host-side effects.
Persistent storage before deep observability: per-instance SQLite should land before GitHub link tracking, token telemetry, snapshot metadata, and autonomous queue persistence.
Live status before queueing: agent runtime status is the substrate for resource panels, tag protocol, idle cleanup, autonomous queues, and attention prompts.
Network and credential proxy work belongs with backends: dind, microVM, SSH remote, and Kubernetes will not have identical enforcement points. The session contract must say which one is active.
Do not claim Docker Sandboxes parity early: jackin can improve DinD and add microVMs before it has host-side network proxying and credential injection. The docs must keep that distinction visible.
Add tier recommendations before adding more backends: Hazmat’s decision flow is useful because it tells users when not to use a mode. Jackin should make dind/microVM/remote/Kubernetes recommendations visible before the backend matrix grows.
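
The ordering guidance above follows from the "Depends on" columns in the phase tables. As an illustration, a subset of those dependencies can be fed to a topological sort to check that a proposed sequence is feasible. The item names and edges are copied from the Track A tables; the dictionary layout is just one way to encode them.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key depends on every item in its set (a subset of the Track A tables).
DEPENDS_ON = {
    "workspace description": set(),
    "operator handler system": set(),
    "declarative resource limits": set(),
    "persistent storage layer": set(),
    "task source abstraction": set(),
    "agent runtime status": {"multi-runtime adapter seam"},
    "console resource panel": {"declarative resource limits"},
    "agent tag protocol": {"agent runtime status"},
    "github link tracking": {"agent tag protocol", "persistent storage layer"},
    "custom operator tools": {"operator handler system"},
    "token & cost telemetry": {"persistent storage layer", "agent runtime status"},
    "autonomous task queue": {
        "task source abstraction",
        "persistent storage layer",
        "agent runtime status",
    },
    "idle runtime cleanup": {"agent runtime status"},
}

# static_order() yields every item after all of its dependencies and raises
# graphlib.CycleError if the tables ever contradict each other.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for position, item in enumerate(order, start=1):
    print(position, item)
```

Many valid orders exist; the sort only guarantees that no item appears before something it depends on, which matches the "guide, not a constraint" framing.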

Key structural insights

A launch contract is the common substrate. Hazmat proves that a visible session contract turns security posture into product UX. In jackin, the same contract also unlocks safer stack integrations, network policies, snapshot previews, and backend comparisons.
Live status and persistence still drive fleet work. The multicode track remains valid: status, SQLite, and tags are prerequisites for queues, telemetry, link tracking, and remote operation.
Docker access is not an on/off feature; it is a privilege boundary. Hazmat’s shared-daemon refusal and Docker Sandboxes’ private-daemon model both point to the same rule: host Docker socket access should never be an accidental convenience path.
Role repos and integration hints solve different problems. Roles define the agent environment. Integrations can make local stacks easier without becoming executable project policy or credential delivery.
Backend-neutral does not mean risk-neutral. A Docker container, rootless DinD, OrbStack isolated machine, Docker Sandboxes microVM, SSH remote host, and Kubernetes pod all need one user-facing abstraction, but their risk profiles must be printed honestly.
Service access is part of the boundary. Ports, host aliases, and local service reachability are not small UX extras. They decide whether an agent can hit databases, dev servers, model runners, or cloud emulators, so they belong in the same contract as mounts and credentials.

Open program-level questions

Session contract surface. Should `jackin explain` be its own command, a `--dry-run --explain` mode on `load`, or both? Recommended default: both, with `jackin explain` optimized for scripts and docs.
Network policy baseline. Should the default be open networking with a contract warning, or a minimal allowlist for known agent providers plus package registries? Recommended default: open in dind V1, explicit allowlist for future microVM backends that can enforce it outside the guest.
Stack integration ownership. Are integration hints global, role-owned, workspace-owned, or repo-recommended with operator approval? Recommended default: global built-ins plus repo-recommended names that require hash-based operator approval.
Task source identity. Is a task source workspace-bound, agent-bound, or operator-global? Recommended default: workspace-bound, since parallelism limits naturally scope to one workspace’s resources.
Credential proxy destination. Is the host bridge the long-term place for credential proxying, or should sandbox backends own it? Recommended default: host bridge owns operator approval and audit; backends own transport.
Service access model. Should port publication be workspace config, per-session command, console action, or all three? Recommended default: command/console action first, config only for stable dev servers; always show active mappings in the contract/status surface.
Persistence budget. Should jackin show per-instance disk usage and cleanup recommendations before adding microVM/private-daemon backends? Recommended default: yes, because Docker Sandboxes-style persistence is useful only if operators can see and reclaim it.
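
The "hash-based operator approval" recommended for stack integration ownership could work roughly as follows: the operator approves a digest of the manifest content, and any later change to that content invalidates the approval. This is a sketch under assumptions; the function names and the choice of canonical JSON with SHA-256 are illustrative, not a decided design.

```python
import hashlib
import json


def manifest_digest(manifest):
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(name, manifest, approvals):
    """True only if the operator approved exactly this manifest content.

    `approvals` maps integration name to the digest the operator accepted.
    """
    return approvals.get(name) == manifest_digest(manifest)


recommended = {"name": "node", "read_only_paths": ["~/.cache/npm"]}
approvals = {"node": manifest_digest(recommended)}  # operator says yes once

print(is_approved("node", recommended, approvals))  # True

# A later edit to the manifest no longer matches the approved digest.
changed = dict(recommended, read_only_paths=["~/.cache/npm", "~/.ssh"])
print(is_approved("node", changed, approvals))  # False
```

Binding approval to content rather than to a name means a repo cannot keep a trusted name while changing what the integration does.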

Out of scope for this program

Shipping a macOS-only clone of Hazmat. Native macOS containment can inform a future backend, but jackin’s baseline must stay cross-platform.
Replacing role repos with integration manifests. Integrations are narrow ergonomics overlays; roles remain the runtime distribution model.
Claiming Docker Sandboxes-equivalent security for dind, rootless DinD, or first-phase microVM work before network and credential proxy gaps are closed.
Implementing Kubernetes support in this program. The containment and contract work should make Kubernetes easier later, but the platform item stays on the main roadmap.

Source materials

Research snapshot: May 9, 2026.

multicode

graemerocher/multicode — active reference implementation for this research. It covers workspace isolation, GitHub tag/status integration, authentication, multicode-remote, Codex provider support, editor-tool selection, autonomous queue / PR actions, and Apple-container experiments.

Hazmat

dredozubov/hazmat — README-level summary of dedicated user isolation, seatbelt, firewall, DNS blocklist, snapshots, harnesses, integrations, and limitations.
Hazmat overview — tier decision flow and the “Docker changes the boundary” rule.
Hazmat harnesses — supported agent CLIs and credential storage/delivery matrix.
Hazmat integrations — strict integration capability rules and repo-recommended integration flow.
Hazmat Docker Sandboxes tier — private-daemon Docker path, devcontainer alternatives, and Compose hardening.
Hazmat shared-daemon projects — why host Docker socket access is treated as a containment escape.
Hazmat threat matrix — risk-by-risk tier comparison.
Hazmat verified scope — formal verification boundaries and setup/rollback findings.

Docker Sandboxes

Docker Sandboxes usage — branch mode, .sbx/ worktrees, lifecycle, and signed-commit notes.
Docker Sandboxes security model — hypervisor, network, Docker Engine, and credential isolation layers.

Local jackin references

Selectable sandbox backends — deep Docker Sandboxes, OrbStack, and libkrun/smolvm comparison already captured in the jackin roadmap.
Per-mount isolation — worktree/clone design and Docker Sandboxes branch-mode comparison.
Container credential exposure — current credential exposure model and proxy/bridge trajectory.