Agent Orchestrator Research Program
Status: Open — design program (fleet-operations track plus containment/recovery track)
Make jackin’ the canonical, terminal-first orchestrator for engineers who run autonomous coding agents in real working environments: local terminals, SSH sessions, disposable servers, Kubernetes debug pods, and long-running isolated workspaces. The product target is not a consumer desktop app. It is the operator surface for experienced engineers who want agents at full speed but do not want that speed pointed directly at their host account, production credentials, or shared Docker daemon.
This page is the research index for adjacent tools that are solving parts of the same problem:
- multicode is the strongest reference for fleet operations: many parallel workspaces, live status, GitHub link state, persistent task state, resource telemetry, remote operation, and custom operator tools.
- Hazmat is the strongest reference for local containment discipline: explicit session contracts, tiered threat decisions, macOS user isolation, seatbelt policies, pf firewalling, credential-deny rules, stack integrations that cannot widen trust, and rollback-oriented design.
- Docker Sandboxes is the strongest commercial benchmark for microVM sandboxing: per-sandbox VM boundary, private Docker daemon, scoped workspace sharing, host-side network policy, and credential proxying.
- Conductor, Claude devcontainers, Trail of Bits’ devcontainer, and private internal tools remain useful comparison points, but they are not the center of this program.
The program keeps two ideas separate:
- Fleet operations: how the operator coordinates many agents and tasks.
- Containment and recovery: what an autonomous agent can reach, how that boundary is explained, and how the operator recovers when the agent does something destructive.
That wider scope is why this page lives at agent-orchestrator-research,
not under a multicode-specific route.
Jackin values used for evaluation
Every borrowed idea has to survive these filters:
| Value | What it means for roadmap decisions |
|---|---|
| Terminal-first | The CLI and jackin console are primary surfaces. Desktop-only workflows are comparison material, not the center. |
| Isolation before convenience | No hidden host mutation, host socket exposure, or credential widening just to make an agent feel seamless. |
| Runtime-neutral agents | Claude, Codex, Amp, OpenCode, Gemini, and future runtimes should plug into the same operator model. |
| Role repos over ad-hoc setup | Toolchains belong in roles where possible. Project-local hints may improve ergonomics, but they must not become policy escapes. |
| Explicit contracts | Before launch, the operator should be able to see mounts, credentials, network, Docker access, host-side effects, and recovery posture. |
| Real engineering environments | The model must eventually work for SSH, servers, Kubernetes, and repos with serious Docker/Compose needs. |
Research map
| Tool | Strongest idea to borrow | Main reason not to copy it directly |
|---|---|---|
| graemerocher/multicode | Parallel workspace table, GitHub status links, multicode-remote, skill-driven tags, Codex provider support, editor launcher, autonomous queue config, Apple-container experiments | Linux-first bwrap/systemd-run core; isolation is convenience, not a strong boundary; some Apple-container workflows still mount the host Docker socket |
| dredozubov/hazmat | Session contract, tier decision flow, native macOS containment, strict integration rules, rollback/proof discipline | macOS-only; one hardcoded agent user; no smooth multi-agent fleet surface |
| Docker Sandboxes | MicroVM plus private Docker daemon, host-side network proxy, credential injection outside the VM, branch worktrees under .sbx/ | Product is tied to Docker Desktop; not an extensible jackin role ecosystem; credentials/network proxy are not available primitives in plain Docker |
| Claude/devcontainer patterns | Default-deny firewall and reproducible container setup | Devcontainers are workspace setup, not a multi-agent orchestrator or full operator platform |
| Conductor-style native worktrees | Low-friction host-native worktrees | No meaningful sandbox boundary; useful only as a UX comparison |
User-facing benefit matrix
This table is the “why would an operator care?” pass. A feature only belongs on the roadmap when it improves an engineer’s day-to-day control loop, not just because another tool has it.
| User-visible benefit | Seen in | Jackin direction |
|---|---|---|
| Know the exact boundary before launch | Hazmat session contract; Docker Sandboxes security model | Session contract and explain mode becomes the common preflight for mounts, auth, Docker, network, ports, persistence, and recovery |
| Pick the right containment tier for the job | Hazmat tier decision flow; Docker Sandboxes private-daemon path | Add a tier recommendation to jackin explain rather than forcing users to infer whether dind, rootless DinD, microVM, SSH remote, or Kubernetes is appropriate |
| Run Docker/Compose without trusting the host daemon | Docker Sandboxes private Docker Engine; Hazmat Tier 3 | Keep host socket mounting out of scope; make private-daemon backends the Docker-capable path under selectable sandbox backends |
| See and govern outbound network behavior | Docker Sandboxes dashboard network panel; Hazmat deny-mode routing | Network egress policy should include connection logs and rule-editing UX, not only static allowlist config |
| Open services started by the agent | Docker Sandboxes sbx ports; multicode host-localhost notes | Track service exposure as a first-class session-contract section with explicit host-side effects and non-persistent port mappings |
| Work in parallel without clobbering the main checkout | Docker Sandboxes --branch; multicode short-lived workspaces | Jackin already has per-mount worktree/clone isolation; improve the operator surface around branch naming, preserved worktrees, and post-session review |
| Resume a configured environment without rebuilding everything | Docker Sandboxes named/persistent VMs; multicode workspaces | Preserve jackin’s explicit state model, but add disk/resource visibility and cleanup policy so persistence does not become invisible bloat |
| See which agents are idle, busy, waiting, or expensive | multicode live TUI; Docker Sandboxes dashboard cards | Agent runtime status, console resource panel, and token telemetry should converge into one operator overview |
| Jump from agent output to issue/PR/repo state | multicode <multicode:*> tags | Keep the tag protocol vendor-neutral and optional, then use it to drive GitHub link tracking and custom operator actions |
| Reuse stack setup safely | Hazmat integrations; Docker Sandboxes kits/templates | Keep roles as the environment unit, but allow non-executable stack hints and future reviewed setup kits where they cannot widen trust |
| Roll back after a bad unattended run | Hazmat snapshots/restore; Docker Sandboxes disposable VM cleanup | Session snapshot and rollback should separate metadata recovery, project snapshots, and sandbox-state deletion |
| Run the same operator model over SSH/server/Kubernetes | multicode-remote; Jackin roadmap vision | jackin-remote and Kubernetes support should reuse the same contract/status/policy vocabulary instead of becoming separate products |
Ideas to decline or postpone
Some adjacent-tool features are attractive but would cut against jackin’s values if copied directly.
| Idea | Why not copy directly | Safer jackin alternative |
|---|---|---|
| Host Docker socket passthrough | Any agent with daemon access can create privileged containers or host bind mounts; this collapses the sandbox boundary | Private daemon only: DinD, rootless DinD, microVM-owned Docker, or Kubernetes-controlled pods |
| Same-absolute-path workspace passthrough everywhere | Elegant in microVMs, but it would break jackin’s explicit dst mount model in Docker/container backends | Keep dst explicit; only use same-path passthrough inside a backend that natively needs and explains it |
| TUI-only orchestration | Fast for one product, but jackin’s core users automate from terminals, SSH, and scripts | CLI-first contracts and commands, with jackin console as the common day-to-day overview |
| Repo-controlled integration manifests with arbitrary paths/hooks | Turns untrusted project files into policy authority | Repo may recommend known integration names; operator approval and jackin-owned manifests decide what activates |
| Broad environment inheritance for convenience | Secret-shaped env vars and auth sockets are easy to leak into agents | Named credential sources, safe env selectors, and bridge/proxy-mediated capabilities |
| “Sandbox equals safe” messaging | Docker Sandboxes and Hazmat both document residual workspace, hooks, network, and persistence risk | Print backend-specific risk posture and recovery limits in the session contract |
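
The host-socket rule in the table above can be sketched as a preflight check on mount sources. This is an illustration only: the `exposes_host_docker_socket` helper is hypothetical, and it covers just the two common rootful socket paths, not rootless sockets under a user runtime directory.

```python
from pathlib import PurePosixPath

# Common rootful Docker socket locations on Linux hosts.
HOST_DOCKER_SOCKETS = (
    PurePosixPath("/var/run/docker.sock"),
    PurePosixPath("/run/docker.sock"),
)


def exposes_host_docker_socket(mount_src: str) -> bool:
    """True if mounting mount_src would expose a host Docker socket.

    A mount is rejected when it is the socket itself or any directory
    that contains it (for example /var/run or /).
    """
    path = PurePosixPath(mount_src)
    return any(path == sock or path in sock.parents for sock in HOST_DOCKER_SOCKETS)


for src in ("/var/run/docker.sock", "/var/run", "/home/me/project"):
    verdict = "reject" if exposes_host_docker_socket(src) else "allow"
    print(f"{src}: {verdict}")
```

Checking parent directories as well as the socket path matters because a broad bind mount can expose the daemon just as directly as mounting the socket itself.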
Critical read of Hazmat
Hazmat is not a jackin replacement. It is a containment-first launcher that treats “what can the agent reach?” as the product question. That makes it a high-signal reference for jackin’s security and trust surface.
What Hazmat gets right
- The session contract is first-class. A launch is not just “agent started”; it prints the selected mode, read-write project, read-only extensions, service access, snapshot state, and integration-derived behavior. The operator can also preview with hazmat explain.
- Docker is treated as a boundary change. Hazmat does not punch a hole from native containment into the host Docker daemon. Private-daemon Docker workflows move to a Docker Sandbox/microVM tier; shared-daemon workflows are either code-only or pushed to a full-VM answer.
- Stack integrations are constrained. Integrations can add read-only toolchain/cache paths, snapshot excludes, safe env selectors, warnings, and command hints. They cannot widen write scope, inject credentials, change network policy, or execute arbitrary hooks.
- Credential delivery is modeled as capability delivery. Credentials live in a host-owned store and are materialized or brokered only for the selected harness/session. Hazmat is explicit about residual MCP/env inheritance risk.
- Recovery and proof boundaries are honest. The TLA+ verification page names exactly which setup, seatbelt, backup/restore, and launch invariants are governed. The proof found ordering bugs, which is exactly the kind of failure mode jackin should care about.
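
The “cannot widen trust” rule for stack integrations can be sketched as an allowlist over manifest fields. This is a minimal illustration, not Hazmat’s or jackin’s manifest format: every field name and the `validate_integration` helper are hypothetical.

```python
# Hypothetical manifest fields; neither Hazmat nor jackin defines these names.
ALLOWED_FIELDS = {
    "read_only_paths",    # read-only toolchain/cache paths
    "snapshot_excludes",  # paths skipped when snapshotting
    "env_selectors",      # names of env vars considered safe to pass through
    "warnings",           # text shown to the operator
    "command_hints",      # suggested commands, never executed automatically
}


def validate_integration(manifest: dict) -> list[str]:
    """Return violations; an empty list means the manifest stays within bounds."""
    return [
        f"field '{key}' is not allowed: it could widen trust"
        for key in sorted(manifest)
        if key not in ALLOWED_FIELDS
    ]


ok = {"read_only_paths": ["/opt/toolchain"], "warnings": ["large cache"]}
bad = {"read_only_paths": [], "hooks": ["./setup.sh"], "write_paths": ["/etc"]}

print(validate_integration(ok))
for violation in validate_integration(bad):
    print(violation)
```

An allowlist is used rather than a denylist so that any field the validator does not recognize is rejected by default.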
Where Hazmat does not match jackin
- It is macOS-native and intentionally platform-specific. Jackin needs the same operator concepts across macOS, Linux, WSL, servers, and eventually Kubernetes.
- It is single-operator/single-agent-user shaped. Jackin’s product value is many roles, many agents, many concurrent instances, and per-instance state.
- It does not provide a fleet operations plane comparable to multicode’s live table, task queue, GitHub polling, or remote bridge.
- Its strongest no-VM path depends on macOS user isolation and seatbelt. That is useful inspiration, not a portable backend.
Hazmat ideas worth turning into jackin roadmap items
| Candidate | Jackin-shaped version | Roadmap item |
|---|---|---|
| Session contract / explain | Preview a fully resolved launch boundary before side effects; print the same contract at launch | Session contract and explain mode |
| Stack integrations | Optional, non-executable workspace hints for read-only toolchain/cache mounts, safe env selectors, warnings, and excludes | Stack integration contracts |
| Docker tier decision | Treat host Docker daemon access as a policy boundary, not a convenience default | Selectable sandbox backends, network egress policy |
| Snapshot/rollback | Pre-session snapshots for dirty/non-git/long-autonomy work, with opt-in restore and visible host-side effects | Session snapshot and rollback |
| Proof/governance boundary | Model launch/finalization ordering where host-side effects become complex | Architecture decision records, behavioral runtime/launch spec |
Critical read of Docker Sandboxes
Docker Sandboxes is the benchmark because it solves several hard problems at the same time: VM boundary, private Docker daemon, scoped workspace, network policy, and credential injection. Jackin should use it as a comparison bar, not as an assumption that every backend can match immediately.
What Docker Sandboxes gets right
- Private Docker is the default inside the boundary. The agent can run Docker without touching the host daemon.
- Network is host-mediated. HTTP/HTTPS traffic goes through a host proxy, non-HTTP protocols are blocked, and policy is domain-based.
- Credentials do not enter the VM. The host proxy injects auth headers. That is materially stronger than env vars or mounted secret files.
- Branch mode is operator-friendly. --branch creates worktrees under .sbx/, keeping agent changes out of the main working tree while Git still works inside the sandbox.
- The dashboard makes state legible. sbx shows live sandbox status, CPU/RAM use, port mappings, and network governance rules. The important borrow is not the card UI; it is one operator overview for runtime, ports, policy, and cleanup.
- Ports and host services are explicit. Host-to-sandbox services require published port mappings, and sandbox-to-host service access goes through a named host alias plus policy. Jackin needs equivalent explicitness for dev servers, databases, and local model runners.
- The security docs are blunt about remaining risk. Workspace changes are live on the host in direct mode; hooks and generated files still matter.
Where jackin should differ
- Jackin should keep role repositories as the unit of runtime distribution. Docker Sandboxes templates are useful, but they are not a replacement for jackin roles.
- Jackin should preserve mount destination control. Docker Sandboxes’ same-absolute-path passthrough is elegant for worktrees, but jackin’s dst-based workspace model is more explicit and portable.
- Jackin should treat Docker Sandboxes’ credential proxy as a long-term target, not a first-phase promise. The current container credential work already documents why env injection is weaker.
- Jackin should make backend differences visible. A dind session, an OrbStack isolated-machine session, and a future Docker Sandboxes-style microVM session should not pretend to have identical risk.
Track A — Fleet operations comparison: jackin’ vs multicode
This is the feature inventory the fleet track is built from. Have means present in jackin’ today; planned means a roadmap item already exists; gap means truly missing. Items in italics are addressed by this program.
| Concern | multicode | jackin’ status |
|---|---|---|
| Agent isolation | bwrap + systemd-run; Apple container | Have — Docker + DinD; planned — selectable backends |
| Per-agent working tree | Workspace dir; one repo per workspace | Have — worktree and clone isolation |
| Mount kinds beyond shared/readonly | writable, readable, isolated, tmpfs | Partial — shared + readonly; gap on tmpfs and ephemeral, see Ephemeral mount modes |
| Multi-provider agent runtime | OpenCode + Codex (one per session) | Partial — basic built-in runtime launch shipped; parity work remains in multi-runtime |
| Resource limits per agent | memory-high, memory-max, cpu (cgroups) | Gap — see Declarative resource limits |
| Live agent status (idle/busy/question) | Yes — derived from opencode SSE events | Gap — see Agent runtime status |
| Live machine resource panel (CPU/RAM/disk) | Yes — /proc/stat, /proc/meminfo, sampled per 2s | Gap — see Console resource panel |
| Per-agent token / cost / OOM tracking | Yes — usage aggregation service | Gap — see Token & cost telemetry |
| Agent to operator tag protocol | Yes — <multicode:repo> / <multicode:issue> / <multicode:pr> | Gap — see Agent tag protocol |
| Live GitHub link state polling | Yes — octocrab + per-workspace SQLite cache | Gap — see GitHub link tracking |
| Persistent storage backend | Per-workspace SQLite + per-workspace JSON | Gap — see Persistent storage layer |
| Per-workspace operator description | Yes — inline-editable note in TUI | Gap — see Workspace description |
| External tool/IDE/diff launcher | Yes — [handler] block, [[tool]] array | Gap — see Operator handler system |
| Custom operator-defined tools | Yes — [[tool]] array, hotkey + exec/prompt type | Gap — see Custom operator tools |
| Autonomous task queue | Yes — scan cadence, parallel issue dispatch, runtime cleanup knobs | Gap — see Autonomous task queue |
| PR lifecycle actions | Yes — publish, rebase, fix CI, address review, request review, and merge actions | Gap — model manual gates and PR readiness inside Autonomous task queue |
| Task source abstraction | No — current multicode hardcodes GitHub issue/repo workflow | Gap — add source-neutral task policy in Task source abstraction |
| Idle runtime cleanup | Yes — gradle --stop, container recycle | Gap — see Idle runtime cleanup |
| Remote orchestration | Yes — multicode-remote SSH bridge + rsync | Gap — see jackin-remote |
| Workspace skills mount | Yes — add-skills-from mounts to provider skill dirs | Gap — see Workspace skills mount |
| Credential source plurality | Backends such as OS secret stores, env, and command | Partial — op:// and ${env.VAR} shipped; see Credential source pattern |
| GitHub CLI auth passthrough | Read-only mount / token config | Partial — see GitHub CLI authentication strategy |
| Operator console / TUI | Yes — ratatui, full-time | Have — jackin console (less feature-rich than CLI on purpose) |
| Role repo contract | Implicit via skill/config mounts | Have — jackin.role.toml + Dockerfile |
| Sensitive mount warnings | No | Have |
Where multicode genuinely shines
- Live observability of the agent. multicode derives status and resource columns from runtime state and displays them as the main operator surface.
- Per-workspace persistent SQLite. A small store underpins GitHub status, custom links, and telemetry.
- The tag protocol. <multicode:issue>, <multicode:pr>, and <multicode:repo> turn agent output into structured operator state without making the agent runtime itself part of the orchestrator.
- Resource limits as config. Memory, CPU, and file descriptor limits are declared near isolation config instead of being hidden in launch scripts.
- Operator extension points. Editor launchers, review tools, and custom hotkeys make the TUI an operator surface rather than a fixed dashboard.
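
A vendor-neutral version of the tag protocol could be parsed with a small scanner over agent output. The sketch below assumes a wire format of `<jackin:KIND>VALUE</jackin:KIND>`; the `jackin` namespace, the payload syntax, and the `extract_tags` helper are hypothetical and are not taken from multicode’s implementation.

```python
import re

# Assumed format: <jackin:KIND>VALUE</jackin:KIND>, with KIND limited to the
# three kinds named in the multicode research (repo, issue, pr).
TAG_RE = re.compile(r"<jackin:(repo|issue|pr)>([^<]+)</jackin:\1>")


def extract_tags(output: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs in the order they appear in agent output."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_RE.finditer(output)]


sample = (
    "Opened a fix. <jackin:repo>example/app</jackin:repo> "
    "<jackin:pr>42</jackin:pr> unrelated <b>markup</b> is ignored."
)
print(extract_tags(sample))
```

Because the scanner only reads text the agent already prints, the agent runtime stays outside the orchestrator, which is the property the multicode design is credited with above.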
Where jackin is materially ahead — and should stay ahead
- Cross-platform Docker substrate instead of Linux-only bwrap.
- Per-instance multi-runtime instead of one provider assumption.
- Role repos as distribution units instead of ad-hoc skill/config mounts.
- CLI-first with jackin console as the simplified front instead of a TUI-only product.
- Toolchain-neutral orchestration instead of Micronaut/Java defaults baked into the orchestrator.
- Security boundary honesty. multicode’s README is explicit that its isolation is for safety and convenience, not security. Jackin’s docs should preserve that kind of bluntness for every backend: dind is useful, but it is not a microVM; microVM is stronger, but still has workspace and credential policy caveats.
Track B — Containment and recovery comparison
This track folds Hazmat and Docker Sandboxes into the roadmap without making jackin pretend to be either one.
| Concern | Hazmat | Docker Sandboxes | jackin’ direction |
|---|---|---|---|
| Boundary explanation | Session contract plus hazmat explain | Security model docs and sandbox policy output | Session contract and explain mode |
| Strong local isolation | macOS user + seatbelt + pf; VM tier for hardest cases | Per-sandbox microVM | Selectable sandbox backends |
| Docker workflows | Private-daemon tier only; shared daemon rejected in containment | Private Docker Engine inside VM | Keep DinD explicit, add microVM/private-daemon backend, reject silent host socket exposure |
| Network policy | pf plus DNS blocklist; exact-domain caveats documented | Host-side proxy, deny-by-default, non-HTTP blocked | Network egress policy |
| Service ports | Service access appears in session contract | sbx ports publishes host-to-sandbox traffic; host services use policy-approved alias | Add service access and port mappings to session contract + network policy |
| Credential delivery | Host-owned secret store, materialized/brokered per harness | Host proxy injects credentials; values stay outside VM | Container credential exposure and host bridge |
| Stack ergonomics | Integration manifests with strict “cannot widen trust” rules | Templates/kits and agent-specific setup | Stack integration contracts plus role repos |
| Recovery | Pre-session snapshots, restore, formal backup invariants | Persistent VM state; sbx rm cleanup | Session snapshot and rollback plus disk/state budgets |
| Parallel Git work | Not the main product focus | Direct mode plus .sbx/ branch worktrees | Per-mount isolation with jackin-owned worktree/clone modes |
| Proof / governance | TLA+ for setup, policy, backup, launch invariants | Product security docs | ADRs and behavioral specs for host-side effects |
Track A phases — Fleet operations
These phases preserve the fleet-operations work that started from multicode research. They can be implemented independently of most containment work, but they benefit from the same session contract and persistent storage decisions.
Phase 1 — Foundation gaps
| Item | Inspiration in multicode | Depends on |
|---|---|---|
| Workspace description | Inline-editable note in TUI overview | — |
| Operator handler system | [handler] block (review/web) | — |
| Declarative resource limits | [isolation] memory-high/memory-max/cpu | — |
| Ephemeral mount modes | [isolation] isolated = [...] and tmpfs = [...] | — |
Phase 2 — Live operator surface
| Item | Inspiration in multicode | Depends on |
|---|---|---|
| Agent runtime status | Runtime-derived session status | Multi-runtime adapter seam |
| Console agent session control | TUI as the active workspace/session control plane | Unique container identity; agent runtime status |
| Console resource panel | CPU/RAM/disk polling | Resource limits (Phase 1) |
| Agent tag protocol | <multicode:*> skill-driven tags | Agent runtime status |
| GitHub link tracking | GitHub polling + SQLite cache | Tag protocol; persistent storage (Phase 3) |
| Custom operator tools | [[tool]] array (hotkey + exec/prompt) | Operator handler system (Phase 1) |
Phase 3 — Persistence and telemetry
| Item | Inspiration in multicode | Depends on |
|---|---|---|
| Persistent storage layer | Per-workspace SQLite | — |
| Token & cost telemetry | Usage aggregation | Persistent storage; agent runtime status |
Phase 4 — Fleet operations
| Item | Inspiration in multicode | Depends on |
|---|---|---|
| Task source abstraction | Issue/task queue substrate | — |
| Autonomous task queue | Parallel issue scanning and dispatch | Task source; persistent storage; agent runtime status |
| Idle runtime cleanup | Runtime cleanup toggle | Agent runtime status |
Phase 5 — Distributed operation and extensibility
| Item | Inspiration in multicode | Depends on |
|---|---|---|
| jackin-remote | SSH bridge + rsync | All Phase 1-3; multi-runtime |
| Credential source pattern | Env/command/keychain token backends | — (cross-cutting refactor) |
| Workspace skills mount | Provider-skill bind mounts | Multi-runtime |
Track B phases — Containment, Docker, and recovery
This is the Hazmat/Docker Sandboxes research track. It is security-shaped, but still operator-product work: the operator must understand what is being launched, what can be reached, and how to recover.
Phase B1 — Explain the boundary
| Item | Primary inspiration | Depends on |
|---|---|---|
| Session contract and explain mode | Hazmat session contract / hazmat explain; Docker Sandboxes security model and sbx ports/lifecycle output | Workspace resolution; auth strategy |
| Stack integration contracts | Hazmat session integrations | Session contract |
Phase B2 — Control egress and credentials
| Item | Primary inspiration | Depends on |
|---|---|---|
| Network egress policy | Docker Sandboxes host proxy and network panel; Hazmat pf/DNS hardening; Claude devcontainer allowlist | Session contract; selectable sandbox backends |
| Container credential exposure | Docker Sandboxes credential proxy; Hazmat secret store | Host bridge; credential source pattern |
| Host bridge | Brokered host capabilities | jackin daemon |
Phase B3 — Make Docker mode choices explicit
| Item | Primary inspiration | Depends on |
|---|---|---|
| Selectable sandbox backends | Docker Sandboxes microVM + private daemon; Hazmat Tier 3 decision | Runtime backend refactor |
| Rootless DinD | Harden current Docker substrate before/alongside microVMs | DinD TLS; runtime backend refactor |
| Devcontainer parity | Claude/devcontainer firewall and reproducibility pattern | Network egress policy |
Phase B4 — Recover and govern host-side effects
| Item | Primary inspiration | Depends on |
|---|---|---|
| Session snapshot and rollback | Hazmat pre-session Kopia snapshots and restore | Persistent storage; stack integration contracts |
| Architecture decision records | Hazmat design assumptions and verification boundary | Codebase health track |
| Behavioral spec: runtime/launch | Hazmat TLA+ setup/launch ordering discipline | Codebase health track |
Execution order
This is a guide, not a constraint. The tracks are intentionally separable.
- Fleet quick wins: workspace description, handler system, resource limits, ephemeral mounts, and the lightweight parts of service visibility can land as independent PRs.
- Session contract first for containment work: do not build network, snapshot, port-forwarding, or integration behavior until jackin can preview the fully resolved launch boundary and host-side effects.
- Persistent storage before deep observability: per-instance SQLite should land before GitHub link tracking, token telemetry, snapshot metadata, and autonomous queue persistence.
- Live status before queueing: agent runtime status is the substrate for resource panels, tag protocol, idle cleanup, autonomous queues, and attention prompts.
- Network and credential proxy work belongs with backends: dind, microVM, SSH remote, and Kubernetes will not have identical enforcement points. The session contract must say which one is active.
- Do not claim Docker Sandboxes parity early: jackin can improve DinD and add microVMs before it has host-side network proxying and credential injection. The docs must keep that distinction visible.
- Add tier recommendations before adding more backends: Hazmat’s decision flow is useful because it tells users when not to use a mode. Jackin should make dind/microVM/remote/Kubernetes recommendations visible before the backend matrix grows.
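
The ordering guidance above follows from the “Depends on” columns in the phase tables, and a dependency-respecting order can be derived mechanically. The sketch below uses a subset of fleet-track items; the lowercase item keys are shortened for illustration, and dependencies on items outside this subset (such as the multi-runtime adapter seam) are omitted.

```python
from graphlib import TopologicalSorter

# item -> items it depends on, taken from a subset of the phase tables.
DEPENDS_ON = {
    "persistent storage": set(),
    "agent runtime status": set(),
    "task source abstraction": set(),
    "agent tag protocol": {"agent runtime status"},
    "github link tracking": {"agent tag protocol", "persistent storage"},
    "token telemetry": {"persistent storage", "agent runtime status"},
    "idle runtime cleanup": {"agent runtime status"},
    "autonomous task queue": {
        "task source abstraction",
        "persistent storage",
        "agent runtime status",
    },
}

# static_order() yields each item only after everything it depends on.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for step, item in enumerate(order, start=1):
    print(f"{step}. {item}")
```

Any order the sorter produces places persistent storage and agent runtime status ahead of link tracking, telemetry, and queueing, which matches the "persistent storage before deep observability" and "live status before queueing" guidance.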
Key structural insights
- A launch contract is the common substrate. Hazmat proves that a visible session contract turns security posture into product UX. In jackin, the same contract also unlocks safer stack integrations, network policies, snapshot previews, and backend comparisons.
- Live status and persistence still drive fleet work. The multicode track remains valid: status, SQLite, and tags are prerequisites for queues, telemetry, link tracking, and remote operation.
- Docker is not a binary; it is a privilege boundary. Hazmat’s shared daemon refusal and Docker Sandboxes’ private-daemon model both point to the same rule: host Docker socket access should never be an accidental convenience path.
- Role repos and integration hints solve different problems. Roles define the agent environment. Integrations can make local stacks easier without becoming executable project policy or credential delivery.
- Backend-neutral does not mean risk-neutral. A Docker container, rootless DinD, OrbStack isolated machine, Docker Sandboxes microVM, SSH remote host, and Kubernetes pod all need one user-facing abstraction, but their risk profiles must be printed honestly.
- Service access is part of the boundary. Ports, host aliases, and local service reachability are not small UX extras. They decide whether an agent can hit databases, dev servers, model runners, or cloud emulators, so they belong in the same contract as mounts and credentials.
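
To make the “one contract for mounts, credentials, network, Docker access, ports, and recovery” idea concrete, the sketch below models a resolved launch boundary as a single record with a plain-text preview. The `SessionContract` class, its field names, and the sample values are hypothetical and do not describe jackin’s current data model or CLI output.

```python
from dataclasses import dataclass, field


@dataclass
class SessionContract:
    """Hypothetical shape of a resolved launch boundary (all names assumed)."""
    backend: str
    mounts: list[str] = field(default_factory=list)
    credentials: list[str] = field(default_factory=list)
    network: str = "open"
    docker_access: str = "none"
    ports: list[str] = field(default_factory=list)
    recovery: str = "no snapshot"

    def render(self) -> str:
        """Format the contract as plain text for a terminal preview."""
        def joined(values: list[str]) -> str:
            return ", ".join(values) if values else "(none)"

        rows = [
            ("backend", self.backend),
            ("mounts", joined(self.mounts)),
            ("credentials", joined(self.credentials)),
            ("network", self.network),
            ("docker", self.docker_access),
            ("ports", joined(self.ports)),
            ("recovery", self.recovery),
        ]
        lines = []
        for name, value in rows:
            label = f"{name}:"
            lines.append(f"{label:<13}{value}")
        return "\n".join(lines)


contract = SessionContract(
    backend="dind",
    mounts=["./app -> /workspace (rw)"],
    credentials=["github-token (env)"],
    docker_access="private daemon",
)
print(contract.render())
```

Rendering empty sections as "(none)" rather than omitting them keeps absent ports or credentials visible, so the operator can tell "nothing exposed" apart from "not reported".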
Open program-level questions
- Session contract surface. Should jackin explain be its own command, a --dry-run --explain mode on load, or both? Recommended default: both, with jackin explain optimized for scripts and docs.
- Network policy baseline. Should the default be open networking with a contract warning, or a minimal allowlist for known agent providers plus package registries? Recommended default: open in dind V1, explicit allowlist for future microVM backends that can enforce it outside the guest.
- Stack integration ownership. Are integration hints global, role-owned, workspace-owned, or repo-recommended with operator approval? Recommended default: global built-ins plus repo-recommended names that require hash-based operator approval.
- Task source identity. Is a task source workspace-bound, agent-bound, or operator-global? Recommended default: workspace-bound, since parallelism limits naturally scope to one workspace’s resources.
- Credential proxy destination. Is the host bridge the long-term place for credential proxying, or should sandbox backends own it? Recommended default: host bridge owns operator approval and audit; backends own transport.
- Service access model. Should port publication be workspace config, per-session command, console action, or all three? Recommended default: command/console action first, config only for stable dev servers; always show active mappings in the contract/status surface.
- Persistence budget. Should jackin show per-instance disk usage and cleanup recommendations before adding microVM/private-daemon backends? Recommended default: yes, because Docker Sandboxes-style persistence is useful only if operators can see and reclaim it.
Out of scope for this program
- Shipping a macOS-only clone of Hazmat. Native macOS containment can inform a future backend, but jackin’s baseline must stay cross-platform.
- Replacing role repos with integration manifests. Integrations are narrow ergonomics overlays; roles remain the runtime distribution model.
- Claiming Docker Sandboxes-equivalent security for dind, rootless DinD, or first-phase microVM work before network and credential proxy gaps are closed.
- Implementing Kubernetes support in this program. The containment and contract work should make Kubernetes easier later, but the platform item stays on the main roadmap.
See also
- Codebase readability & restructuring — structural reference for phased roadmap programs
- Multi-runtime support — primary upstream dependency for runtime-neutral observability
- Per-mount isolation — parallel agent prerequisite and Docker Sandboxes branch-mode comparison
- Selectable sandbox backends — deep Docker Sandboxes and microVM comparison
- Container credential exposure — current credential-in-container risk and proxy trajectory
Source materials
Research snapshot: May 9, 2026.
multicode
- graemerocher/multicode — active reference implementation for this research. It covers workspace isolation, GitHub tag/status integration, authentication, multicode-remote, Codex provider support, editor-tool selection, autonomous queue / PR actions, and Apple-container experiments.
Hazmat
- dredozubov/hazmat — README-level summary of dedicated user isolation, seatbelt, firewall, DNS blocklist, snapshots, harnesses, integrations, and limitations.
- Hazmat overview — tier decision flow and the “Docker changes the boundary” rule.
- Hazmat harnesses — supported agent CLIs and credential storage/delivery matrix.
- Hazmat integrations — strict integration capability rules and repo-recommended integration flow.
- Hazmat Docker Sandboxes tier — private-daemon Docker path, devcontainer alternatives, and Compose hardening.
- Hazmat shared-daemon projects — why host Docker socket access is treated as a containment escape.
- Hazmat threat matrix — risk-by-risk tier comparison.
- Hazmat verified scope — formal verification boundaries and setup/rollback findings.
Docker Sandboxes
- Docker Sandboxes usage — branch mode, .sbx/ worktrees, lifecycle, and signed-commit notes.
- Docker Sandboxes security model — hypervisor, network, Docker Engine, and credential isolation layers.
Local jackin references
- Selectable sandbox backends — deep Docker Sandboxes, OrbStack, and libkrun/smolvm comparison already captured in the jackin roadmap.
- Per-mount isolation — worktree/clone design and Docker Sandboxes branch-mode comparison.
- Container credential exposure — current credential exposure model and proxy/bridge trajectory.