Agent Orchestrator Research Program

Status: Open — design program (fleet-operations track plus containment/recovery track)

Make jackin’ the canonical, terminal-first orchestrator for engineers who run autonomous coding agents in real working environments: local terminals, SSH sessions, disposable servers, Kubernetes debug pods, and long-running isolated workspaces. The product target is not a consumer desktop app. It is the operator surface for experienced engineers who want agents at full speed but do not want that speed pointed directly at their host account, production credentials, or shared Docker daemon.

This page is the research index for adjacent tools that are solving parts of the same problem:

  • multicode is the strongest reference for fleet operations: many parallel workspaces, live status, GitHub link state, persistent task state, resource telemetry, remote operation, and custom operator tools.
  • Hazmat is the strongest reference for local containment discipline: explicit session contracts, tiered threat decisions, macOS user isolation, seatbelt policies, pf firewalling, credential-deny rules, stack integrations that cannot widen trust, and rollback-oriented design.
  • Docker Sandboxes is the strongest commercial benchmark for microVM sandboxing: per-sandbox VM boundary, private Docker daemon, scoped workspace sharing, host-side network policy, and credential proxying.
  • Conductor, Claude devcontainers, Trail of Bits’ devcontainer, and private internal tools remain useful comparison points, but they are not the center of this program.

The program keeps two ideas separate:

  1. Fleet operations: how the operator coordinates many agents and tasks.
  2. Containment and recovery: what an autonomous agent can reach, how that boundary is explained, and how the operator recovers when the agent does something destructive.

That wider scope is why this page lives at agent-orchestrator-research, not under a multicode-specific route.

Every borrowed idea has to survive these filters:

| Value | What it means for roadmap decisions |
| --- | --- |
| Terminal-first | The CLI and jackin console are primary surfaces. Desktop-only workflows are comparison material, not the center. |
| Isolation before convenience | No hidden host mutation, host socket exposure, or credential widening just to make an agent feel seamless. |
| Runtime-neutral agents | Claude, Codex, Amp, OpenCode, Gemini, and future runtimes should plug into the same operator model. |
| Role repos over ad-hoc setup | Toolchains belong in roles where possible. Project-local hints may improve ergonomics, but they must not become policy escapes. |
| Explicit contracts | Before launch, the operator should be able to see mounts, credentials, network, Docker access, host-side effects, and recovery posture. |
| Real engineering environments | The model must eventually work for SSH, servers, Kubernetes, and repos with serious Docker/Compose needs. |
| Tool | Strongest idea to borrow | Main reason not to copy it directly |
| --- | --- | --- |
| graemerocher/multicode | Parallel workspace table, GitHub status links, multicode-remote, skill-driven tags, Codex provider support, editor launcher, autonomous queue config, Apple-container experiments | Linux-first bwrap/systemd-run core; isolation is convenience, not a strong boundary; some Apple-container workflows still mount the host Docker socket |
| dredozubov/hazmat | Session contract, tier decision flow, native macOS containment, strict integration rules, rollback/proof discipline | macOS-only; one hardcoded agent user; no smooth multi-agent fleet surface |
| Docker Sandboxes | MicroVM plus private Docker daemon, host-side network proxy, credential injection outside the VM, branch worktrees under .sbx/ | Product is tied to Docker Desktop; not an extensible jackin role ecosystem; credentials/network proxy are not available primitives in plain Docker |
| Claude/devcontainer patterns | Default-deny firewall and reproducible container setup | Devcontainers are workspace setup, not a multi-agent orchestrator or full operator platform |
| Conductor-style native worktrees | Low-friction host-native worktrees | No meaningful sandbox boundary; useful only as a UX comparison |

This table is the “why would an operator care?” pass. A feature only belongs on the roadmap when it improves an engineer’s day-to-day control loop, not just because another tool has it.

| User-visible benefit | Seen in | Jackin direction |
| --- | --- | --- |
| Know the exact boundary before launch | Hazmat session contract; Docker Sandboxes security model | Session contract and explain mode becomes the common preflight for mounts, auth, Docker, network, ports, persistence, and recovery |
| Pick the right containment tier for the job | Hazmat tier decision flow; Docker Sandboxes private-daemon path | Add a tier recommendation to jackin explain rather than forcing users to infer whether dind, rootless DinD, microVM, SSH remote, or Kubernetes is appropriate |
| Run Docker/Compose without trusting the host daemon | Docker Sandboxes private Docker Engine; Hazmat Tier 3 | Keep host socket mounting out of scope; make private-daemon backends the Docker-capable path under selectable sandbox backends |
| See and govern outbound network behavior | Docker Sandboxes dashboard network panel; Hazmat deny-mode routing | Network egress policy should include connection logs and rule-editing UX, not only static allowlist config |
| Open services started by the agent | Docker Sandboxes sbx ports; multicode host-localhost notes | Track service exposure as a first-class session-contract section with explicit host-side effects and non-persistent port mappings |
| Work in parallel without clobbering the main checkout | Docker Sandboxes --branch; multicode short-lived workspaces | Jackin already has per-mount worktree/clone isolation; improve the operator surface around branch naming, preserved worktrees, and post-session review |
| Resume a configured environment without rebuilding everything | Docker Sandboxes named/persistent VMs; multicode workspaces | Preserve jackin’s explicit state model, but add disk/resource visibility and cleanup policy so persistence does not become invisible bloat |
| See which agents are idle, busy, waiting, or expensive | multicode live TUI; Docker Sandboxes dashboard cards | Agent runtime status, console resource panel, and token telemetry should converge into one operator overview |
| Jump from agent output to issue/PR/repo state | multicode <multicode:*> tags | Keep the tag protocol vendor-neutral and optional, then use it to drive GitHub link tracking and custom operator actions |
| Reuse stack setup safely | Hazmat integrations; Docker Sandboxes kits/templates | Keep roles as the environment unit, but allow non-executable stack hints and future reviewed setup kits where they cannot widen trust |
| Roll back after a bad unattended run | Hazmat snapshots/restore; Docker Sandboxes disposable VM cleanup | Session snapshot and rollback should separate metadata recovery, project snapshots, and sandbox-state deletion |
| Run the same operator model over SSH/server/Kubernetes | multicode-remote; Jackin roadmap vision | jackin-remote and Kubernetes support should reuse the same contract/status/policy vocabulary instead of becoming separate products |
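To make the preflight idea concrete, here is a minimal sketch of the data model that could sit behind a jackin explain command. The field names, mode strings, and rendering are assumptions for illustration, not jackin’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class SessionContract:
    """Hypothetical preflight summary the operator reviews before launch."""
    backend: str                       # e.g. "dind", "microvm" (labels illustrative)
    mounts: list[tuple[str, str, str]] = field(default_factory=list)  # (src, dst, mode)
    credentials: list[str] = field(default_factory=list)              # named sources only
    network: str = "open"              # "open" | "allowlist" | "deny"
    docker: str = "none"               # "none" | "private-daemon"
    host_effects: list[str] = field(default_factory=list)             # visible side effects
    recovery: str = "none"             # e.g. "snapshot", "worktree-discard"

    def explain(self) -> str:
        """Render the resolved boundary as the text an operator would read."""
        lines = [
            f"backend:  {self.backend}",
            f"network:  {self.network}",
            f"docker:   {self.docker}",
        ]
        for src, dst, mode in self.mounts:
            lines.append(f"mount:    {src} -> {dst} ({mode})")
        for cred in self.credentials:
            lines.append(f"cred:     {cred}")
        for effect in self.host_effects:
            lines.append(f"host:     {effect}")
        lines.append(f"recovery: {self.recovery}")
        return "\n".join(lines)
```

Rendering the same structure at explain time and at launch time is what keeps the preview honest: both surfaces read one resolved object.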

Some adjacent-tool features are attractive but would cut against jackin’s values if copied directly.

| Idea | Why not copy directly | Safer jackin alternative |
| --- | --- | --- |
| Host Docker socket passthrough | Any agent with daemon access can create privileged containers or host bind mounts; this collapses the sandbox boundary | Private daemon only: DinD, rootless DinD, microVM-owned Docker, or Kubernetes-controlled pods |
| Same-absolute-path workspace passthrough everywhere | Elegant in microVMs, but it would break jackin’s explicit dst mount model in Docker/container backends | Keep dst explicit; only use same-path passthrough inside a backend that natively needs and explains it |
| TUI-only orchestration | Fast for one product, but jackin’s core users automate from terminals, SSH, and scripts | CLI-first contracts and commands, with jackin console as the common day-to-day overview |
| Repo-controlled integration manifests with arbitrary paths/hooks | Turns untrusted project files into policy authority | Repo may recommend known integration names; operator approval and jackin-owned manifests decide what activates |
| Broad environment inheritance for convenience | Secret-shaped env vars and auth sockets are easy to leak into agents | Named credential sources, safe env selectors, and bridge/proxy-mediated capabilities |
| “Sandbox equals safe” messaging | Docker Sandboxes and Hazmat both document residual workspace, hooks, network, and persistence risk | Print backend-specific risk posture and recovery limits in the session contract |
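As a sketch of the “named credential sources plus safe env selectors” alternative, a workspace config could look like the fragment below. The op:// and ${env.VAR} source forms are described above as shipped; the section and key names here are hypothetical.

```toml
# Hypothetical jackin workspace config: credentials are named and scoped,
# never inherited wholesale from the operator's shell environment.
[credentials.github-token]
source = "op://engineering/github/token"   # 1Password-style reference, resolved host-side

[credentials.npm-token]
source = "${env.NPM_TOKEN}"                # one named variable, not a blanket env passthrough

[env]
allow = ["LANG", "TZ"]                     # safe selectors only; secret-shaped vars stay out
```

The design point is that every credential the agent can use appears by name in the session contract, so nothing reaches the sandbox implicitly.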

Hazmat is not a jackin replacement. It is a containment-first launcher that treats “what can the agent reach?” as the product question. That makes it a high-signal reference for jackin’s security and trust surface.

  1. The session contract is first-class. A launch is not just “agent started”; it prints the selected mode, read-write project, read-only extensions, service access, snapshot state, and integration-derived behavior. The operator can also preview with hazmat explain.
  2. Docker is treated as a boundary change. Hazmat does not punch a hole from native containment into the host Docker daemon. Private-daemon Docker workflows move to a Docker Sandbox/microVM tier; shared-daemon workflows are either code-only or pushed to a full-VM answer.
  3. Stack integrations are constrained. Integrations can add read-only toolchain/cache paths, snapshot excludes, safe env selectors, warnings, and command hints. They cannot widen write scope, inject credentials, change network policy, or execute arbitrary hooks.
  4. Credential delivery is modeled as capability delivery. Credentials live in a host-owned store and are materialized or brokered only for the selected harness/session. Hazmat is explicit about residual MCP/env inheritance risk.
  5. Recovery and proof boundaries are honest. The TLA+ verification page names exactly which setup, seatbelt, backup/restore, and launch invariants are governed. The proof found ordering bugs, which is exactly the kind of failure mode jackin should care about.
At the same time, Hazmat stops short of jackin’s scope in several ways:
  • It is macOS-native and intentionally platform-specific. Jackin needs the same operator concepts across macOS, Linux, WSL, servers, and eventually Kubernetes.
  • It is single-operator/single-agent-user shaped. Jackin’s product value is many roles, many agents, many concurrent instances, and per-instance state.
  • It does not provide a fleet operations plane comparable to multicode’s live table, task queue, GitHub polling, or remote bridge.
  • Its strongest no-VM path depends on macOS user isolation and seatbelt. That is useful inspiration, not a portable backend.

Hazmat ideas worth turning into jackin roadmap items

| Candidate | Jackin-shaped version | Roadmap item |
| --- | --- | --- |
| Session contract / explain | Preview a fully resolved launch boundary before side effects; print the same contract at launch | Session contract and explain mode |
| Stack integrations | Optional, non-executable workspace hints for read-only toolchain/cache mounts, safe env selectors, warnings, and excludes | Stack integration contracts |
| Docker tier decision | Treat host Docker daemon access as a policy boundary, not a convenience default | Selectable sandbox backends, network egress policy |
| Snapshot/rollback | Pre-session snapshots for dirty/non-git/long-autonomy work, with opt-in restore and visible host-side effects | Session snapshot and rollback |
| Proof/governance boundary | Model launch/finalization ordering where host-side effects become complex | Architecture decision records, behavioral runtime/launch spec |
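The Docker tier decision can be made concrete with a toy decision function of the kind jackin explain might run. The tier labels and inputs below are hypothetical; the point is that the recommendation is computed and printed, not left for the operator to infer.

```python
def recommend_tier(needs_docker: bool, untrusted_deps: bool, remote_host: bool) -> str:
    """Toy decision flow in the spirit of Hazmat's tier model (labels hypothetical)."""
    if remote_host:
        return "ssh-remote"                  # keep the blast radius off the local machine
    if needs_docker and untrusted_deps:
        return "microvm-private-daemon"      # strongest boundary that still allows Docker
    if needs_docker:
        return "rootless-dind"               # Docker without the host daemon; weaker than a microVM
    return "container"                       # code-only work: a plain isolated container is enough
```

A real version would weigh more inputs (dirty worktree, credential needs, Compose usage), but even this shape tells users when not to use a mode.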

Docker Sandboxes is the benchmark because it solves several hard problems at the same time: VM boundary, private Docker daemon, scoped workspace, network policy, and credential injection. Jackin should use it as a comparison bar, not as an assumption that every backend can match immediately.

  1. Private Docker is the default inside the boundary. The agent can run Docker without touching the host daemon.
  2. Network is host-mediated. HTTP/HTTPS traffic goes through a host proxy, non-HTTP protocols are blocked, and policy is domain-based.
  3. Credentials do not enter the VM. The host proxy injects auth headers. That is materially stronger than env vars or mounted secret files.
  4. Branch mode is operator-friendly. --branch creates worktrees under .sbx/, keeping agent changes out of the main working tree while Git still works inside the sandbox.
  5. The dashboard makes state legible. sbx shows live sandbox status, CPU/RAM use, port mappings, and network governance rules. The important borrow is not the card UI; it is one operator overview for runtime, ports, policy, and cleanup.
  6. Ports and host services are explicit. Host-to-sandbox services require published port mappings, and sandbox-to-host service access goes through a named host alias plus policy. Jackin needs equivalent explicitness for dev servers, databases, and local model runners.
  7. The security docs are blunt about remaining risk. Workspace changes are live on the host in direct mode; hooks and generated files still matter.
What jackin should take from it, and where it should diverge:
  • Jackin should keep role repositories as the unit of runtime distribution. Docker Sandboxes templates are useful, but they are not a replacement for jackin roles.
  • Jackin should preserve mount destination control. Docker Sandboxes’ same-absolute-path passthrough is elegant for worktrees, but jackin’s dst-based workspace model is more explicit and portable.
  • Jackin should treat Docker Sandboxes’ credential proxy as a long-term target, not a first-phase promise. The current container credential work already documents why env injection is weaker.
  • Jackin should make backend differences visible. A dind session, an OrbStack isolated-machine session, and a future Docker Sandboxes-style microVM session should not pretend to have identical risk.
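The branch-mode idea does not require a microVM: plain git worktree can approximate it. The sketch below creates a throwaway repo so it runs anywhere; the .sbx/ path follows Docker Sandboxes’ convention, and the branch name is illustrative.

```shell
# Approximate Docker Sandboxes' --branch behavior with plain `git worktree`:
# agent edits land on their own branch in a worktree under .sbx/, so the main
# checkout never sees uncommitted agent changes. (Paths/branch names are
# illustrative; a throwaway repo is created so the sketch runs anywhere.)
set -eu
demo=$(mktemp -d) && cd "$demo"
git init -q .
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m init

git worktree add .sbx/agent-task-1 -b agent/task-1   # isolated branch + working tree
echo ".sbx/" >> .git/info/exclude                    # keep sandbox dirs out of `git status`
git -C .sbx/agent-task-1 status --short              # operator reviews agent changes here
```

What the microVM adds on top of this is the boundary, not the Git mechanics: the worktree keeps changes reviewable, while the backend decides what the agent can reach.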

Track A — Fleet operations comparison: jackin’ vs multicode

This is the feature inventory the fleet track is built from. “Have” means present in jackin’ today; “planned” means a roadmap item already exists; “gap” means truly missing. Items that point at a named roadmap item are addressed by this program.

| Concern | multicode | jackin’ status |
| --- | --- | --- |
| Agent isolation | bwrap + systemd-run; Apple container | Have — Docker + DinD; planned — selectable backends |
| Per-agent working tree | Workspace dir; one repo per workspace | Have — worktree and clone isolation |
| Mount kinds beyond shared/readonly | writable, readable, isolated, tmpfs | Partial — shared + readonly; gap on tmpfs and ephemeral, see Ephemeral mount modes |
| Multi-provider agent runtime | OpenCode + Codex (one per session) | Partial — basic built-in runtime launch shipped; parity work remains in multi-runtime |
| Resource limits per agent | memory-high, memory-max, cpu (cgroups) | Gap — see Declarative resource limits |
| Live agent status (idle/busy/question) | Yes — derived from opencode SSE events | Gap — see Agent runtime status |
| Live machine resource panel (CPU/RAM/disk) | Yes — /proc/stat, /proc/meminfo, sampled every 2s | Gap — see Console resource panel |
| Per-agent token / cost / OOM tracking | Yes — usage aggregation service | Gap — see Token & cost telemetry |
| Agent-to-operator tag protocol | Yes — <multicode:repo> / <multicode:issue> / <multicode:pr> | Gap — see Agent tag protocol |
| Live GitHub link state polling | Yes — octocrab + per-workspace SQLite cache | Gap — see GitHub link tracking |
| Persistent storage backend | Per-workspace SQLite + per-workspace JSON | Gap — see Persistent storage layer |
| Per-workspace operator description | Yes — inline-editable note in TUI | Gap — see Workspace description |
| External tool/IDE/diff launcher | Yes — [handler] block, [[tool]] array | Gap — see Operator handler system |
| Custom operator-defined tools | Yes — [[tool]] array, hotkey + exec/prompt type | Gap — see Custom operator tools |
| Autonomous task queue | Yes — scan cadence, parallel issue dispatch, runtime cleanup knobs | Gap — see Autonomous task queue |
| PR lifecycle actions | Yes — publish, rebase, fix CI, address review, request review, and merge actions | Gap — model manual gates and PR readiness inside Autonomous task queue |
| Task source abstraction | No — current multicode hardcodes GitHub issue/repo workflow | Gap — add source-neutral task policy in Task source abstraction |
| Idle runtime cleanup | Yes — gradle --stop, container recycle | Gap — see Idle runtime cleanup |
| Remote orchestration | Yes — multicode-remote SSH bridge + rsync | Gap — see jackin-remote |
| Workspace skills mount | Yes — add-skills-from mounts to provider skill dirs | Gap — see Workspace skills mount |
| Credential source plurality | Backends such as OS secret stores, env, and command | Partial — op:// and ${env.VAR} shipped; see Credential source pattern |
| GitHub CLI auth passthrough | Read-only mount / token config | Partial — see GitHub CLI authentication strategy |
| Operator console / TUI | Yes — ratatui, full-time | Have — jackin console (less feature-rich than CLI on purpose) |
| Role repo contract | Implicit via skill/config mounts | Have — jackin.role.toml + Dockerfile |
| Sensitive mount warnings | No | Have |
The multicode ideas that matter most for jackin:
  1. Live observability of the agent. multicode derives status and resource columns from runtime state and displays them as the main operator surface.
  2. Per-workspace persistent SQLite. A small store underpins GitHub status, custom links, and telemetry.
  3. The tag protocol. <multicode:issue>, <multicode:pr>, and <multicode:repo> turn agent output into structured operator state without making the agent runtime itself part of the orchestrator.
  4. Resource limits as config. Memory, CPU, and file descriptor limits are declared near isolation config instead of being hidden in launch scripts.
  5. Operator extension points. Editor launchers, review tools, and custom hotkeys make the TUI an operator surface rather than a fixed dashboard.
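A tag extractor of the kind described in point 3 can be very small. The wire format assumed below (paired <multicode:kind> tags around a value) is a guess at the concrete syntax; the borrowable idea is that structured operator state is parsed out of ordinary agent output.

```python
import re

# Assumed wire format: <multicode:KIND>VALUE</multicode:KIND> embedded in agent
# output. The real protocol may differ; this only sketches the extraction idea.
TAG_RE = re.compile(
    r"<(?P<ns>\w+):(?P<kind>repo|issue|pr)>(?P<value>[^<]+)</(?P=ns):(?P=kind)>"
)

def extract_tags(agent_output: str) -> list[tuple[str, str]]:
    """Return (kind, value) pairs so the orchestrator can update link state."""
    return [(m["kind"], m["value"].strip()) for m in TAG_RE.finditer(agent_output)]
```

Because the namespace is a capture group rather than a literal, the same extractor can serve a vendor-neutral jackin variant of the protocol.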

Where jackin is materially ahead — and should stay ahead

  • Cross-platform Docker substrate instead of Linux-only bwrap.
  • Per-instance multi-runtime instead of one provider assumption.
  • Role repos as distribution units instead of ad-hoc skill/config mounts.
  • CLI-first with jackin console as the simplified front instead of a TUI-only product.
  • Toolchain-neutral orchestration instead of Micronaut/Java defaults baked into the orchestrator.
  • Security boundary honesty. multicode’s README is explicit that its isolation is for safety and convenience, not security. Jackin’s docs should preserve that kind of bluntness for every backend: dind is useful, but it is not a microVM; microVM is stronger, but still has workspace and credential policy caveats.

Track B — Containment and recovery comparison

This track folds Hazmat and Docker Sandboxes into the roadmap without making jackin pretend to be either one.

| Concern | Hazmat | Docker Sandboxes | jackin’ direction |
| --- | --- | --- | --- |
| Boundary explanation | Session contract plus hazmat explain | Security model docs and sandbox policy output | Session contract and explain mode |
| Strong local isolation | macOS user + seatbelt + pf; VM tier for hardest cases | Per-sandbox microVM | Selectable sandbox backends |
| Docker workflows | Private-daemon tier only; shared daemon rejected in containment | Private Docker Engine inside VM | Keep DinD explicit, add microVM/private-daemon backend, reject silent host socket exposure |
| Network policy | pf plus DNS blocklist; exact-domain caveats documented | Host-side proxy, deny-by-default, non-HTTP blocked | Network egress policy |
| Service ports | Service access appears in session contract | sbx ports publishes host-to-sandbox traffic; host services use policy-approved alias | Add service access and port mappings to session contract + network policy |
| Credential delivery | Host-owned secret store, materialized/brokered per harness | Host proxy injects credentials; values stay outside VM | Container credential exposure and host bridge |
| Stack ergonomics | Integration manifests with strict “cannot widen trust” rules | Templates/kits and agent-specific setup | Stack integration contracts plus role repos |
| Recovery | Pre-session snapshots, restore, formal backup invariants | Persistent VM state; sbx rm cleanup | Session snapshot and rollback plus disk/state budgets |
| Parallel Git work | Not the main product focus | Direct mode plus .sbx/ branch worktrees | Per-mount isolation with jackin-owned worktree/clone modes |
| Proof / governance | TLA+ for setup, policy, backup, launch invariants | Product security docs | ADRs and behavioral specs for host-side effects |

Track A phases — Fleet operations
These phases preserve the fleet-operations work that started from multicode research. They can be implemented independently of most containment work, but they benefit from the same session contract and persistent storage decisions.

Phase 1 — Fleet quick wins

| Item | Inspiration in multicode | Depends on |
| --- | --- | --- |
| Workspace description | Inline-editable note in TUI overview | |
| Operator handler system | [handler] block (review/web) | |
| Declarative resource limits | [isolation] memory-high/memory-max/cpu | |
| Ephemeral mount modes | [isolation] isolated = [...] and tmpfs = [...] | |
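Adapted to jackin, declarative limits could sit next to the rest of an instance’s config, mirroring multicode’s [isolation] keys. The section name, key names, and value formats below are illustrative, not a shipped schema.

```toml
# Hypothetical per-instance limits for a jackin agent, adapted from multicode's
# [isolation] memory-high / memory-max / cpu keys. Values are illustrative.
[limits]
memory-high = "4G"    # soft ceiling: throttle the agent before killing it
memory-max  = "6G"    # hard ceiling: OOM-kill the instance, never the host
cpu         = "200%"  # cgroups-style CPU share, i.e. about two cores
```

Declaring limits in config rather than launch scripts is the borrowed idea: the session contract can then print them alongside mounts and credentials.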
Phase 2 — Live status and operator tooling

| Item | Inspiration in multicode | Depends on |
| --- | --- | --- |
| Agent runtime status | Runtime-derived session status | Multi-runtime adapter seam |
| Console agent session control | TUI as the active workspace/session control plane | Unique container identity; agent runtime status |
| Console resource panel | CPU/RAM/disk polling | Resource limits (Phase 1) |
| Agent tag protocol | <multicode:*> skill-driven tags | Agent runtime status |
| GitHub link tracking | GitHub polling + SQLite cache | Tag protocol; persistent storage (Phase 3) |
| Custom operator tools | [[tool]] array (hotkey + exec/prompt) | Operator handler system (Phase 1) |
Phase 3 — Persistent storage and telemetry

| Item | Inspiration in multicode | Depends on |
| --- | --- | --- |
| Persistent storage layer | Per-workspace SQLite | |
| Token & cost telemetry | Usage aggregation | Persistent storage; agent runtime status |
Phase 4 — Task sources and autonomy

| Item | Inspiration in multicode | Depends on |
| --- | --- | --- |
| Task source abstraction | Issue/task queue substrate | |
| Autonomous task queue | Parallel issue scanning and dispatch | Task source; persistent storage; agent runtime status |
| Idle runtime cleanup | Runtime cleanup toggle | Agent runtime status |

Phase 5 — Distributed operation and extensibility

| Item | Inspiration in multicode | Depends on |
| --- | --- | --- |
| jackin-remote | SSH bridge + rsync | All of Phases 1-3; multi-runtime |
| Credential source pattern | Env/command/keychain token backends | — (cross-cutting refactor) |
| Workspace skills mount | Provider-skill bind mounts | Multi-runtime |

Track B phases — Containment, Docker, and recovery

This is the Hazmat/Docker Sandboxes research track. It is security-shaped, but still operator-product work: the operator must understand what is being launched, what can be reached, and how to recover.

Phase B1 — Session contract and stack integrations

| Item | Primary inspiration | Depends on |
| --- | --- | --- |
| Session contract and explain mode | Hazmat session contract / hazmat explain; Docker Sandboxes security model and sbx ports/lifecycle output | Workspace resolution; auth strategy |
| Stack integration contracts | Hazmat session integrations | Session contract |

Phase B2 — Control egress and credentials

| Item | Primary inspiration | Depends on |
| --- | --- | --- |
| Network egress policy | Docker Sandboxes host proxy and network panel; Hazmat pf/DNS hardening; Claude devcontainer allowlist | Session contract; selectable sandbox backends |
| Container credential exposure | Docker Sandboxes credential proxy; Hazmat secret store | Host bridge; credential source pattern |
| Host bridge | Brokered host capabilities | jackin daemon |
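The credential-exposure direction can be sketched as a host-side broker: the sandbox names an upstream, and the bridge attaches the secret, so the token never crosses the boundary. Store layout and function names below are hypothetical.

```python
# Minimal sketch of host-side credential injection in the spirit of Docker
# Sandboxes' proxy: the sandbox never holds the token; it names an upstream and
# the host bridge attaches the Authorization header. Names are hypothetical.
SECRET_STORE = {"github": "ghp_example_token"}   # host-owned; never mounted into the sandbox

def broker_request(upstream: str, headers: dict[str, str]) -> dict[str, str]:
    """Return the headers the host bridge would actually send upstream."""
    if "Authorization" in headers:
        raise ValueError("sandbox must not supply its own credentials")
    token = SECRET_STORE.get(upstream)
    if token is None:
        raise PermissionError(f"no credential approved for upstream {upstream!r}")
    # The secret joins the request only here, on the host side of the boundary.
    return {**headers, "Authorization": f"Bearer {token}"}
```

This is why header injection is materially stronger than env vars or mounted secret files: a compromised agent can misuse the brokered capability, but it cannot exfiltrate the token itself.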

Phase B3 — Make Docker mode choices explicit

| Item | Primary inspiration | Depends on |
| --- | --- | --- |
| Selectable sandbox backends | Docker Sandboxes microVM + private daemon; Hazmat Tier 3 decision | Runtime backend refactor |
| Rootless DinD | Harden current Docker substrate before/alongside microVMs | DinD TLS; runtime backend refactor |
| Devcontainer parity | Claude/devcontainer firewall and reproducibility pattern | Network egress policy |
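A deny-by-default egress check in the spirit of the devcontainer allowlist pattern fits in a few lines. The rule syntax below (exact host or a *.suffix wildcard) is an assumption, not jackin’s actual policy format.

```python
# Toy deny-by-default domain check: exact hosts or "*.example.com" suffix rules
# are allowed, everything else is refused. Rule syntax is an assumption.
def egress_allowed(host: str, rules: list[str]) -> bool:
    host = host.lower().rstrip(".")
    for rule in rules:
        rule = rule.lower()
        if rule.startswith("*."):
            apex = rule[2:]                       # "*.npmjs.org" also admits the apex
            if host == apex or host.endswith(rule[1:]):
                return True
        elif host == rule:
            return True
    return False                                  # deny by default
```

Matching on the dot-prefixed suffix (".npmjs.org" rather than "npmjs.org") is the detail that keeps lookalike domains such as evil-npmjs.org outside the allowlist.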

Phase B4 — Recover and govern host-side effects

| Item | Primary inspiration | Depends on |
| --- | --- | --- |
| Session snapshot and rollback | Hazmat pre-session Kopia snapshots and restore | Persistent storage; stack integration contracts |
| Architecture decision records | Hazmat design assumptions and verification boundary | Codebase health track |
| Behavioral spec: runtime/launch | Hazmat TLA+ setup/launch ordering discipline | Codebase health track |

This is a guide, not a constraint. The tracks are intentionally separable.

  1. Fleet quick wins: workspace description, handler system, resource limits, ephemeral mounts, and the lightweight parts of service visibility can land as independent PRs.
  2. Session contract first for containment work: do not build network, snapshot, port-forwarding, or integration behavior until jackin can preview the fully resolved launch boundary and host-side effects.
  3. Persistent storage before deep observability: per-instance SQLite should land before GitHub link tracking, token telemetry, snapshot metadata, and autonomous queue persistence.
  4. Live status before queueing: agent runtime status is the substrate for resource panels, tag protocol, idle cleanup, autonomous queues, and attention prompts.
  5. Network and credential proxy work belongs with backends: dind, microVM, SSH remote, and Kubernetes will not have identical enforcement points. The session contract must say which one is active.
  6. Do not claim Docker Sandboxes parity early: jackin can improve DinD and add microVMs before it has host-side network proxying and credential injection. The docs must keep that distinction visible.
  7. Add tier recommendations before adding more backends: Hazmat’s decision flow is useful because it tells users when not to use a mode. Jackin should make dind/microVM/remote/Kubernetes recommendations visible before the backend matrix grows.

What the two tracks converge on:

  1. A launch contract is the common substrate. Hazmat proves that a visible session contract turns security posture into product UX. In jackin, the same contract also unlocks safer stack integrations, network policies, snapshot previews, and backend comparisons.
  2. Live status and persistence still drive fleet work. The multicode track remains valid: status, SQLite, and tags are prerequisites for queues, telemetry, link tracking, and remote operation.
  3. Docker is not a binary; it is a privilege boundary. Hazmat’s shared daemon refusal and Docker Sandboxes’ private-daemon model both point to the same rule: host Docker socket access should never be an accidental convenience path.
  4. Role repos and integration hints solve different problems. Roles define the agent environment. Integrations can make local stacks easier without becoming executable project policy or credential delivery.
  5. Backend-neutral does not mean risk-neutral. A Docker container, rootless DinD, OrbStack isolated machine, Docker Sandboxes microVM, SSH remote host, and Kubernetes pod all need one user-facing abstraction, but their risk profiles must be printed honestly.
  6. Service access is part of the boundary. Ports, host aliases, and local service reachability are not small UX extras. They decide whether an agent can hit databases, dev servers, model runners, or cloud emulators, so they belong in the same contract as mounts and credentials.

Open questions, each with a recommended default:

  1. Session contract surface. Should jackin explain be its own command, a --dry-run --explain mode on load, or both? Recommended default: both, with jackin explain optimized for scripts and docs.
  2. Network policy baseline. Should the default be open networking with a contract warning, or a minimal allowlist for known agent providers plus package registries? Recommended default: open in dind V1, explicit allowlist for future microVM backends that can enforce it outside the guest.
  3. Stack integration ownership. Are integration hints global, role-owned, workspace-owned, or repo-recommended with operator approval? Recommended default: global built-ins plus repo-recommended names that require hash-based operator approval.
  4. Task source identity. Is a task source workspace-bound, agent-bound, or operator-global? Recommended default: workspace-bound, since parallelism limits naturally scope to one workspace’s resources.
  5. Credential proxy destination. Is the host bridge the long-term place for credential proxying, or should sandbox backends own it? Recommended default: host bridge owns operator approval and audit; backends own transport.
  6. Service access model. Should port publication be workspace config, per-session command, console action, or all three? Recommended default: command/console action first, config only for stable dev servers; always show active mappings in the contract/status surface.
  7. Persistence budget. Should jackin show per-instance disk usage and cleanup recommendations before adding microVM/private-daemon backends? Recommended default: yes, because Docker Sandboxes-style persistence is useful only if operators can see and reclaim it.

Non-goals for this program:

  • Shipping a macOS-only clone of Hazmat. Native macOS containment can inform a future backend, but jackin’s baseline must stay cross-platform.
  • Replacing role repos with integration manifests. Integrations are narrow ergonomics overlays; roles remain the runtime distribution model.
  • Claiming Docker Sandboxes-equivalent security for dind, rootless DinD, or first-phase microVM work before network and credential proxy gaps are closed.
  • Implementing Kubernetes support in this program. The containment and contract work should make Kubernetes easier later, but the platform item stays on the main roadmap.

Research snapshot: May 9, 2026.

  • graemerocher/multicode — active reference implementation for this research. It covers workspace isolation, GitHub tag/status integration, authentication, multicode-remote, Codex provider support, editor-tool selection, autonomous queue / PR actions, and Apple-container experiments.