Declarative Resource Limits per Agent
Status: Open — design proposal (Phase 1, Agent Orchestrator Research Program)
Problem
jackin’ runs every agent in a Docker container with whatever resource
allocation the host gives it. On a developer laptop, six parallel agents can
each spawn a cargo build, exhaust memory, OOM-kill the desktop, or starve
each other for CPU. There’s no operator-facing control for this today; the
operator’s only knob is “launch fewer agents.”
Docker exposes the right primitives (--memory, --cpus, --ulimit nofile=N, plus --memory-reservation for soft limits), but jackin’ doesn’t
plumb any of them through.
multicode addresses this directly with four declarative fields:
`memory-high` (soft limit, triggers reclaim), `memory-max` (hard limit,
triggers OOM kill), `cpu` (quota as a percentage), and `nofile` (FD ceiling on
Apple-container backends).
Why It Matters
- Parallel agents are unsafe today on resource-constrained hosts. This is a literal correctness gap: a runaway agent can take down the operator’s whole machine.
- The autonomous queue (Phase 4) is unusable without it. Five queued agents with default-unlimited memory and CPU is a recipe for OOM kills the moment two of them happen to be running `cargo build` simultaneously.
- Cross-backend resource translation is the right home for this design. Docker, Apple container, and the planned selectable sandbox backends each express limits differently; a declarative layer means each backend translates once.
Inspiration in multicode
Sources:
- README — Isolation
- Config — `config.toml` `[isolation]` block (`memory-high`, `memory-max`, `cpu`)

```toml
[isolation]
memory-high = "12 GiB"  # soft limit; triggers cgroup memory.high
memory-max = "16 GiB"   # hard limit; triggers OOM at this point
cpu = "300%"            # 3 CPU cores worth of quota
nofile = 16384          # FD ceiling (Apple container only)
```

multicode parses these via the size crate (decimal 12 GB and binary
16 GiB both supported), expands shell variables, then maps them onto
systemd-run --property MemoryHigh=... etc. — each backend has its own
translator, but the config surface is uniform.
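The Scope section below plans to reuse a small size parser rather than pulling in the size crate. A minimal sketch of what that parser could look like, distinguishing binary (`GiB`) from decimal (`GB`) suffixes; `parse_size` and its behavior are illustrative assumptions, not jackin’s actual API:

```rust
// Hypothetical size-string parser: "12 GB" (decimal), "16 GiB" (binary),
// and bare byte counts are all accepted; unknown suffixes are rejected.
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim();
    // Split the numeric prefix from the unit suffix (no suffix = bytes).
    let split = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(s.len());
    let (num, unit) = s.split_at(split);
    let value: f64 = num.trim().parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KB" => 1000,
        "MB" => 1000_u64.pow(2),
        "GB" => 1000_u64.pow(3),
        "KiB" => 1024,
        "MiB" => 1024_u64.pow(2),
        "GiB" => 1024_u64.pow(3),
        _ => return None,
    };
    Some((value * multiplier as f64) as u64)
}

fn main() {
    assert_eq!(parse_size("12 GB"), Some(12_000_000_000));
    assert_eq!(parse_size("16 GiB"), Some(17_179_869_184));
    assert_eq!(parse_size("16384"), Some(16384));
    assert_eq!(parse_size("bogus"), None);
    println!("ok");
}
```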
multicode also tracks runtime metrics that complement the limits: current RAM, CPU %, and crucially OOM kill count (sampled from the systemd memory-pressure counter). When an agent gets OOM-killed, the operator sees it.
Recommended Shape
The right level for these fields is the role manifest, not the
operator config or workspace config. Reasoning: limits scale with the
toolchain (a Rust agent with cargo build needs more headroom than a Go
agent), and the role is where toolchain choices live. Operator/
workspace overrides come later if a use case surfaces.
Config
```toml
version = "v1alpha2"
dockerfile = "Dockerfile"

[runtime.limits]
memory_high = "12 GiB"   # soft (Docker --memory-reservation)
memory_max = "16 GiB"    # hard (Docker --memory)
cpus = "3.0"             # Docker --cpus (string for "1.5", "300%")
nofile = 16384           # Docker --ulimit nofile=N:N

[runtime.limits.oom]
preserve_state = true    # don't auto-clean an OOM-killed instance
notify = true            # surface OOM in console (depends on Phase 2 status)
```

`memory_high` is optional; absent means same as `memory_max`. `nofile` is
optional and defaults to the host value. `cpus` accepts both fractional and
percentage forms (`"3.0"` and `"300%"` are equivalent).
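The two accepted `cpus` forms can be normalized at parse time, as the Open Questions section recommends. A sketch, assuming a hypothetical `parse_cpus` helper that converts both spellings to a fractional core count:

```rust
// Hypothetical normalizer: "300%" (multicode style) and "3.0" (Docker style)
// both resolve to the same fractional core count.
fn parse_cpus(input: &str) -> Option<f64> {
    let s = input.trim();
    if let Some(pct) = s.strip_suffix('%') {
        // Percentage form: "300%" means three cores' worth of quota.
        pct.trim().parse::<f64>().ok().map(|p| p / 100.0)
    } else {
        // Fractional form: "3.0" or "1.5" is already a core count.
        s.parse::<f64>().ok()
    }
}

fn main() {
    assert_eq!(parse_cpus("3.0"), Some(3.0));
    assert_eq!(parse_cpus("300%"), Some(3.0));
    assert_eq!(parse_cpus("1.5"), Some(1.5));
    assert_eq!(parse_cpus("not-a-number"), None);
    println!("ok");
}
```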
CLI override
```sh
jackin load <agent> --memory-max 8GiB --cpus 2.0
```

Operator override is a V1 nicety, not a config-file substitute. Useful for “this one launch is on a smaller machine.”
Backend translation
Each backend implements a `ResourceLimits` translator:
- Docker (today): `--memory`, `--memory-reservation`, `--cpus`, `--ulimit nofile=N:N`; `oom_score_adj` if needed for preserve-state.
- Apple container (when selectable backends ships): direct per-allocation limits.
- systemd-run / bwrap (if it ever lands): cgroups properties.
A backend that can’t honor a declared limit (e.g. nofile on a container
runtime that doesn’t expose it) emits a warning at launch and proceeds —
not a hard error. Operators see the gap; the agent still runs.
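The translate-or-warn behavior can be sketched as follows. The `ResourceLimits` field names, `to_docker_args`, and the capability flag are assumptions for illustration, not jackin’s actual types:

```rust
// Hypothetical declarative limits struct, mapped onto `docker run` flags.
struct ResourceLimits {
    memory_high: Option<String>, // soft -> --memory-reservation
    memory_max: Option<String>,  // hard -> --memory
    cpus: Option<f64>,           // quota -> --cpus
    nofile: Option<u64>,         // FD ceiling -> --ulimit nofile=N:N
}

// Translate declared limits into backend args; a limit the backend can't
// honor produces a launch-time warning and is skipped, never a hard error.
fn to_docker_args(l: &ResourceLimits, supports_ulimit: bool) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(m) = &l.memory_max {
        args.push(format!("--memory={m}"));
    }
    if let Some(m) = &l.memory_high {
        args.push(format!("--memory-reservation={m}"));
    }
    if let Some(c) = l.cpus {
        args.push(format!("--cpus={c}"));
    }
    if let Some(n) = l.nofile {
        if supports_ulimit {
            args.push(format!("--ulimit=nofile={n}:{n}"));
        } else {
            // Operator sees the gap; the agent still runs.
            eprintln!("warning: backend cannot honor nofile={n}; continuing");
        }
    }
    args
}

fn main() {
    let limits = ResourceLimits {
        memory_high: Some("12GiB".into()),
        memory_max: Some("16GiB".into()),
        cpus: Some(3.0),
        nofile: Some(16384),
    };
    let args = to_docker_args(&limits, true);
    assert!(args.contains(&"--memory=16GiB".to_string()));
    assert!(args.contains(&"--ulimit=nofile=16384:16384".to_string()));
    println!("{}", args.join(" "));
}
```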
Scope (V1)
- `[runtime.limits]` block on `jackin.role.toml` with the four fields above.
- `--memory-max` / `--memory-high` / `--cpus` / `--nofile` flags on `jackin load` for one-shot overrides.
- Docker translator only in V1 — that’s the only backend.
- size-crate-style parsing (binary and decimal); reuse a small parser rather than pulling the dependency.
- Defaults: no limits applied if the field is absent (matches today).
- `[runtime.limits.oom]` block: `preserve_state` defaults to true, `notify` defaults to true (no-op until Phase 2 lands).
Out of scope for V1:

- Per-workspace overrides. Manifest-level only in V1.
- Disk I/O limits (`--blkio-weight`). Useful but harder to reason about; defer to user request.
- Network bandwidth limits. Defer indefinitely.
- Auto-pausing OOM-killed agents instead of killing the container. Docker doesn’t expose a clean way; revisit per-backend later.
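The defaulting rules above (`memory_high` absent means same as `memory_max`; no limits applied when fields are absent) imply a small resolve-and-validate step. A sketch under assumed names; the struct and `resolve` function are illustrative, not the actual manifest schema:

```rust
// Hypothetical resolved form of the [runtime.limits] memory fields, in bytes.
#[derive(Debug, PartialEq)]
struct MemoryLimits {
    memory_high: Option<u64>, // soft limit
    memory_max: Option<u64>,  // hard limit
}

// Apply defaults and reject inconsistent declarations: an absent soft limit
// falls back to the hard limit, and a soft limit above the hard limit is an
// error at validation time rather than a silent no-op at runtime.
fn resolve(mut l: MemoryLimits) -> Result<MemoryLimits, String> {
    if l.memory_high.is_none() {
        l.memory_high = l.memory_max;
    }
    if let (Some(high), Some(max)) = (l.memory_high, l.memory_max) {
        if high > max {
            return Err(format!("memory_high ({high}) exceeds memory_max ({max})"));
        }
    }
    Ok(l)
}

fn main() {
    // Absent soft limit defaults to the hard limit.
    let r = resolve(MemoryLimits { memory_high: None, memory_max: Some(16) }).unwrap();
    assert_eq!(r.memory_high, Some(16));
    // Soft limit above hard limit is rejected.
    assert!(resolve(MemoryLimits { memory_high: Some(32), memory_max: Some(16) }).is_err());
    // No fields declared means no limits applied (matches today).
    let r = resolve(MemoryLimits { memory_high: None, memory_max: None }).unwrap();
    assert_eq!(r.memory_max, None);
    println!("ok");
}
```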
Open Questions
- Should `cpus` accept percentages explicitly? `"3.0"` is unambiguous for Docker. `"300%"` matches multicode but maps awkwardly to Docker (which doesn’t accept the percent sign). Recommended default: accept both at parse time, normalize to a fractional core count internally.
- Should manifest limits be inheritable across roles? If `org/base` declares `memory_max=16GiB` and `org/derived` extends it, does the derived role inherit the limit? Recommended default: yes, with override semantics — but role inheritance is a separate, larger design question and probably out of scope for V1.
- OOM `preserve_state` interaction with worktree cleanup (the shipped per-branch safety policy described in Per-mount isolation). An OOM-killed instance should always be preserved. The cleanup helper already handles non-zero exits; OOM is a special case of that. Confirm the existing helper sees OOM as non-zero.
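On the last open question: Docker surfaces an OOM kill as a non-zero exit (conventionally 137, i.e. 128 + SIGKILL), so a cleanup helper that branches on non-zero exit codes should already cover it. A sketch of that policy check; `should_preserve_state` is a hypothetical name, not the shipped helper:

```rust
// Hypothetical cleanup-policy check: OOM kills surface as exit code 137
// (128 + SIGKILL), a special case of the existing non-zero-exit rule.
fn should_preserve_state(exit_code: i32, preserve_on_oom: bool) -> bool {
    const OOM_EXIT: i32 = 137; // 128 + SIGKILL(9)
    if exit_code == OOM_EXIT && preserve_on_oom {
        return true; // an OOM-killed instance is always preserved
    }
    exit_code != 0 // existing policy: preserve on any non-zero exit
}

fn main() {
    assert!(should_preserve_state(137, true));  // OOM kill -> preserve
    assert!(should_preserve_state(1, false));   // ordinary failure -> preserve
    assert!(!should_preserve_state(0, true));   // clean exit -> eligible for cleanup
    println!("ok");
}
```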
Related Files
- `src/manifest/mod.rs` — `[runtime.limits]` schema
- `src/manifest/validate.rs` — limit value validation
- `src/runtime/launch.rs` — Docker arg construction
- `src/cli/role.rs` — `--memory-max` etc. flags
- New module (e.g. `src/runtime/limits.rs`) — parser and Docker translator
See Also
- Selectable sandbox backends — cross-backend translator lives here long-term
- Console resource panel — consumer of the runtime side (current usage vs configured limit)
- Autonomous task queue — queue parallelism limit interacts with per-agent memory limits