Idle Runtime Cleanup Hooks

Status: Open — design proposal (Phase 4, Agent Orchestrator Research Program)

A long-running agent container accumulates state outside the agent’s own working set: a Java agent leaves Gradle daemons holding a few GB of heap; a Node agent leaves npm caches with thousands of file descriptors; the runtime itself caches build artifacts indefinitely. After a few hours of an agent sitting idle, that adds up to real resource cost — both on disk and in long-lived processes the operator never asked for.

multicode addresses this with declarative idle hooks: after N seconds of inactivity, run a cleanup command (gradle --stop, etc.); optionally recycle the container.

Why it matters:

  • The autonomous queue (Phase 4) keeps containers warm waiting for work. Without cleanup, those warm containers accumulate state every cycle.
  • It’s a small feature with disproportionate impact for long-running fleets — exactly the workflow the program targets.
  • It generalizes naturally to anything a role wants done while idle (commit checkpoint snapshots, push WIP branches, evict caches).

For reference, multicode’s config:

[autonomous]
idle-runtime-cleanup = true
idle-runtime-cleanup-delay-seconds = 300 # idle for 5 min before first cleanup
idle-runtime-cleanup-interval-seconds = 900 # re-run every 15 min while still idle
idle-runtime-restart = false # also recycle the container?

multicode runs gradle --stop and (optionally) terminates remaining Gradle daemon/worker processes inside Apple-container workspaces. Then, if idle-runtime-restart = true, it recycles the runtime entirely (stopping the container and starting a new one).

The implementation watches the workspace status (multicode’s equivalent of agent runtime status) and fires once the status has been Idle for the configured delay.

The proposal generalizes the concept: roles declare what to run when idle, the operator decides whether to enable it, and jackin’s runtime supervisor fires the hooks based on observed status.

# jackin.role.toml
version = "v1alpha2"
[runtime.idle]
commands = [
  "gradle --stop",
  "find /home/agent/.cache -atime +1 -delete"
]
delay_seconds = 300
interval_seconds = 900
restart_after_cleanup = false

commands is a list — each entry runs in sequence inside the agent container via docker exec. They’re the role’s recommendation; the operator chooses whether to enable them.
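A minimal sketch of that sequential execution (the helper name is hypothetical; the per-command timeout defaults to 60 seconds here, matching the proposal's default):

```python
import subprocess

def run_idle_hooks(container, commands, per_cmd_timeout=60):
    """Run each declared cleanup command in sequence inside the agent
    container via `docker exec`, with a per-command timeout.
    Sketch only: container name and command list come from config."""
    for cmd in commands:
        # `sh -c` so shell syntax in the role's declaration (globs,
        # flags like -atime) is interpreted inside the container.
        subprocess.run(
            ["docker", "exec", container, "sh", "-c", cmd],
            check=True,            # raise on nonzero exit
            timeout=per_cmd_timeout,
            capture_output=True,
        )
```

Stopping at the first failure keeps later commands from running against a half-cleaned workspace; whether that is the right policy is itself a design choice.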

# operator config
[roles."the-architect"]
enable_idle_hooks = true # default false

Defaults to off because:

  • The hooks run inside someone else’s container based on the agent class’s declarations.
  • Some operators want the warm state preserved.
  • Misconfigured commands could break the running agent.
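Put together, the gate is a simple conjunction. A sketch with hypothetical names (not an existing jackin API):

```python
def idle_hooks_enabled(role_commands, operator_opt_in=False):
    """Hooks run only when the role declares cleanup commands AND the
    operator has opted in; the operator default is off."""
    return bool(role_commands) and bool(operator_opt_in)
```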

The supervisor watches the agent runtime status bus. When an instance has been Idle for delay_seconds continuously:

  1. Run commands[0], commands[1], … in sequence via docker exec. Each command gets a 60-second timeout (configurable).
  2. If restart_after_cleanup = true, eject and re-load the instance afterward (preserves the data dir, just recycles the container).
  3. Record the cleanup in the persistent storage layer’s tool_history table (or a dedicated cleanup_history).
  4. Re-arm: next cleanup fires interval_seconds later if still idle.

A status transition out of Idle cancels the pending cleanup.
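The arm/fire/cancel cycle above can be sketched with a per-instance timer; the class name and the status-bus callback shape are assumptions, not jackin's actual interfaces:

```python
import threading

class IdleSupervisor:
    """Sketch: a hypothetical status bus calls on_status() with each
    transition; cleanup is a callable that runs the idle hooks."""

    def __init__(self, delay_seconds, interval_seconds, cleanup):
        self.delay = delay_seconds
        self.interval = interval_seconds
        self.cleanup = cleanup
        self._timer = None

    def on_status(self, status):
        if status == "Idle":
            self._arm(self.delay)   # first cleanup after the delay
        else:
            self._cancel()          # any transition out of Idle cancels

    def _arm(self, seconds):
        self._cancel()
        self._timer = threading.Timer(seconds, self._fire)
        self._timer.start()

    def _cancel(self):
        if self._timer is not None:
            self._timer.cancel()
            self._timer = None

    def _fire(self):
        self.cleanup()
        self._arm(self.interval)    # re-arm while still idle
```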

The console resource panel (when open) shows “Last cleanup: 4m ago” in the per-agent row. CLI: jackin status <selector> includes the last-cleanup time.

In scope (V1):

  • [runtime.idle] block in jackin.role.toml.
  • Operator opt-in via enable_idle_hooks per role.
  • Idle detection from the status bus.
  • Sequential command execution via docker exec with per-command timeout.
  • Optional restart_after_cleanup toggle.
  • Console rendering of last-cleanup time.
  • Cleanup history written to the persistent storage layer.
Out of scope (V1):

  • Idle hooks based on resource thresholds (“cleanup when memory > X”) in addition to time-based. Defer.
  • Hooks for other states (busy, question). Idle-only in V1.
  • Per-workspace override of idle config. Agent-class-level only in V1.
  • Operator-defined ad-hoc cleanup commands (not in role). Defer; use jackin exec instead if needed.
  • Notification on cleanup failure beyond a console toast. Defer.
Open questions:

  • Default delay/interval values. multicode uses 300s/900s. Are those sensible jackin defaults, or should roles be more conservative? Recommended: match multicode for V1; tune from feedback.
  • Cleanup during foreground operator attach. If the operator is actively jackin hardline’d into an idle session, should cleanup fire? multicode runs it regardless. Recommended: suppress when attached — the operator’s about-to-type-something signal isn’t visible to the status adapter.
  • Recovery from cleanup-broken-container. If restart_after_cleanup is true and the new container fails to come up, the operator’s session is unrecoverable from cleanup alone. Recommended: one retry, then mark instance failed and surface for operator intervention.
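The recommended recovery path (one retry, then mark the instance failed) might look like this sketch, where `stop`, `start`, and `healthy` stand in for jackin's actual runtime calls:

```python
def restart_with_retry(stop, start, healthy, max_retries=1):
    """Recycle the container; if it fails to come up, retry once,
    then report failure so the operator can intervene. All callables
    are hypothetical stand-ins for the runtime layer."""
    for _attempt in range(1 + max_retries):
        stop()
        start()
        if healthy():
            return True
    return False  # caller marks the instance failed and surfaces it
```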