# Run Telemetry (https://jackin.tailrocks.com/guides/run-telemetry/)


Every `jackin` command can record its full story — logs, lifecycle events, and per-stage timings — keyed by a **run id** (printed at startup in `--debug` mode). It goes to one of two sinks: a live external view streamed to any OpenTelemetry (OTLP) backend, or a local diagnostics file. Set one environment variable to stream to a backend; with none configured, jackin' falls back to the file. See [Where the run is recorded](#where-the-run-is-recorded) for how the two relate.

## Turn it on [#turn-it-on]

```sh
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317
jackin console --debug
```

That is the whole setup. With the variable unset, nothing changes: no network connections, no new behavior. jackin' exports over **OTLP/gRPC** and reads the standard OpenTelemetry variables. `OTEL_EXPORTER_OTLP_ENDPOINT` sets a single base endpoint that every signal derives from; the per-signal `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`, `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT`, and `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` apply when you want to point one signal somewhere else. OTLP-aware wrappers that set these variables are honored as-is.
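
As a minimal sketch of the per-signal variables, the following keeps traces and metrics on a local backend while routing logs to a second collector. The host `logs.internal.example` is a made-up placeholder, not a real service.

```sh
# Base endpoint: every signal derives from this unless a per-signal
# variable points that signal somewhere else.
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317

# Hypothetical override: send only logs to a different collector.
# logs.internal.example is a placeholder host used for illustration.
export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://logs.internal.example:4317
```

With both variables exported, run `jackin console --debug` as usual.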

jackin' speaks gRPC only. If `OTEL_EXPORTER_OTLP_PROTOCOL` (or a per-signal variant) is set to anything other than `grpc` while an endpoint is configured, jackin' stops at startup with error `E016` rather than sending telemetry that would silently never arrive. To fix it, set the variable to `grpc` or leave it unset.
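
If you want to catch a conflicting protocol setting before launching, a small pre-flight check can help. The sketch below is a hypothetical helper, not part of jackin'; it assumes the per-signal variants are the standard OpenTelemetry names (`OTEL_EXPORTER_OTLP_TRACES_PROTOCOL` and so on), and it flags a non-`grpc` value whether or not an endpoint is configured.

```sh
# Hypothetical pre-flight helper; not part of jackin'.
# Reports each OTLP protocol variable that is set to something other
# than "grpc". Returns 1 if any is found, 0 otherwise.
check_otlp_protocol() {
  status=0
  for name in OTEL_EXPORTER_OTLP_PROTOCOL \
              OTEL_EXPORTER_OTLP_TRACES_PROTOCOL \
              OTEL_EXPORTER_OTLP_LOGS_PROTOCOL \
              OTEL_EXPORTER_OTLP_METRICS_PROTOCOL; do
    # Indirect lookup of the variable named by $name (POSIX sh).
    eval "value=\${$name:-}"
    if [ -n "$value" ] && [ "$value" != "grpc" ]; then
      echo "$name=$value (expected grpc or unset)" >&2
      status=1
    fi
  done
  return "$status"
}
```

A typical use would be `check_otlp_protocol && jackin console --debug`.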

## Where the run is recorded [#where-the-run-is-recorded]

The backend and the file are alternatives, not duplicates:

* **No OTLP endpoint configured** → jackin' writes the run to a local diagnostics file at `~/.jackin/data/diagnostics/runs/<run-id>.jsonl`. The file is the fallback sink.
* **An OTLP endpoint configured** → the backend is the sink and no file is written. Set `JACKIN_DIAGNOSTICS_FILE=1` to additionally write the file and see the run on both sides at once.
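
To see the run on both sides at once, the second case above comes down to setting two variables. This is a sketch that assumes a backend is already listening on the local address used earlier in this guide.

```sh
# Stream the run to the backend...
export OTEL_EXPORTER_OTLP_ENDPOINT=http://127.0.0.1:4317

# ...and additionally write the local diagnostics file.
export JACKIN_DIAGNOSTICS_FILE=1
```

After the next jackin' command, the run should be visible in the backend and under `~/.jackin/data/diagnostics/runs/`.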

`--debug` controls how much detail is recorded (see [Severity follows the debug flag](#severity-follows-the-debug-flag)) — not whether the file is written. If an export is configured but can't be delivered, jackin' surfaces a notice and falls back to writing the file so the run is never lost silently.

## What you can see in the backend [#what-you-can-see-in-the-backend]

Telemetry arrives under the service name `jackin`, and every span, log record,
and metric carries the run id of the invocation that produced it (resource
attribute `parallax.run.id`, which a backend such as Parallax promotes to a
queryable column). One run id is the thread that ties the whole invocation
together — even though, as below, it is split across several traces on purpose.

### Each screen is its own trace, linked to the next [#each-screen-is-its-own-trace-linked-to-the-next]

Rather than one giant trace, jackin' emits **one trace per screen** and links
them, so you can follow the operator's path and jump between the steps:

* the workspace list, settings, edit-workspace, and create-workspace screens
  each open their own trace, tagged `jackin.screen.name` and carrying a
  `navigate` event when the operator moves between them;
* the **launch** is its own trace, linked back to the list it started from,
  with every launch stage as a child span — so "how long did each thing take"
  and "where did the operator wait" read as a waterfall instead of JSON
  spelunking;
* the in-container **capsule session** is its own trace too, linked back to the
  launch (see [Following a run into the container](#following-a-run-into-the-container)).

Because all of these share `parallax.run.id`, a backend that understands run ids
— Parallax is the reference target — still answers "show me everything for run
`<id>`" while letting you walk the links from screen to screen.

### Tags on the traces [#tags-on-the-traces]

Spans are tagged so you can filter and group by what the operator was doing:
the selected workspace and how it was chosen, the selected agent, the resolved
provider, and the screen each span belongs to. Selections, confirmations, and
launches are recorded as timestamped events on the screen they happened on.

### Logs and metrics [#logs-and-metrics]

* **All logs** — the same event stream as the diagnostics file: lifecycle
  breadcrumbs, container start/exit/crash events, and (in `--debug`) the
  full firehose including every external `docker`/`git` command. The OTLP
  transport's own logs are filtered out, so the export is jackin's story, not
  its plumbing.
* **Run state** — the end-of-run summary (stage durations, event counts,
  cache hits/misses) arrives as a structured log record.
* **Process metrics** — while the process runs, metrics export every 5
  seconds: `process.cpu.utilization`, `process.memory.usage`, and the tokio
  runtime counters `tokio.runtime.workers`, `tokio.runtime.alive.tasks`, and
  `tokio.runtime.global.queue.depth`. The same run-id resource attribute rides
  on every point, so CPU and memory line up with the run's traces and logs on
  one timeline.

## Following a run into the container [#following-a-run-into-the-container]

When a backend endpoint is configured, the launch hands the container the trace
context and a container-reachable endpoint, so the in-container session exports
too. Its telemetry carries a standard `session.id` (grouping the whole session
into one timeline) and the same `parallax.run.id`, and the session links back to
the launch trace — so you can follow a single run from the first screen, through
the launch, and into the container as a chain of connected traces. The session
is exported as it happens (per activity), so even a container that is killed
still leaves everything up to the moment of failure in the backend.

The container reaches a backend running on the host loopback (`127.0.0.1`/`localhost`) automatically: jackin' rewrites the endpoint to `host.docker.internal` and wires it to the host gateway. Point `OTEL_EXPORTER_OTLP_ENDPOINT` at an already-routable address and it is passed through unchanged.
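
The rewrite rule can be illustrated with a short shell function. This is only a sketch of the behavior described above, not jackin's implementation; `container_endpoint` is a hypothetical name and `10.0.0.5` is an example of an already-routable address.

```sh
# Illustrative only: mimics the documented rewrite rule.
# A loopback host (127.0.0.1 or localhost) becomes host.docker.internal;
# any other endpoint is printed unchanged.
container_endpoint() {
  printf '%s\n' "$1" |
    sed -E 's#^(https?://)(127\.0\.0\.1|localhost)(:|/|$)#\1host.docker.internal\3#'
}

container_endpoint http://127.0.0.1:4317   # prints http://host.docker.internal:4317
container_endpoint http://10.0.0.5:4317    # prints http://10.0.0.5:4317
```

On Linux, Docker usually resolves `host.docker.internal` only when the container is started with `--add-host=host.docker.internal:host-gateway`; per the paragraph above, jackin' takes care of that wiring for you.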

## Severity follows the debug flag [#severity-follows-the-debug-flag]

Without `--debug`, the export is compact: lifecycle events, stage
boundaries, and errors. With `--debug`, the export includes the
DEBUG-severity firehose — the same two-tier rule the diagnostics file
follows, so an exported run is never noisier than the flag you launched
with.

## Viewing a run in Parallax [#viewing-a-run-in-parallax]

Start Parallax on the same machine with `parallax serve`; it listens for OTLP
on `4317`/`4318` and serves its UI on `http://127.0.0.1:4000`. Launch any
jackin' command with the endpoint set, and note the run id jackin' prints
(at startup, in `--debug` mode). Then:

```sh
parallax logs --run 8b4766
```

or open the Parallax UI and look up the run's traces. The same id also names
the local file at `~/.jackin/data/diagnostics/runs/<run-id>.jsonl` when one is
written (see [Where the run is recorded](#where-the-run-is-recorded)) — share
whichever surface fits the conversation.
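
When a local file was written and you did not note the run id, the newest file in the runs directory is usually the one you want. The helper below is a hypothetical convenience, not a jackin' command; it assumes the default directory given in this guide.

```sh
# Hypothetical helper: prints the most recently modified run file in a
# diagnostics directory (default: the path given in this guide).
# Prints nothing when the directory holds no .jsonl files.
latest_run_file() {
  dir="${1:-$HOME/.jackin/data/diagnostics/runs}"
  # ls -t sorts newest first; errors are silenced when nothing matches.
  ls -t "$dir"/*.jsonl 2>/dev/null | head -n 1
}
```

For example, `tail -n 5 "$(latest_run_file)"` shows the last few records of the latest run.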

## Scope and trust [#scope-and-trust]

The exporter sends to the endpoint you configure and nowhere else; the
expected setup is a backend on `127.0.0.1`. Everything the diagnostics file
captures can appear in the export, including the output of external commands,
which may contain anything those commands printed. Point it only at a backend
you run yourself.