AquinLabs

Training watch

Passive observer for external training runs. It is not aquin simulate and not the checkpoint SAE tools. Your trainer (PyTorch loop, custom script, Aquin SDK, etc.) writes metrics as JSONL. aquin watch ingests those lines, stores them locally, and mirrors loss/LR/grad charts to the web orchestrator panel. Aquin does not run optimizer steps or store merged weights here. For post-training SAE diff on checkpoints, see Checkpoint SAE (/docs/checkpoint-sae).

Each watch run is stored in a local registry at ~/.aquin/watch/<run_id>/ (manifest.json plus events.jsonl). Simulation runs stay in ~/.aquin/runs/ and appear only under aquin list simulations.

Prerequisite: aquin connect --name <session> (the web mirror uses that named session).

4 commands

aquin watch list

List local watch runs: run id, status, name, base model, event count.

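The sketch below reads the registry layout described above directly from disk, for example when scripting around watch runs. aquin watch list remains the supported way to do this. It is a minimal sketch that makes two assumptions: that manifest.json holds a single JSON object (its fields are not documented here) and that each non-empty line of events.jsonl is one event. The function name summarize_watch_runs is hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple, Union


def summarize_watch_runs(
    root: Union[str, Path] = Path.home() / ".aquin" / "watch",
) -> List[Tuple[str, int, dict]]:
    """Return (run_id, event_count, manifest) for every run directory under root.

    Assumes manifest.json is a JSON object and that each non-empty line of
    events.jsonl is one event; neither schema is documented on this page.
    """
    root = Path(root)
    summaries: List[Tuple[str, int, dict]] = []
    if not root.is_dir():
        return summaries  # no watch runs registered yet
    for run_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        manifest_path = run_dir / "manifest.json"
        events_path = run_dir / "events.jsonl"
        manifest = (
            json.loads(manifest_path.read_text(encoding="utf-8"))
            if manifest_path.exists()
            else {}
        )
        event_count = 0
        if events_path.exists():
            with events_path.open(encoding="utf-8") as fh:
                # Count non-empty lines as events.
                event_count = sum(1 for line in fh if line.strip())
        # The directory name is the run id.
        summaries.append((run_dir.name, event_count, manifest))
    return summaries
```

Because the run id is simply the directory name, this only depends on the ~/.aquin/watch/<run_id>/ layout. It does not depend on whatever fields aquin watch list prints.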

aquin watch init

Register a watch run before ingesting metrics. Writes manifest.json and a start event to events.jsonl under ~/.aquin/watch/<run_id>/.

Flag      Description
--name    Display name (default: watch-run).
--model   Base model slug shown on the web chart (e.g. llama-3.2-1b).
--quant   Quantization label: fp16, int8, q4, or none (default: none).
--mode    Run mode label (default: external).

aquin watch ingest

Parse a metrics JSONL file and append observations to a watch run. Batch mode (default) reads the file once and exits. With --follow, keeps tailing the file as new lines are appended (live training). Each synced ingest pushes training.watch.start and opens a new chart card on the web mirror. Web sync uses the active named session from aquin connect --name.

Flag            Description
--run           Existing watch run id (from init or list).
--name          With --file and no --run, create a new run with this name, then ingest.
--file          Path to the metrics JSONL file. Omit to read stdin.
--follow        Tail the file and ingest new lines as they appear.
--finish        Mark the run stopped when ingest ends.
--map src=dst   Rename a metric column (e.g. --map train_loss=loss).
--step-field    Step column name (default: step, global_step, …).
--offset        Skip the first N lines (to resume an ingest).
--auto-step     Assign steps 0, 1, 2, … when rows have no step field.

Example fixture: fixtures/e2e/watch/metrics.jsonl. The file holds one JSON object per line. Scalar keys (loss, learning_rate, grad_norm, epoch) become chart channels. Two row shapes are treated specially: {"type":"signal",…} and {"status":"stopped"}.
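The ingest options above can be modeled as a small Python parser. This is a rough sketch of the documented behavior, not Aquin's actual ingest code. Several details are assumptions: the order in which default step columns are tried, renaming columns before the step is looked up, counting --offset in raw lines, and returning the special rows separately instead of charting them. The name parse_metrics and its keyword arguments are hypothetical stand-ins for --map, --step-field, --offset, and --auto-step.

```python
import json
from typing import Dict, Iterable, Iterator, Optional

# Step columns tried in order when no step field is given. The docs list
# "step, global_step, …"; only these two are assumed here.
DEFAULT_STEP_FIELDS = ("step", "global_step")


def parse_metrics(
    lines: Iterable[str],
    rename: Optional[Dict[str, str]] = None,  # stands in for --map src=dst
    step_field: Optional[str] = None,         # stands in for --step-field
    offset: int = 0,                          # stands in for --offset
    auto_step: bool = False,                  # stands in for --auto-step
) -> Iterator[dict]:
    rename = rename or {}
    step_fields = (step_field,) if step_field else DEFAULT_STEP_FIELDS
    next_auto_step = 0
    for line_number, line in enumerate(lines):
        if line_number < offset:
            continue  # skip lines consumed by an earlier ingest
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if not isinstance(row, dict):
            continue  # only JSON objects are metric rows
        # Special rows are returned as-is rather than turned into chart points.
        if row.get("type") == "signal" or row.get("status") == "stopped":
            yield {"kind": "special", "row": row}
            continue
        # Apply column renames before looking for the step or channels.
        row = {rename.get(key, key): value for key, value in row.items()}
        step = next((row[f] for f in step_fields if f in row), None)
        if step is None and auto_step:
            step = next_auto_step
            next_auto_step += 1
        # Numeric scalars (excluding booleans and the step column) become channels.
        channels = {
            key: value
            for key, value in row.items()
            if key not in step_fields
            and isinstance(value, (int, float))
            and not isinstance(value, bool)
        }
        yield {"kind": "metrics", "step": step, "channels": channels}
```

For example, a row {"global_step": 10, "train_loss": 2.5} parsed with rename={"train_loss": "loss"} yields step 10 and a single loss channel. This mirrors what --map train_loss=loss is described as doing.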

aquin watch <run_id>

Replay or live-tail the local events.jsonl for a run. By default it follows new events (press Ctrl+C to detach); --no-follow replays once and exits. Events are synced to the active named session from aquin connect --name.

Flag           Description
--no-follow    Replay stored events once; do not wait for new lines.
--output json  Print raw JSON events.
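The difference between following and replaying once (--no-follow) can be sketched as a polling line reader. This is an illustration only, built on three assumptions: polling at a fixed interval, UTF-8 text, and a hypothetical max_idle_polls cutoff so the loop can end. Aquin's own tailing mechanism is not documented here. The key detail is that a line is only yielded once its trailing newline exists, so a row the writer is still appending is never parsed half-written.

```python
import time
from typing import Iterator, Optional


def read_jsonl_lines(
    path: str,
    follow: bool = False,
    poll_interval: float = 0.5,
    max_idle_polls: Optional[int] = None,  # hypothetical cutoff; None polls forever
) -> Iterator[str]:
    """Yield complete lines from path; with follow=True, keep polling for more."""
    pending = ""  # holds a partial line until its newline is written
    idle_polls = 0
    with open(path, "r", encoding="utf-8") as fh:
        while True:
            chunk = fh.readline()
            if chunk:
                idle_polls = 0
                pending += chunk
                if pending.endswith("\n"):
                    yield pending.rstrip("\n")
                    pending = ""
                continue
            # Reached the current end of the file.
            if not follow:
                if pending:
                    yield pending  # replay mode: emit a final unterminated line
                return
            if max_idle_polls is not None and idle_polls >= max_idle_polls:
                return
            idle_polls += 1
            time.sleep(poll_interval)  # wait for the writer to append more
```

With follow=False this behaves like a one-shot replay. With follow=True and max_idle_polls=None it keeps waiting until interrupted, which is analogous to detaching with Ctrl+C.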

Quick start

No GPU or model load is required. Run aquin connect --name <session> once so that ingest mirrors charts to your session tab in the web app.


Live training (--follow)

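Point ingest at a metrics file your trainer appends to, using two terminals: the trainer writes rows in one, and aquin watch ingest --follow tails the file in the other. Each new JSON line is picked up automatically and synced to the web mirror when connected.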
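On the trainer side, each metrics row only needs to be written as one JSON line and flushed. The helper below is a minimal sketch; log_metrics is a hypothetical name and not part of any Aquin SDK. It uses the documented step column and channel names such as loss, learning_rate, and grad_norm. It drops NaN and infinite values because json.dumps writes them as non-standard NaN/Infinity tokens. Whether Aquin's ingest accepts those tokens is not documented here, so skipping them is an assumption and a design choice.

```python
import json
import math
from typing import TextIO


def log_metrics(fh: TextIO, step: int, **metrics: float) -> None:
    """Append one metrics row to fh as a single JSON line, then flush it.

    Flushing after every row lets a process tailing the file see the row
    right away instead of whenever the write buffer happens to fill.
    """
    row = {"step": step}
    for name, value in metrics.items():
        value = float(value)
        # json.dumps would emit NaN/Infinity, which strict JSON parsers reject;
        # this sketch drops such values (e.g. a diverged loss) instead.
        if math.isfinite(value):
            row[name] = value
    fh.write(json.dumps(row) + "\n")
    fh.flush()
```

In a training loop, open the metrics file once in append mode ("a"). Call log_metrics(fh, step, loss=loss.item(), learning_rate=lr) after each optimizer step. Then point aquin watch ingest --follow --file at that path from the second terminal.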

Metrics JSONL format

One JSON object per line. Step comes from global_step, step, or --step-field. Numeric scalars become chart channels. Use --map train_loss=loss to rename trainer column names.


Watch vs simulate

Aspect        Training watch                                 aquin simulate
Purpose       Observe real external training metrics         Forecast training without weight updates
Storage       ~/.aquin/watch/<run_id>/                       ~/.aquin/runs/<run_id>/
List command  aquin watch list                               aquin list simulations
GPU           Not required                                   Required (model load)
Web mirror    Loss/LR/grad chart cards (training.watch.*)    Simulation result cards (tool.result)

For SAE diff / temp train / align on real checkpoints after training, see Checkpoint SAE. Watch does not store merged weights or run GPU SAE tools.

Session sync uses the active named session from aquin connect --name <session>. Resume later with aquin session switch <name>.