Training watch
Passive observer for external training runs. It is separate from aquin simulate and from the checkpoint SAE tools. Your trainer (PyTorch loop, custom script, Aquin SDK, etc.) writes metrics as JSONL; aquin watch ingests those lines, stores them locally, and mirrors loss/LR/grad charts to the web orchestrator panel. Aquin does not run optimizer steps or store merged weights here; for a post-training SAE diff on checkpoints, see Checkpoint SAE (/docs/checkpoint-sae). The registry lives at ~/.aquin/watch/<run_id>/ (manifest.json + events.jsonl). Simulation runs stay in ~/.aquin/runs/ and appear only under aquin list simulations.
4 commands
aquin watch list
List local watch runs: run id, status, name, base model, event count.
aquin watch init
Register a watch run before ingesting metrics. Writes manifest.json and a start event to events.jsonl under ~/.aquin/watch/<run_id>/.
| Flag | Description |
|---|---|
| --name | Display name (default: watch-run). |
| --model | Base model slug shown on web chart (e.g. llama-3.2-1b). |
| --quant | Quantization label: fp16, int8, q4, none (default: none). |
| --mode | Run mode label (default: external). |
aquin watch ingest
Parse a metrics JSONL file and append its observations to a watch run. Batch mode (default) reads the file once and exits. With --follow, ingest keeps tailing the file as new lines are appended (live training). Each synced ingest pushes training.watch.start and opens a new chart card on the web mirror. Web sync uses the active named session from aquin connect --name.
| Flag | Description |
|---|---|
| --run | Existing watch run id (from init or list). |
| --name | With --file and no --run: create a new run with this name then ingest. |
| --file | Path to metrics JSONL. Omit to read stdin. |
| --follow | Tail the file; ingest new lines as they appear. |
| --finish | Mark run stopped when ingest ends. |
| --map src=dst | Rename a metric column (e.g. --map train_loss=loss). |
| --step-field | Step column name (default: step, global_step, …). |
| --offset | Skip first N lines (resume ingest). |
| --auto-step | Assign steps 0,1,2… when rows have no step field. |
Fixture: fixtures/e2e/watch/metrics.jsonl. One JSON object per line. Scalar keys (loss, learning_rate, grad_norm, epoch) become chart channels. Special rows: {"type":"signal",…} and {"status":"stopped"}.
aquin watch <run_id>
Replay or live-tail the local events.jsonl for a run. By default the command follows new events (Ctrl+C to detach); --no-follow replays once and exits. Syncs to the active named session from aquin connect --name.
| Flag | Description |
|---|---|
| --no-follow | Replay stored events once; do not wait for new lines. |
| --output json | Print raw JSON events. |
Quick start
No GPU or model load required. Run aquin connect once so ingest mirrors charts to your session tab in the web app.
Live training (--follow)
Point ingest at a metrics file your trainer appends to. Each new JSON line is picked up automatically and synced to the web mirror when connected.
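For a tailing reader to pick up each step promptly, the trainer has to write one complete line per observation and flush it. The following is a minimal trainer-side sketch in Python, not part of Aquin: `append_metric` and `run_fake_training` are hypothetical helper names, the loss values are placeholders, and the closing `{"status": "stopped"}` row is the special row listed in the ingest notes.

```python
import json
import os
import tempfile


def append_metric(fh, row):
    """Append one JSON object as a single line and flush it to disk."""
    fh.write(json.dumps(row) + "\n")
    fh.flush()              # push Python's buffer to the OS
    os.fsync(fh.fileno())   # optional: ask the OS to persist the line now


def run_fake_training(path, num_steps=3):
    """Stand-in for a real training loop; metric values are placeholders."""
    with open(path, "a", encoding="utf-8") as fh:
        for step in range(num_steps):
            loss = 1.0 / (step + 1)  # placeholder, not a real loss
            append_metric(fh, {"step": step, "loss": loss, "learning_rate": 1e-4})
        # Special row from the ingest notes, written once the loop ends.
        append_metric(fh, {"status": "stopped"})


if __name__ == "__main__":
    metrics_path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
    run_fake_training(metrics_path)
    print(metrics_path)
```

With a file like this being appended to, aquin watch ingest with --file pointing at it and --follow should tail it as described above. Opening in append mode keeps earlier lines intact if the trainer restarts.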
Metrics JSONL format
One JSON object per line. Step comes from global_step, step, or --step-field. Numeric scalars become chart channels. Use --map train_loss=loss to rename trainer column names.
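Before ingesting, it can help to check a metrics file against these rules locally. The sketch below is a standalone Python pre-flight check, not Aquin's parser: `summarize_metrics` is a hypothetical helper, the `global_step`-before-`step` lookup order is an assumption (the docs name both fields but not their precedence), and treating any row with a `type` or `status` key as a special row is also an assumption.

```python
import json

# Assumed lookup order; the docs name both fields but not their precedence.
STEP_FIELDS = ("global_step", "step")


def summarize_metrics(lines, step_fields=STEP_FIELDS):
    """Return (channels, rows_without_step, bad_line_numbers) for JSONL lines."""
    channels = set()
    rows_without_step = 0
    bad_lines = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)
            continue
        if not isinstance(row, dict):
            bad_lines.append(lineno)  # each line must be a JSON object
            continue
        # Assumption: rows carrying "type" or "status" are special rows,
        # not metric observations.
        if "type" in row or "status" in row:
            continue
        step_key = next((k for k in step_fields if k in row), None)
        if step_key is None:
            rows_without_step += 1
        for key, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if key != step_key and is_number:
                channels.add(key)
    return sorted(channels), rows_without_step, bad_lines


if __name__ == "__main__":
    sample = [
        '{"global_step": 0, "train_loss": 2.5, "learning_rate": 0.0001}',
        '{"global_step": 1, "train_loss": 2.1, "learning_rate": 0.0001}',
        '{"train_loss": 1.9}',
        'not json',
    ]
    print(summarize_metrics(sample))
    # → (['learning_rate', 'train_loss'], 1, [4])
```

A non-zero count of rows without a step suggests passing --auto-step or --step-field, and a channel named train_loss is a candidate for --map train_loss=loss.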
Watch vs simulate
| | Training watch | aquin simulate |
|---|---|---|
| Purpose | Observe real external training metrics | Forecast training without weight updates |
| Storage | ~/.aquin/watch/<run_id>/ | ~/.aquin/runs/<run_id>/ |
| List command | aquin watch list | aquin list simulations |
| GPU | Not required | Required (model load) |
| Web mirror | Loss/LR/grad chart cards (training.watch.*) | Simulation result cards (tool.result) |
For SAE diff / temp train / align on real checkpoints after training, see Checkpoint SAE. Watch does not store merged weights or run GPU SAE tools.
Session sync uses the active named session from aquin connect --name <session>. Resume later with aquin session switch <name>.
