Training watch

Training watch is a passive observer for external training runs. It is not aquin simulate and it is not a checkpoint diff tool. Your trainer (PyTorch loop, custom script, Aquin SDK, etc.) writes metrics as JSONL; aquin watch streams those lines into ~/.aquin/watch/<run_id>/ (manifest.json + events.jsonl). A new run is registered automatically when you pass --file without --run. Aquin does not run optimizer steps or store merged weights here, and watch does not require a loaded model. For post-training checkpoint analysis, see Checkpoint SAE (/docs/checkpoint-sae). Simulation runs stay in ~/.aquin/runs/ and appear under aquin list simulation.

Prerequisite: aquin login (optional; watch runs locally without a GPU)

3 commands

aquin list watch

List local watch runs: run id, status, name, base model, event count.

aquin watch --file <path>

Register a watch run (when --run is omitted) and stream metrics from a JSONL file. Without --follow, watch runs in batch mode: it reads the file once and exits. With --follow, it keeps tailing the file and ingests new lines as they are appended during live training.

Flag	Description
--file*	Path to metrics JSONL (required).
--follow	Tail the file; ingest new lines as they appear.
--run	Existing watch run id (skip auto-register).
--name	Run label when creating a new watch.
--model	Base model slug shown on charts (e.g. llama-3.2-1b).
--finish	Mark run stopped when streaming ends.
--map src=dst	Rename a metric column (e.g. --map train_loss=loss).
--step-field	Step column name (default: step, global_step, …).
--offset	Skip first N lines (resume).
--auto-step	Assign steps 0,1,2… when rows have no step field.
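
When a training script launches watch itself, the flags in the table above can be assembled programmatically. The following is a minimal sketch, not part of Aquin: build_watch_argv is a hypothetical helper, and it assumes that --map may be passed once per renamed column, which the table does not state. It only builds and prints the argument list; pass the list to subprocess.run to actually start watch.

```python
import shlex
from typing import Dict, List, Optional


def build_watch_argv(
    file: str,
    follow: bool = False,
    run: Optional[str] = None,
    name: Optional[str] = None,
    model: Optional[str] = None,
    finish: bool = False,
    maps: Optional[Dict[str, str]] = None,
    offset: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: assemble an `aquin watch` argument list."""
    argv = ["aquin", "watch", "--file", file]
    if follow:
        argv.append("--follow")
    if run is not None:
        argv += ["--run", run]
    if name is not None:
        argv += ["--name", name]
    if model is not None:
        argv += ["--model", model]
    for src, dst in (maps or {}).items():
        # Assumption: one --map flag per renamed column, in src=dst form.
        argv += ["--map", f"{src}={dst}"]
    if offset is not None:
        argv += ["--offset", str(offset)]
    if finish:
        argv.append("--finish")
    return argv


argv = build_watch_argv(
    "metrics.jsonl",
    follow=True,
    name="baseline",
    model="llama-3.2-1b",
    maps={"train_loss": "loss"},
)
# Print the command line as it would be typed in a shell.
print(shlex.join(argv))
```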

Fixture: fixtures/e2e/watch/metrics.jsonl. One JSON object per line. Scalar keys (loss, learning_rate, grad_norm, epoch) become chart channels. Special rows: {"type":"signal",…} and {"status":"stopped"}.

aquin replay watch <run_id>

Replay or live-tail the local events.jsonl for a run. By default the command follows new events as they arrive (Ctrl+C to detach). With --no-follow, it replays the stored events once and exits.

Flag	Description
--no-follow	Replay stored events once; do not wait for new lines.

Quick start

Watch does not run GPU inspection on your metrics file and does not require a loaded model. Metrics are stored locally under ~/.aquin/watch/.

Live training (`--follow`)

Point watch at a metrics file your trainer appends to; each new JSON line is picked up automatically. Use two terminals: run the trainer in one and aquin watch --file <path> --follow in the other.
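
On the trainer side, the only requirement described here is that each metrics row lands in the file as one complete JSON line. A minimal sketch of such a writer follows. The append_metrics helper, the temporary path, and the loss values are illustrative assumptions rather than anything Aquin provides; a real trainer would write to the path passed to --file.

```python
import json
import tempfile
from pathlib import Path


def append_metrics(path: Path, row: dict) -> None:
    """Append one JSON object as a single line.

    The file is opened and closed on every call, so the line is flushed
    before the function returns and a tailing reader can pick it up.
    """
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(row) + "\n")


# Stand-in location for this sketch; use your real metrics path instead.
metrics_path = Path(tempfile.mkdtemp()) / "metrics.jsonl"

for step in range(3):
    # Placeholder value standing in for the loss of a real training step.
    loss = 1.0 / (step + 1)
    append_metrics(metrics_path, {"step": step, "loss": loss, "learning_rate": 1e-4})

print(metrics_path.read_text(encoding="utf-8"))
```

Writing the whole line in a single call keeps the chance of a reader seeing a half-written row low, though how watch handles partial lines is not specified here.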

Metrics JSONL format

One JSON object per line. Step comes from global_step, step, or --step-field. Numeric scalars become chart channels. Use --map train_loss=loss to rename trainer column names.
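
Before pointing watch at a file, it can help to check which keys would qualify as chart channels and whether any rows lack a step column. The sketch below is a rough pre-flight check under stated assumptions, not a reimplementation of watch's parser: summarize_metrics is a hypothetical helper, it treats only step and global_step as step columns, it excludes booleans from numeric scalars, and the sample rows are made up.

```python
import json
from typing import Iterable, Set, Tuple

STEP_KEYS = ("step", "global_step")  # step columns named in the text


def summarize_metrics(lines: Iterable[str]) -> Tuple[Set[str], int]:
    """Return (numeric channel names, number of rows with no step key)."""
    channels: Set[str] = set()
    rows_without_step = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        row = json.loads(line)
        if not isinstance(row, dict):
            raise ValueError("each line must be a JSON object")
        if not any(key in row for key in STEP_KEYS):
            rows_without_step += 1
        for key, value in row.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if key in STEP_KEYS or isinstance(value, bool):
                continue
            if isinstance(value, (int, float)):
                channels.add(key)
    return channels, rows_without_step


# Made-up sample rows using the scalar keys mentioned in this page.
sample = [
    '{"step": 0, "loss": 2.5, "learning_rate": 0.0003}',
    '{"global_step": 1, "loss": 2.1, "grad_norm": 1.7, "epoch": 0}',
    '{"loss": 1.9}',
]
channels, missing = summarize_metrics(sample)
print(sorted(channels), missing)
```

A non-zero count of rows without a step key suggests passing --step-field or --auto-step, depending on whether the trainer records steps under another name or not at all.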

Watch vs simulate

	Training watch	aquin simulate
Purpose	Observe real external training metrics	Forecast training without weight updates
Storage	~/.aquin/watch/<run_id>/	~/.aquin/runs/<run_id>/
List command	aquin list watch	aquin list simulation
GPU	Not required	Required (model load)
Web	Local JSONL only	Tracked + CLI inbox when logged in

For SAE diff / temp train / align on real checkpoints after training, see Checkpoint SAE. Watch does not store merged weights or run GPU SAE tools.