Aquin DevKit
SDK & CLI Documentation
Aquin Labs · 2026
Overview
The Aquin SDK records training runs locally: metrics, configuration, and a final checkpoint. The CLI then packages the run and pushes it to Aquin for post-hoc inspection. Once pushed, the run appears in your dashboard under CLI runs with the full inspection suite: loss curves, learning rate, grad norm, epoch summaries, SAE diff, and model diff.
The SDK is framework-agnostic: it works with any Python training loop that produces a PyTorch model. For HuggingFace Trainer and TRL, a TrainerCallback wires everything in without changes to your training logic.
typical workflow
Install
pip
Requires Python 3.9 or later. The only required dependency is requests; PyTorch is optional and only needed if you call run.checkpoint().
After installing, save your API key:
first-time setup
Your key is stored in ~/.aquin/config.json. You can also set the AQUIN_API_KEY environment variable instead.
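The text above does not say which source wins when both the config file and AQUIN_API_KEY are set. The sketch below is a hypothetical resolver, not the SDK's code. It assumes an explicit key (as with the CLI's --key flag) beats the environment variable, which beats the config file, and that the file stores the key under an "api_key" field. Both assumptions are for illustration only.

```python
import json
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    config_path: Optional[Path] = None) -> Optional[str]:
    """Hypothetical key lookup; the precedence order is an assumption."""
    # 1. An explicitly supplied key (e.g. from a --key flag) wins.
    if explicit_key:
        return explicit_key
    # 2. Then the AQUIN_API_KEY environment variable.
    env_key = os.environ.get("AQUIN_API_KEY")
    if env_key:
        return env_key
    # 3. Finally ~/.aquin/config.json; the "api_key" field name is assumed.
    path = config_path or Path.home() / ".aquin" / "config.json"
    if path.is_file():
        with path.open() as f:
            return json.load(f).get("api_key")
    return None
```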
Core API
aquin.init()
Creates a new run, creates an aquin_run/ directory in your current working directory, and returns a Run object. Call it once, before your training loop.
HuggingFace model ID, e.g. meta-llama/Llama-3.2-1B-Instruct. The VM loads this model as the baseline for SAE diff and model diff analysis, so it must be one of the supported models listed under Checkpoint analysis.
Display name for the run in your dashboard. Defaults to run-{uuid[:8]} if omitted.
Training hyperparameters. Written to aquin_run/config.json immediately. Can also be passed to finish() instead. See Config panel below for the recognised keys.
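A minimal sketch of the on-disk effects described for aquin.init(), to show what appears in aquin_run/ before training starts. The function name init_run_dir is hypothetical and this is not the SDK's implementation. The default-name format follows the run-{uuid[:8]} convention stated above, assuming a random UUID4.

```python
import json
import uuid
from pathlib import Path
from typing import Optional, Tuple


def init_run_dir(name: Optional[str] = None,
                 config: Optional[dict] = None,
                 root: Optional[Path] = None) -> Tuple[str, Path]:
    """Hypothetical illustration of aquin.init()'s documented side effects."""
    # Default display name: "run-" plus the first 8 characters of a UUID.
    run_name = name or f"run-{str(uuid.uuid4())[:8]}"
    # aquin_run/ is created in the current working directory by default.
    run_dir = (root or Path.cwd()) / "aquin_run"
    run_dir.mkdir(parents=True, exist_ok=True)
    if config is not None:
        # Config is written immediately, before any training step runs.
        (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    return run_name, run_dir
```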
aquin.init() — full config example
run.log()
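Records metrics for one training step. Call it on every step inside your training loop. It is thread-safe and buffers metrics in memory instead of writing to disk on each call, so its overhead on training throughput is negligible; checkpoint() and finish() flush the buffered metrics to disk.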
Global training step index. Either 0-based or 1-based indexing works, as long as you use it consistently.
Scalar training loss for this step. Drives the loss chart and summary bar.
Current LR from your scheduler. Enables the learning rate chart.
Gradient norm, e.g. the total norm returned by torch.nn.utils.clip_grad_norm_(), which is measured before clipping. Enables the grad norm chart.
Optimizer momentum norm. Enables the momentum series in the training chart.
Current epoch number. Enables the epoch summary table showing per-epoch start/end loss, delta, grad, and trend sparkline.
Batch index within the current epoch. Stored for display.
Total batches per epoch. Stored for display.
Wall-clock time for this step in milliseconds. Appears in every log line in the terminal panel.
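The grad norm value should be one scalar covering all parameters. The pure-Python sketch below shows what that scalar is: the L2 norm of every gradient flattened into a single vector, which is the quantity torch.nn.utils.clip_grad_norm_() returns (computed before any clipping is applied). The helper name is hypothetical; in a real PyTorch loop you would use the returned tensor directly.

```python
import math
from typing import Iterable, Sequence


def total_grad_norm(grads: Iterable[Sequence[float]]) -> float:
    """L2 norm of all gradients viewed as one flattened vector."""
    # sqrt(sum of squares over every element of every parameter's gradient)
    return math.sqrt(sum(g * g for grad in grads for g in grad))
```

In PyTorch the equivalent is grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0).item(), called after loss.backward() and before optimizer.step().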
run.log() — full example
run.checkpoint()
Saves the model checkpoint to aquin_run/checkpoints/checkpoint.pt and flushes all metrics to disk. A run holds only one checkpoint: each call overwrites the previous save, so call this once at the end of training.
The checkpoint is saved as {"step": N, "state_dict": {...}}. If you pass a PEFT model, only the adapter weights are saved; the frozen base model is not included, which keeps the checkpoint small.
Any PyTorch nn.Module. For PEFT/LoRA models, pass the PEFT model directly so that only the adapter weights are checkpointed.
The training step this checkpoint corresponds to. Stored in the checkpoint file and used to label checkpoint analysis in the UI.
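To confirm what a saved checkpoint contains, you can load it and inspect the documented {"step": N, "state_dict": {...}} layout. The helper below is a hypothetical sketch operating on the already-loaded dict. The adapter-only check is a heuristic that assumes PEFT LoRA parameter names contain "lora_" (as with lora_A and lora_B); it reports False if other modules were saved alongside the adapters.

```python
from typing import Any, Dict, Mapping


def describe_checkpoint(ckpt: Mapping[str, Any]) -> Dict[str, Any]:
    """Summarise a checkpoint dict shaped like {"step": N, "state_dict": {...}}."""
    state = ckpt["state_dict"]
    # Heuristic: PEFT LoRA weights have "lora_" in their parameter names.
    lora_keys = [key for key in state if "lora_" in key]
    return {
        "step": ckpt["step"],
        "num_entries": len(state),
        # False for an empty state_dict or when any non-LoRA entry is present.
        "adapter_only": bool(state) and len(lora_keys) == len(state),
    }
```

For example, load the file with ckpt = torch.load("aquin_run/checkpoints/checkpoint.pt", map_location="cpu") and pass ckpt to describe_checkpoint().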
run.checkpoint()
run.finish()
Finalises the run: flushes all metrics to aquin_run/metrics.json, writes the config if you pass it here rather than to aquin.init(), and prints a summary along with the push instructions.
Training hyperparameters. Only needed here if you did not pass config to aquin.init(). Passing it to init() is preferred so it is written before training starts.
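run.log() is described as thread-safe and non-blocking, with metrics flushed to aquin_run/metrics.json by checkpoint() and finish(). The class below is a hypothetical sketch of that buffering pattern, not the SDK's code: appends happen in memory under a lock, and flush() writes the accumulated records as a single JSON list. The on-disk format of the real metrics.json is not documented here, so the list-of-dicts layout is an assumption.

```python
import json
import threading
from pathlib import Path
from typing import Any, Dict, List


class MetricsBuffer:
    """Hypothetical in-memory, thread-safe metrics buffer (illustration only)."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._records: List[Dict[str, Any]] = []

    def log(self, **metrics: Any) -> None:
        # Hot path: append under a lock, no disk I/O.
        with self._lock:
            self._records.append(dict(metrics))

    def flush(self) -> None:
        # Write every buffered record as one JSON list; holding the lock keeps
        # concurrent flushes from interleaving their writes.
        with self._lock:
            self._path.parent.mkdir(parents=True, exist_ok=True)
            self._path.write_text(json.dumps(self._records))
```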
run.finish()
Quickstart, bare training loop
If you are writing your own training loop without HuggingFace Trainer, instrument it directly.
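The sketch below shows where each per-step value comes from in a hand-written loop. It uses a toy 1-D linear regression trained with plain SGD and a cosine learning-rate schedule so that it runs without PyTorch. The record callable stands in for run.log(); its keyword names (step, loss, lr, grad_norm, epoch, batch, total_batches, step_ms) are hypothetical and should be matched to the real run.log() signature.

```python
import math
import time
from typing import Callable


def train(record: Callable[..., None], epochs: int = 3, base_lr: float = 0.1) -> None:
    """Toy SGD loop that reports per-step metrics through `record`."""
    data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(10)]  # y = 2x + 1
    w, b = 0.0, 0.0
    total_steps = epochs * len(data)
    step = 0
    for epoch in range(epochs):
        for batch_idx, (x, y) in enumerate(data):
            t0 = time.perf_counter()
            # Cosine decay from base_lr towards 0 over the whole run.
            lr = 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))
            err = (w * x + b) - y
            loss = err * err
            grad_w, grad_b = 2 * err * x, 2 * err  # d(loss)/dw, d(loss)/db
            w -= lr * grad_w
            b -= lr * grad_b
            record(step=step, loss=loss, lr=lr,
                   grad_norm=math.hypot(grad_w, grad_b),
                   epoch=epoch, batch=batch_idx, total_batches=len(data),
                   step_ms=(time.perf_counter() - t0) * 1000.0)
            step += 1
    # In a real loop: call run.checkpoint() once here, then run.finish().
```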
bare training loop
HuggingFace Trainer / TRL
Use a TrainerCallback to hook into the Trainer event loop. The callback receives on_step_begin, on_log, and on_train_end events from the Trainer and bridges them into Aquin's API.
Key points:
on_log fires every logging_steps steps, so set logging_steps=1 to record every step.
grad_norm appears in logs only when gradient clipping is enabled (max_grad_norm > 0 in TrainingArguments; the default is 1.0).
on_train_end receives the model through **kwargs, not as a named parameter.
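Inside on_log, the Trainer passes a logs dict that typically contains loss, learning_rate, grad_norm (when clipping is enabled), and a fractional epoch, while evaluation logs carry eval_loss instead of loss. The helper below is a hypothetical sketch of translating that dict into keyword arguments for run.log(); the target keyword names (step, loss, lr, grad_norm, epoch) are assumptions, not the documented signature.

```python
from typing import Any, Dict, Optional


def hf_logs_to_metrics(logs: Dict[str, Any], global_step: int) -> Optional[Dict[str, Any]]:
    """Map a Trainer `logs` dict to hypothetical run.log() keyword arguments."""
    if "loss" not in logs:
        return None  # eval / end-of-training logs carry no training "loss"
    metrics: Dict[str, Any] = {"step": global_step, "loss": float(logs["loss"])}
    # Trainer key -> assumed run.log() keyword. Note that Trainer reports a
    # fractional epoch (e.g. 1.25); convert it if you need an integer epoch.
    for src, dst in (("learning_rate", "lr"), ("grad_norm", "grad_norm"), ("epoch", "epoch")):
        if logs.get(src) is not None:
            metrics[dst] = float(logs[src])
    return metrics
```

Inside a callback's on_log(self, args, state, control, logs=None, **kwargs), you might compute metrics = hf_logs_to_metrics(logs or {}, state.global_step) and forward it with run.log(**metrics) when it is not None.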
AquinCallback — TrainerCallback
Pass the callback to your Trainer or SFTTrainer:
wiring to SFTTrainer
Full QLoRA example
Complete production training script using QLoRA with BitsAndBytes, PEFT, and TRL SFTTrainer.
full qlora training script with aquin
CLI reference
The CLI is installed automatically with the package as the aquin command.
You can override the API key at push time with --key:
push with explicit key
You can override the API base URL with the AQUIN_BASE_URL environment variable, which is useful for self-hosted deployments.
What gets inspected
After a successful push, the Aquin VM extracts the tar, reads your run artifacts, and streams the inspection results to your dashboard. Here is exactly what each piece of data drives.
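Before pushing, it can help to check that aquin_run/ contains the artifacts this page describes and to see what a packaged run looks like. The sketch below is hypothetical: the CLI's actual archive name, compression, and required-file rules are not documented here, so gzip and the file list are assumptions (for instance, config.json is only present if a config was passed).

```python
import tarfile
from pathlib import Path
from typing import List

# Artifact paths named on this page, relative to aquin_run/.
DOCUMENTED_ARTIFACTS = ["metrics.json", "config.json", "checkpoints/checkpoint.pt"]


def missing_artifacts(run_dir: Path) -> List[str]:
    """List documented artifacts that are not present in run_dir."""
    return [rel for rel in DOCUMENTED_ARTIFACTS if not (run_dir / rel).is_file()]


def pack_run(run_dir: Path, out_path: Path) -> Path:
    """Write run_dir into a gzip-compressed tar (compression is an assumption)."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(run_dir, arcname=run_dir.name)
    return out_path
```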
Metrics
Sourced from aquin_run/metrics.json. run.log() buffers the metrics, and checkpoint() and finish() flush them to this file.
Loss chart, summary bar final loss, epoch summary table
Learning rate chart
Grad norm chart, log line
Momentum series in training chart
Timing in every log line: [step N] loss=... Xms
Epoch summary table, per-epoch start/end loss, delta, grad, sparkline
Step display info
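The epoch summary table groups per-step losses by epoch. Below is a minimal sketch of the start/end/delta part, assuming records arrive in step order and that delta means end loss minus start loss (the page does not define the sign), so a negative delta means the loss fell during that epoch. The function name is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def epoch_summary(records: Iterable[Tuple[int, float]]) -> Dict[int, Dict[str, float]]:
    """Per-epoch start loss, end loss, and delta from (epoch, loss) pairs."""
    summary: Dict[int, Dict[str, float]] = {}
    for epoch, loss in records:  # assumed to be in step order
        if epoch not in summary:
            summary[epoch] = {"start": loss, "end": loss}
        else:
            summary[epoch]["end"] = loss
    for row in summary.values():
        # Assumed sign convention: negative delta = loss decreased.
        row["delta"] = row["end"] - row["start"]
    return summary
```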
Checkpoint analysis
Sourced from aquin_run/checkpoints/checkpoint.pt, saved by run.checkpoint(). The VM loads your base model and fine-tuned checkpoint side by side and runs two analyses.
SAE diff: feature delta bars showing which SAE features were strengthened or suppressed, top 50 by absolute delta
Model diff: consistency, suppression, and robustness scores, category deltas, base vs fine-tuned output comparison, attack surface diff
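The SAE diff ranks features by how much their activation changed between the base and fine-tuned model and keeps the top 50 by absolute delta. A minimal sketch of that ranking step, assuming each model has already been summarised as a mapping from SAE feature index to a scalar activation (for example a mean over a probe set; how Aquin computes this is not described here). The function name is hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def top_features_by_abs_delta(base: Dict[int, float],
                              tuned: Dict[int, float],
                              k: int = 50) -> List[Tuple[int, float]]:
    """Return the k features with the largest |tuned - base| activation change."""
    features = set(base) | set(tuned)
    # Features missing from one side are treated as 0 activation (assumption).
    # Positive delta = feature strengthened; negative = suppressed.
    deltas = {f: tuned.get(f, 0.0) - base.get(f, 0.0) for f in features}
    return heapq.nlargest(k, deltas.items(), key=lambda item: abs(item[1]))
```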
Both analyses require the base_model you passed to aquin.init() to be a supported model. Currently supported: meta-llama/Llama-3.2-1B-Instruct, EleutherAI/pythia-2.8b, EleutherAI/pythia-70m-deduped, gpt2.
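Because both analyses depend on base_model, it may be worth failing fast before a long training run. Below is a hypothetical pre-flight check built from the list above. The list reflects this page and may change, so treat the constant as a snapshot rather than an authoritative registry.

```python
# Snapshot of the supported base models listed on this page.
SUPPORTED_BASE_MODELS = frozenset({
    "meta-llama/Llama-3.2-1B-Instruct",
    "EleutherAI/pythia-2.8b",
    "EleutherAI/pythia-70m-deduped",
    "gpt2",
})


def check_base_model(base_model: str) -> None:
    """Raise ValueError if checkpoint analysis would not support base_model."""
    if base_model not in SUPPORTED_BASE_MODELS:
        supported = ", ".join(sorted(SUPPORTED_BASE_MODELS))
        raise ValueError(f"{base_model!r} is not supported for SAE/model diff; "
                         f"supported: {supported}")
```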
Config panel
Sourced from aquin_run/config.json. The following keys are recognised and displayed in the algo config panel in the dashboard.
Panel title, e.g. QLORA
Learning rate row
Epochs row
LoRA rank row
LoRA alpha row
Batch size row
Grad accum row
Max seq len row
Optimizer row
Scheduler row
Dropout row
Weight decay row
Grad clip row
Dataset row
