Aquin DevKit

SDK & CLI Documentation

Aquin Labs · 2026

Overview

The Aquin SDK records training runs locally, including metrics, configuration, and a final checkpoint. The CLI then packages the run and pushes it to Aquin for post-hoc inspection. Once pushed, the run appears in your dashboard under CLI runs with the full inspection suite: loss curves, learning rate, grad norm, epoch summaries, SAE diff, and model diff.

The SDK is framework-agnostic. It works with any Python training loop that produces a PyTorch model. For HuggingFace Trainer and TRL, a TrainerCallback pattern wires everything in without touching training logic.

typical workflow

Install

pip

Requires Python 3.9 or later. The only required dependency is requests. PyTorch is optional and only needed if you call run.checkpoint().

After installing, save your API key:

first-time setup

Your key is stored in ~/.aquin/config.json. You can also set the AQUIN_API_KEY environment variable instead.

Core API

aquin.init()

Creates a new run and returns a Run object. Call this once before your training loop. It also creates the aquin_run/ directory in your current working directory.

Parameter | Type | Description
base_model | str | HuggingFace model ID, e.g. meta-llama/Llama-3.2-1B-Instruct. Used to load the base model for SAE diff and model diff analysis on the VM.
run_name | str | Display name for the run in your dashboard. Defaults to run-{uuid[:8]} if omitted.
config | dict | Training hyperparameters. Written to config.json immediately. Can also be passed to finish() instead. See the config section below for recognised keys.

aquin.init() — full config example
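A minimal sketch of a fully populated config, using only the keys this page lists as recognised by the config panel. Every value is an illustrative placeholder rather than a recommendation, and the helper `unrecognised_keys` is hypothetical (it is not part of the SDK); it only flags keys that fall outside the documented list.

```python
# Keys documented on this page as recognised by the dashboard's config panel.
RECOGNISED_KEYS = {
    "method", "lr", "epochs", "rank", "lora_alpha",
    "per_device_train_batch_size", "gradient_accumulation_steps",
    "max_seq_len", "optimizer", "scheduler", "dropout",
    "weight_decay", "grad_clip", "dataset",
}

# Illustrative values only; substitute your own hyperparameters.
config = {
    "method": "qlora",
    "lr": 2e-4,
    "epochs": 3,
    "rank": 16,
    "lora_alpha": 32,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "max_seq_len": 1024,
    "optimizer": "adamw",
    "scheduler": "cosine",
    "dropout": 0.05,
    "weight_decay": 0.0,
    "grad_clip": 1.0,
    "dataset": "my-dataset",
}


def unrecognised_keys(cfg):
    """Hypothetical helper: return config keys outside the documented list."""
    return sorted(set(cfg) - RECOGNISED_KEYS)


# With the SDK installed, the dict would be passed at init time, e.g.:
#   run = aquin.init(
#       base_model="meta-llama/Llama-3.2-1B-Instruct",
#       run_name="qlora-demo",
#       config=config,
#   )
```

Passing the dict to init() rather than finish() means config.json is written before training starts, so it survives a run that crashes midway.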

run.log()

Records metrics for one training step. Call this every step inside your training loop. The call is thread-safe and does not write to disk, so it has no measurable impact on training throughput.

Parameter | Type | Description
step (required) | int | Global training step index. Can be 0-based or 1-based, just be consistent.
loss (required) | float | Scalar training loss for this step. Drives the loss chart and summary bar.
learning_rate | float | Current LR from your scheduler. Enables the learning rate chart.
grad_norm | float | Gradient norm. Compute with torch.nn.utils.clip_grad_norm_(). Enables the grad norm chart.
momentum_norm | float | Optimizer momentum norm. Enables the momentum series in the training chart.
epoch | int | Current epoch number. Enables the epoch summary table showing per-epoch start/end loss, delta, grad, and trend sparkline.
batch | int | Batch index within the current epoch. Stored for display.
total_batches | int | Total batches per epoch. Stored for display.
step_ms | float | Wall-clock time for this step in milliseconds. Appears in every log line in the terminal panel.

run.log() — full example
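A minimal sketch of a fully populated call, assuming run.log() accepts the documented parameters as keyword arguments. The wrapper `log_timed_step` and its `started_at` argument are hypothetical conveniences, not SDK functions; `run` stands for whatever aquin.init() returned.

```python
import time


def log_timed_step(run, step, loss, *, learning_rate=None, grad_norm=None,
                   momentum_norm=None, epoch=None, batch=None,
                   total_batches=None, started_at=None):
    """Hypothetical wrapper: forward one step's metrics to run.log().

    Optional fields left as None are omitted from the call, so only the
    charts you actually have data for are populated.
    """
    fields = {
        "learning_rate": learning_rate,
        "grad_norm": grad_norm,
        "momentum_norm": momentum_norm,
        "epoch": epoch,
        "batch": batch,
        "total_batches": total_batches,
    }
    if started_at is not None:
        # step_ms is wall-clock milliseconds elapsed since the step began.
        fields["step_ms"] = (time.perf_counter() - started_at) * 1000.0
    kwargs = {name: value for name, value in fields.items() if value is not None}
    run.log(step=step, loss=float(loss), **kwargs)
```

In a loop, take `t0 = time.perf_counter()` before the forward pass and pass `started_at=t0` once the optimizer step has finished.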

run.checkpoint()

Saves the model checkpoint to aquin_run/checkpoints/checkpoint.pt. Only one checkpoint is kept per run; each call overwrites the previous save. The call also flushes all metrics to disk. Call it once at the end of training.

The checkpoint is saved as {"step": N, "state_dict": {...}}. If you pass a PEFT model, state_dict() returns only the adapter weights; the frozen base is not included, which keeps the checkpoint small.

Parameter | Type | Description
model (required) | nn.Module | Any PyTorch nn.Module. For PEFT/LoRA models, pass the PEFT model directly, its state_dict() already filters to adapter-only weights.
step (required) | int | The training step this checkpoint corresponds to. Stored in the checkpoint file and used to label checkpoint analysis in the UI.

run.checkpoint()

run.finish()

Finalises the run. Flushes all metrics to aquin_run/metrics.json and, if you pass config here because you did not pass it to aquin.init(), writes the config as well. Prints a summary and the push instructions.

Parameter | Type | Description
config | dict | Training hyperparameters. Only needed here if you did not pass config to aquin.init(). Passing it to init() is preferred so it is written before training starts.

run.finish()

Quickstart, bare training loop

If you are writing your own training loop without HuggingFace Trainer, instrument it directly.

bare training loop
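A minimal sketch of an instrumented loop, written against the run.log(), run.checkpoint(), and run.finish() calls described above. The function `train` and the `train_step` callable are hypothetical names introduced for illustration; `train_step` stands in for your own forward pass, backward pass, and optimizer step. Keyword-argument calls are an assumption based on the documented parameter names.

```python
import time


def train(run, model, batches, train_step, epochs=1):
    """Run a bare training loop and report every step to an Aquin run.

    run        -- object returned by aquin.init()
    model      -- the model being trained (a torch nn.Module in real use)
    batches    -- a sequence of batches, re-iterated every epoch
    train_step -- hypothetical user callable: (model, batch) -> dict with
                  "loss" and optionally "learning_rate" and "grad_norm"
    """
    step = 0
    total = len(batches)
    for epoch in range(epochs):
        for batch_idx, batch in enumerate(batches):
            t0 = time.perf_counter()
            out = train_step(model, batch)  # forward, backward, optimizer step
            step_ms = (time.perf_counter() - t0) * 1000.0
            step += 1  # 1-based global step, kept consistent across epochs
            extras = {k: out[k] for k in ("learning_rate", "grad_norm") if k in out}
            run.log(step=step, loss=float(out["loss"]), epoch=epoch,
                    batch=batch_idx, total_batches=total, step_ms=step_ms,
                    **extras)
    # One checkpoint at the end of training, then finalise the run.
    run.checkpoint(model=model, step=step)
    run.finish()
    return step
```

Keeping the Aquin calls outside `train_step` leaves the training logic untouched, which makes the instrumentation easy to remove or swap out later.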

HuggingFace Trainer / TRL

Use a TrainerCallback to hook into the Trainer event loop. The callback receives on_step_begin, on_log, and on_train_end events from the Trainer and bridges them into Aquin's API.

Key points: on_log fires every logging_steps steps, so set logging_steps=1 to record every step. grad_norm is available in logs when gradient clipping is enabled. on_train_end receives the model through **kwargs, not as a named parameter.

AquinCallback — TrainerCallback
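A minimal sketch of such a callback, assuming the run object exposes log(), checkpoint(), and finish() with keyword arguments as documented above. The hook signatures and the `logs` keys (`loss`, `learning_rate`, `grad_norm`, `epoch`) follow the HuggingFace Trainer; using on_step_begin to time steps is an assumption about how the events are bridged, not a confirmed detail of the original listing. The import fallback only exists so the sketch can be read and exercised without transformers installed.

```python
import time

try:
    from transformers import TrainerCallback
except ImportError:  # fallback so the sketch runs without transformers
    TrainerCallback = object


class AquinCallback(TrainerCallback):
    """Bridge HuggingFace Trainer events to an Aquin run (illustrative)."""

    def __init__(self, run):
        self.run = run
        self._t0 = None

    def on_step_begin(self, args, state, control, **kwargs):
        # Remember when the step started so on_log can report step_ms.
        self._t0 = time.perf_counter()

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Eval logs and the final summary carry no "loss" key; skip them.
        if not logs or "loss" not in logs:
            return
        fields = {"step": state.global_step, "loss": float(logs["loss"])}
        for key in ("learning_rate", "grad_norm"):
            if logs.get(key) is not None:
                fields[key] = float(logs[key])
        if logs.get("epoch") is not None:
            fields["epoch"] = int(logs["epoch"])  # Trainer reports a float
        if self._t0 is not None:
            fields["step_ms"] = (time.perf_counter() - self._t0) * 1000.0
        self.run.log(**fields)

    def on_train_end(self, args, state, control, **kwargs):
        model = kwargs.get("model")  # arrives via **kwargs, not a named arg
        if model is not None:
            self.run.checkpoint(model=model, step=state.global_step)
        self.run.finish()
```

The callback is attached through the Trainer's standard `callbacks` argument, for example `callbacks=[AquinCallback(run)]`. With logging_steps greater than 1, step_ms here reflects only the most recent step, not an average over the logging interval.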

Pass the callback to your Trainer or SFTTrainer:

wiring to SFTTrainer

Full QLoRA example

Complete production training script using QLoRA with BitsAndBytes, PEFT, and TRL SFTTrainer.

full qlora training script with aquin

CLI reference

The CLI is installed automatically with the package as the aquin command.

aquin login: Prompts for your API key and saves it to ~/.aquin/config.json.
aquin package: Bundles aquin_run/ into aquin_run.tar.gz in the current directory. Run after finish().
aquin push: Authenticates, gets a signed upload URL, and streams aquin_run.tar.gz to the Aquin VM. Run appears in dashboard on completion.
aquin whoami: Prints the email address of the currently logged-in account.
aquin help: Lists available commands.
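Before packaging, it can be useful to confirm that the run directory holds the files this page says the SDK writes. The sketch below is a hypothetical pre-flight check, not part of the CLI; the three paths come from the descriptions of config.json, metrics.json, and checkpoints/checkpoint.pt on this page. A missing checkpoint is not necessarily an error (run.checkpoint() is optional), but the checkpoint analyses would presumably have nothing to read.

```python
from pathlib import Path

# Artifacts this page says the SDK writes under aquin_run/.
EXPECTED_ARTIFACTS = (
    "metrics.json",
    "config.json",
    "checkpoints/checkpoint.pt",
)


def missing_artifacts(run_dir="aquin_run"):
    """Hypothetical pre-flight check: list expected files that are absent."""
    root = Path(run_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]
```

Calling `missing_artifacts()` from the directory where training ran returns an empty list when all three files are present, at which point `aquin package` followed by `aquin push` is the documented next step.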

You can override the API key at push time with --key:

push with explicit key

You can override the API base URL with the AQUIN_BASE_URL environment variable. Useful for self-hosted deployments.

What gets inspected

After a successful push, the Aquin VM extracts the tar, reads your run artifacts, and streams the inspection results to your dashboard. Here is exactly what each piece of data drives.

Metrics

Sourced from aquin_run/metrics.json, written by run.log() and flushed on checkpoint() and finish().

SDK param | Stored in | UI element
loss | metrics.json | Loss chart, summary bar final loss, epoch summary table
learning_rate | metrics.json | Learning rate chart
grad_norm | metrics.json | Grad norm chart, log line
momentum_norm | metrics.json | Momentum series in training chart
step_ms | metrics.json | Timing in every log line: [step N] loss=... Xms
epoch | metrics.json | Epoch summary table, per-epoch start/end loss, delta, grad, sparkline
batch / total_batches | metrics.json | Step display info
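How the dashboard derives the epoch summary is not specified on this page. As a rough illustration of what per-epoch start loss, end loss, and delta could mean, the sketch below groups step records by epoch. The record shape (dicts with `step`, `loss`, and `epoch` keys, mirroring the arguments of run.log()) is an assumption, and `epoch_summary` is a hypothetical helper; nothing here reflects the actual layout of metrics.json.

```python
def epoch_summary(records):
    """Group per-step records by epoch and report start/end loss and delta.

    `records` is assumed to be an iterable of dicts with "step", "loss" and
    "epoch" keys, mirroring the fields passed to run.log().
    """
    by_epoch = {}
    # Sort by step so "start" and "end" follow training order.
    for rec in sorted(records, key=lambda r: r["step"]):
        by_epoch.setdefault(rec["epoch"], []).append(rec["loss"])
    return {
        epoch: {
            "start_loss": losses[0],
            "end_loss": losses[-1],
            "delta": losses[-1] - losses[0],
        }
        for epoch, losses in by_epoch.items()
    }
```

A negative delta means the loss fell over the epoch. This is also why logging `epoch` consistently matters: without it, steps cannot be grouped.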

Checkpoint analysis

Sourced from aquin_run/checkpoints/checkpoint.pt, saved by run.checkpoint(). The VM loads your base model and fine-tuned checkpoint side by side and runs two analyses.

Analysis | Stored in | UI element
SAE diff | checkpoint.pt | Feature delta bars, which SAE features strengthened or suppressed, top 50 by absolute delta
Model diff | checkpoint.pt | Consistency, suppression, robustness scores, category deltas, base vs ft output comparison, attack surface diff

Both analyses require the base_model you passed to aquin.init() to be a supported model. Currently supported: meta-llama/Llama-3.2-1B-Instruct, EleutherAI/pythia-2.8b, EleutherAI/pythia-70m-deduped, gpt2.

Config panel

Sourced from aquin_run/config.json. The following keys are recognised and displayed in the algo config panel in the dashboard.

Config key | Stored in | UI element
method | config.json | Panel title, e.g. QLORA
lr | config.json | Learning rate row
epochs | config.json | Epochs row
rank | config.json | LoRA rank row
lora_alpha | config.json | LoRA alpha row
per_device_train_batch_size | config.json | Batch size row
gradient_accumulation_steps | config.json | Grad accum row
max_seq_len | config.json | Max seq len row
optimizer | config.json | Optimizer row
scheduler | config.json | Scheduler row
dropout | config.json | Dropout row
weight_decay | config.json | Weight decay row
grad_clip | config.json | Grad clip row
dataset | config.json | Dataset row

Aquin Labs · aquin@aquin.app
