The Training Inspect System
training monitor · signal detection · model diff · SAE feature diff · gradient analysis · regression tracking · calibration

Aquin Labs · April 2026

Watching a model learn in real time

loss divergence · gradient norm · SAE feature diff · model diff · ECE

Fine-tuning a model is not a black box you submit a job to and collect a checkpoint from. It is a process with a structure: a loss trajectory that tells a story about optimization, gradient dynamics that reveal how information is flowing through the network, feature activations that shift as the model rewrites its internal representations to accommodate the training objective. Most of that structure goes unobserved. The loss curve is visible. Everything under it is not.

Aquin's training inspect system brings the structure to the surface. It receives a stream of step events from your training run — loss, learning rate, gradient norms, per-layer breakdown, dead layer list, epoch index — and runs a signal engine against each step in real time. When the engine detects a gradient spike, a loss plateau, a collapsed attention head, or the onset of loss divergence, a signal fires immediately with the specific metric that triggered it and the exact step at which it occurred.

The dashboard renders all of it live: a loss sparkline that updates on each event, a signals feed ordered by severity, a model diff panel when training completes, and a per-layer SAE feature diff that shows exactly which internal representations the fine-tune rewrote.

loss and gradient norm · 20-step window

[figure: LOSS 0.74 (plateau marker) · MAX GRAD NORM 1.84 (grad spike marker) · x-axis ticks s1, s21, s46, s71, s96]

signal markers overlay each curve at the step they fired. plateau on loss at s76, grad spike on grad norm at s51.


What gets streamed

Step event schema · framework-agnostic

The training inspect system is agnostic to training framework. It consumes a step event schema: step index, loss, learning rate, max gradient norm, per-layer grad norms as a record, dead layer list, epoch index. The schema is intentionally flat: no nested objects, no optional deep structures.

The per-layer grad norm breakdown is what enables the dead layer and attention collapse detectors. Without it, the engine can only observe aggregate gradient behavior. With it, it can name the specific layer that has collapsed and track how long it has been dead. The layer naming convention follows whatever key names your training framework uses; the engine applies a regex for attention layers and treats everything else as non-attention.

step event schema

StepSnapshot

field · type · description
step · number · Step index
loss · number · Training loss at this step
learning_rate · number? · Current LR from scheduler
maxGrad · number? · Max gradient norm across all params
gradNorms · Record<string, number>? · Per-layer grad norms, enables dead layer detection
deadLayers · string[]? · Layers already over streak threshold
epoch · number? · Current epoch, used in plateau message

gradNorms is the key field. without per-layer breakdown, dead layer and attention head detection are unavailable.
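The schema above maps directly onto a flat TypeScript interface. The sketch below is illustrative only: the field names and optionality come from the table, while the `ATTENTION_LAYER` pattern, the `splitByLayerKind` helper, and the example layer names are assumptions, since the article does not give the engine's actual regex.

```typescript
// Field names and optionality mirror the StepSnapshot table.
interface StepSnapshot {
  step: number;                        // step index
  loss: number;                        // training loss at this step
  learning_rate?: number;              // current LR from the scheduler
  maxGrad?: number;                    // max gradient norm across all params
  gradNorms?: Record<string, number>;  // per-layer grad norms
  deadLayers?: string[];               // layers already over the streak threshold
  epoch?: number;                      // current epoch
}

// Assumption: a hypothetical stand-in for the engine's attention-layer regex.
const ATTENTION_LAYER = /attn|attention/i;

// Split per-layer norms into attention and non-attention groups by key name.
function splitByLayerKind(gradNorms: Record<string, number>): {
  attention: Record<string, number>;
  other: Record<string, number>;
} {
  const attention: Record<string, number> = {};
  const other: Record<string, number> = {};
  for (const name of Object.keys(gradNorms)) {
    if (ATTENTION_LAYER.test(name)) attention[name] = gradNorms[name];
    else other[name] = gradNorms[name];
  }
  return { attention, other };
}

// Example event with made-up values and layer names.
const example: StepSnapshot = {
  step: 51,
  loss: 0.91,
  learning_rate: 2e-5,
  maxGrad: 1.84,
  gradNorms: { "layers.6.self_attn.q_proj": 0.42, "layers.6.mlp.up_proj": 0.0 },
  deadLayers: [],
  epoch: 0,
};
const groups = splitByLayerKind(example.gradNorms ?? {});
```

Keeping every field a scalar, a string list, or a flat string-to-number record is what lets any training framework emit the event without an adapter for nested structures.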

Loss · per step
Learning rate · scheduler
Grad norms · max + per layer
Weight norms · per layer
Dead layers · named list
Epoch · index

The signal engine

Five detectors · priority order · 30-step cooldown

Five detectors

The signal engine is a pure function that runs on each new step snapshot. It takes the full history of steps plus two persistent streak maps — one for non-attention layers, one for attention layers — and returns a signal if one fired, or null. The streak maps are the only stateful part: they persist across steps so that dead layer detection can track how many consecutive steps a layer has had near-zero gradient.
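A minimal sketch of how such a streak map could be maintained, using the 1e-6 norm threshold and five-step streak that the dead layer and attention head detectors specify. The names `StreakMap` and `updateStreaks`, the plain-object container, and the example layer names are assumptions made for illustration, not Aquin's implementation.

```typescript
// Thresholds from the detector descriptions: norm below 1e-6 for five consecutive steps.
const DEAD_NORM = 1e-6;
const DEAD_STREAK = 5;

// Assumption: a plain object maps layer name to consecutive near-zero steps.
type StreakMap = Record<string, number>;

// Updates `streaks` in place from one step's per-layer norms and returns the
// layers whose streak reached DEAD_STREAK on exactly this step.
function updateStreaks(streaks: StreakMap, norms: Record<string, number>): string[] {
  const newlyDead: string[] = [];
  for (const name of Object.keys(norms)) {
    if (norms[name] < DEAD_NORM) {
      streaks[name] = (streaks[name] || 0) + 1;
      if (streaks[name] === DEAD_STREAK) newlyDead.push(name);
    } else {
      streaks[name] = 0; // any non-trivial gradient resets the streak
    }
  }
  return newlyDead;
}

// One map per layer kind, kept alive across steps by the caller.
const attentionStreaks: StreakMap = {};
const layerStreaks: StreakMap = {};

const fired: string[][] = [];
for (let step = 1; step <= 6; step++) {
  updateStreaks(attentionStreaks, { "layers.3.self_attn": 0.2 });
  fired.push(updateStreaks(layerStreaks, { "layers.6.mlp": 0, "layers.7.mlp": 0.3 }));
}
// fired[4] (the fifth step) is ["layers.6.mlp"]; every other entry is empty.
```

Passing the maps in as arguments keeps the detector logic itself free of hidden state, which matches the description of the engine as a pure function over history plus streak maps.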

signal priority order · detector 01 checked first

01 · loss diverging · critical

Ten consecutive steps with monotonically increasing loss. The engine computes the raw rise across that window and flags critical when the delta exceeds 0.5. This is the earliest reliable sign that a run is heading toward divergence before the loss curve makes it visually obvious.

loss[i] ≥ loss[i−1] for 10 steps · rise > 0.5 → critical

02 · gradient spike · warn / critical

Max grad norm versus the rolling mean of the last 20 steps. Fires at warn when the latest norm exceeds five times that baseline and is above 1.0 in absolute terms, and escalates to critical above twenty times the baseline. Consecutive spikes indicate the optimizer is stepping into a region it cannot navigate cleanly.

maxGrad > 5× rolling mean AND > 1.0 · > 20× rolling mean → critical

03 · attention head dead · warn

Attention layers with gradient norms below 1e-6 for five consecutive steps. Attention collapse is mechanistically distinct from MLP layer death: a collapsed head may still produce outputs but has lost the ability to differentiate across positions.

gradNorm[attn] < 1e-6 for 5 steps · fires on fifth consecutive step

04 · dead layers · warn

Non-attention layers with gradient norms below 1e-6 for five consecutive steps are flagged dead. The signal names the specific layers. These are candidates for pruning or weight reinitialization.

gradNorm[layer] < 1e-6 for 5 steps · fires on fifth consecutive step

05 · loss plateau · info

Rolling variance over the last twenty steps divided by the squared rolling mean. When variance falls below 0.1% of mean-squared, the signal fires with the current epoch so you can judge whether this is healthy convergence or premature stalling.

var(loss[−20:]) < mean² × 0.001 · always info, optionally triggers early stop

cooldown of 30 steps per signal type. streak maps persist across steps for dead layer tracking.
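The three history-based rules (detectors 01, 02, and 05) can be sketched as small functions over arrays of past values. The thresholds are the ones stated in the detector cards; the function names, the `Signal` shape, and the window conventions flagged in the comments are assumptions, not the engine's actual code.

```typescript
type Severity = "info" | "warn" | "critical";
interface Signal { type: string; severity: Severity; step: number; }

function mean(xs: number[]): number {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// 01: ten consecutive non-decreasing transitions with a total rise above 0.5.
// Assumption: "10 steps" means 10 transitions, so 11 loss values are needed.
function lossDiverging(losses: number[], step: number): Signal | null {
  if (losses.length < 11) return null;
  const w = losses.slice(-11);
  for (let i = 1; i < w.length; i++) {
    if (w[i] < w[i - 1]) return null;
  }
  const rise = w[w.length - 1] - w[0];
  return rise > 0.5 ? { type: "loss_diverging", severity: "critical", step } : null;
}

// 02: latest max grad norm versus the rolling mean of up to 20 preceding steps.
// Assumption: the baseline excludes the latest value.
function gradSpike(maxGrads: number[], step: number): Signal | null {
  if (maxGrads.length < 2) return null;
  const latest = maxGrads[maxGrads.length - 1];
  const baseline = mean(maxGrads.slice(-21, -1));
  if (latest <= 5 * baseline || latest <= 1.0) return null;
  const severity: Severity = latest > 20 * baseline ? "critical" : "warn";
  return { type: "grad_spike", severity, step };
}

// 05: variance of the last 20 losses below 0.1% of the squared mean.
// Assumption: population variance (divide by n).
function lossPlateau(losses: number[], step: number): Signal | null {
  if (losses.length < 20) return null;
  const w = losses.slice(-20);
  const m = mean(w);
  const variance = mean(w.map((x) => (x - m) * (x - m)));
  return variance < m * m * 0.001 ? { type: "loss_plateau", severity: "info", step } : null;
}
```

Dividing the variance by the squared mean makes the plateau test scale-free: a loss hovering around 0.7 and one hovering around 7.0 are judged by the same relative tolerance.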

Priority and cooldown

Loss divergence and gradient spikes are checked first because they indicate active instability that may warrant stopping the run. Dead layer and attention collapse are checked next, naming specific failed components. Loss plateau is checked last because it often describes expected, healthy behavior rather than a problem.

A cooldown of thirty steps prevents the same signal type from re-emitting continuously. A plateau that persists does not flood the feed. A gradient spike that resolves and re-occurs will fire again after thirty steps — the second occurrence is a distinct event that warrants attention.
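One way the priority order and the per-type cooldown could fit together is sketched below. The `runDetectors` function, the `lastFired` record, and the stub detectors are hypothetical. The sketch also assumes that a detector still in cooldown is skipped so that a lower-priority signal can fire, and that a signal may re-fire once exactly thirty steps have passed; the article does not spell out either detail.

```typescript
type Severity = "info" | "warn" | "critical";
interface Signal { type: string; severity: Severity; step: number; }
type Detector = (step: number) => Signal | null;

const COOLDOWN_STEPS = 30;

// Runs detectors in priority order and returns the first signal that is not
// in cooldown. `lastFired` maps signal type to the step it last fired and is
// updated in place.
function runDetectors(
  detectors: Detector[],
  lastFired: Record<string, number>,
  step: number
): Signal | null {
  for (const detect of detectors) {
    const signal = detect(step);
    if (signal === null) continue;
    const last = lastFired[signal.type];
    if (last !== undefined && step - last < COOLDOWN_STEPS) continue;
    lastFired[signal.type] = step;
    return signal;
  }
  return null;
}

// Stub detectors that always fire, listed in priority order.
const lastFired: Record<string, number> = {};
const detectors: Detector[] = [
  (step) => ({ type: "grad_spike", severity: "warn", step }),
  (step) => ({ type: "loss_plateau", severity: "info", step }),
];

const s51 = runDetectors(detectors, lastFired, 51); // grad_spike
const s52 = runDetectors(detectors, lastFired, 52); // loss_plateau (spike in cooldown)
const s53 = runDetectors(detectors, lastFired, 53); // null (both in cooldown)
const s81 = runDetectors(detectors, lastFired, 81); // grad_spike again, 30 steps later
```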

The model diff

Behavioral before/after · consistency score · suppression score · robustness score

Behavioral delta

When training completes, the dashboard receives three scores that describe how the fine-tune changed the model's behavior: consistency score, suppression score, and robustness score. These are the same metrics from Aquin's eval system, applied to the before/after comparison. The base model is the reference. The fine-tuned checkpoint is the subject. For each metric, the difference between the base value and the fine-tuned value is the behavioral delta your training objective produced.

The robustness score is particularly informative for factual fine-tuning. A fine-tune intended to add or reinforce factual knowledge should produce higher robustness on those facts — the model should be more confident under surface corruptions of the relevant prompts. A robustness drop on the target facts after factual fine-tuning is a sign that the model has learned a surface pattern rather than a grounded representation.

model diff · base vs fine-tuned

metric · delta · formula · base · ft
consistency · +0.14 · 1 − (mean KL / anchor entropy) · 0.73 · 0.87
suppression · −0.09 · 0.6 × length_penalty + 0.4 × hedge_penalty · 0.70 · 0.61
robustness · +0.07 · 1 − (mean_drop / clean_confidence) · 0.67 · 0.74

green = improved, red = regressed relative to base. same metrics as the eval system.
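The three formulas in the panel and the base-to-fine-tuned delta can be written out as a short sketch. The function and parameter names are illustrative, and how the inputs (mean KL, anchor entropy, the two penalties, mean drop, clean confidence) are measured is left to the eval system and not covered here.

```typescript
// Formulas as printed in the model diff panel.
function consistencyScore(meanKl: number, anchorEntropy: number): number {
  return 1 - meanKl / anchorEntropy;
}

function suppressionScore(lengthPenalty: number, hedgePenalty: number): number {
  return 0.6 * lengthPenalty + 0.4 * hedgePenalty;
}

function robustnessScore(meanDrop: number, cleanConfidence: number): number {
  return 1 - meanDrop / cleanConfidence;
}

interface ScorePair { base: number; ft: number; }

// Behavioral delta for one metric, rounded to two decimals for display.
function scoreDelta(pair: ScorePair): number {
  return Math.round((pair.ft - pair.base) * 100) / 100;
}

// Values from the panel.
const panel: Record<string, ScorePair> = {
  consistency: { base: 0.73, ft: 0.87 },
  suppression: { base: 0.70, ft: 0.61 },
  robustness: { base: 0.67, ft: 0.74 },
};

const deltas: Record<string, number> = {};
for (const metric of Object.keys(panel)) {
  deltas[metric] = scoreDelta(panel[metric]);
}
// deltas: consistency 0.14, suppression -0.09, robustness 0.07
```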

The SAE feature diff

Which internal representations changed and by how much

Layer change density

Behavioral scores tell you how the model changed from the outside. The SAE feature diff tells you which internal representations changed and by how much. For each layer in the sparse autoencoder, the diff computes how many features shifted activation between the base and fine-tuned model, the mean absolute activation delta across features, and the single feature with the highest delta.

The layer-level change density is the most informative aggregate signal. A fine-tune that changes 14 out of 512 features at L8 and 2 out of 512 at L4 is doing something focused and deep — rewriting a specific representational layer, not spreading surface changes across the network. The top feature per layer is where mechanistic investigation should start. If L10's top shifted feature is F501 (refusal / safety language) and the training data had nothing to do with refusals, that finding warrants investigation in the model inspector.

SAE feature diff · changed features per layer · blue cells = shifted

layer · changed features · mean Δ · top feature
L4 · 2 / 512 changed · mean Δ 0.004 · F412 · punctuation / sentence boundary
L6 · 8 / 512 changed · mean Δ 0.012 · F089 · hedging / uncertainty markers
L8 · 14 / 512 changed · mean Δ 0.031 · F213 · geographic reference tracking
L10 · 5 / 512 changed · mean Δ 0.014 · F501 · refusal / safety language
L12 · 6 / 512 changed · mean Δ 0.019 · F047 · capital city associations
L14 · 3 / 512 changed · mean Δ 0.009 · F091 · factual recall trigger

each row is one layer. each cell is one SAE feature. blue = activation shifted post fine-tune. L8 carries the heaviest rewrite.
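The per-layer summary (changed count, mean absolute delta, top feature) can be sketched as a single pass over two activation vectors. This is an illustration under stated assumptions: the inputs are taken to be one mean activation per SAE feature for the base and fine-tuned models, and `SHIFT_THRESHOLD` is a hypothetical cutoff for counting a feature as shifted, since the article does not state the criterion.

```typescript
interface LayerDiff {
  changed: number;       // features whose |delta| exceeds the threshold
  total: number;         // number of SAE features in the layer
  meanAbsDelta: number;  // mean |ft - base| across all features
  topFeature: number;    // index of the feature with the largest |delta|
}

// Assumption: hypothetical cutoff for "shifted".
const SHIFT_THRESHOLD = 0.01;

function diffLayer(base: number[], ft: number[]): LayerDiff {
  if (base.length === 0 || base.length !== ft.length) {
    throw new Error("activation vectors must be non-empty and the same length");
  }
  let changed = 0;
  let sum = 0;
  let topFeature = 0;
  let topDelta = -1;
  for (let i = 0; i < base.length; i++) {
    const delta = Math.abs(ft[i] - base[i]);
    sum += delta;
    if (delta > SHIFT_THRESHOLD) changed++;
    if (delta > topDelta) {
      topDelta = delta;
      topFeature = i;
    }
  }
  return { changed, total: base.length, meanAbsDelta: sum / base.length, topFeature };
}

// Toy layer with four features (made-up activations).
const toyDiff = diffLayer([0, 0, 0, 0], [0, 0.5, 0.005, 0.1]);
// toyDiff.changed is 2 and toyDiff.topFeature is 1
```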

The regression tracker

Category scores across runs · drift detection

A single model diff tells you how a fine-tune changed behavior relative to the base. That is useful once. The regression tracker extends it across runs: every time a model diff arrives, the category scores are appended to a per-category history so you can see how behavior has moved across all completed runs in the session.

Each category row shows a sparkline of its score across runs. A category that regressed by more than five percentage points on the latest run is flagged with a red indicator. The regression detection is relative to the immediately prior run, not to the base — a score can look healthy against the base model while still trending negatively across iterations. The tracker catches that drift where the raw diff cannot.

regression tracker · category score across 4 runs

factual · 72%
reasoning · 70%
refusal · 71%
code · 66%

each point is one completed run. red = category score regressed vs prior run.
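The append-and-compare behavior described above can be sketched as follows. The names `appendRun` and `regressedCategories` are hypothetical, scores are assumed to be percentages, and the earlier run in the example is invented for illustration; only the latest values match the panel.

```typescript
// A drop of more than five percentage points versus the prior run is flagged.
const REGRESSION_POINTS = 5;

// Category name -> score per completed run, in arrival order.
type ScoreHistory = Record<string, number[]>;

function appendRun(history: ScoreHistory, scores: Record<string, number>): void {
  for (const category of Object.keys(scores)) {
    if (!history[category]) history[category] = [];
    history[category].push(scores[category]);
  }
}

// Compares the latest run with the immediately prior run, not with the base.
function regressedCategories(history: ScoreHistory): string[] {
  const flagged: string[] = [];
  for (const category of Object.keys(history)) {
    const runs = history[category];
    if (runs.length < 2) continue;
    const drop = runs[runs.length - 2] - runs[runs.length - 1];
    if (drop > REGRESSION_POINTS) flagged.push(category);
  }
  return flagged;
}

const history: ScoreHistory = {};
// Invented prior run, followed by the latest values shown in the panel.
appendRun(history, { factual: 70, reasoning: 71, refusal: 72, code: 73 });
appendRun(history, { factual: 72, reasoning: 70, refusal: 71, code: 66 });
const flaggedCategories = regressedCategories(history);
// flaggedCategories is ["code"]: a 7-point drop versus the prior run
```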

Confidence calibration

Whether the model knows what it knows · base ECE 0.148 → ft 0.071

A model's stated confidence and its actual accuracy can diverge in ways that are invisible from loss alone. A fine-tune can lower loss while simultaneously making the model systematically overconfident. ECE (expected calibration error) measures that gap directly: it bins outputs by stated confidence, computes actual accuracy within each bin, and reports the mean gap between the two, with each bin conventionally weighted by the share of outputs it contains.
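A minimal sketch of the binning procedure, using the common formulation with equal-width confidence bins and each bin weighted by its share of samples. The `Prediction` shape and the default of ten bins are assumptions; the article does not say how the calibration panel bins its outputs.

```typescript
interface Prediction {
  confidence: number; // stated confidence in [0, 1]
  correct: boolean;   // whether the output was actually right
}

// Expected calibration error: sample-weighted mean of |accuracy - confidence|
// over equal-width confidence bins.
function expectedCalibrationError(preds: Prediction[], numBins: number = 10): number {
  if (preds.length === 0) return 0;
  const count: number[] = [];
  const confSum: number[] = [];
  const hitSum: number[] = [];
  for (let b = 0; b < numBins; b++) {
    count.push(0);
    confSum.push(0);
    hitSum.push(0);
  }
  for (const p of preds) {
    // Confidence 1.0 falls into the last bin rather than overflowing.
    const b = Math.min(numBins - 1, Math.floor(p.confidence * numBins));
    count[b] += 1;
    confSum[b] += p.confidence;
    hitSum[b] += p.correct ? 1 : 0;
  }
  let ece = 0;
  for (let b = 0; b < numBins; b++) {
    if (count[b] === 0) continue;
    const gap = Math.abs(hitSum[b] / count[b] - confSum[b] / count[b]);
    ece += (count[b] / preds.length) * gap;
  }
  return ece;
}

// Toy example: four outputs at 0.95 confidence (3 correct) and four at 0.55 (2 correct).
const toyPreds: Prediction[] = [
  { confidence: 0.95, correct: true }, { confidence: 0.95, correct: true },
  { confidence: 0.95, correct: true }, { confidence: 0.95, correct: false },
  { confidence: 0.55, correct: true }, { confidence: 0.55, correct: true },
  { confidence: 0.55, correct: false }, { confidence: 0.55, correct: false },
];
const toyEce = expectedCalibrationError(toyPreds);
// Gaps are 0.20 and 0.05, each bin holds half the samples, so toyEce is about 0.125.
```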

The calibration panel runs this comparison between the base and fine-tuned model using the training dataset as the evaluation set. The reliability diagram shows both models' accuracy-per-confidence-bin as bar pairs against a perfect-calibration diagonal. The per-topic ECE table breaks the aggregate score down by category — models trained on domain-specific data frequently improve ECE on the target domain while degrading it on adjacent topics that share surface patterns with the training examples.

The low-confidence row list surfaces the specific dataset inputs where the fine-tuned model assigns probability below the configured threshold. These inputs can be exported directly as a labeled dataset for the next training iteration — the model's own uncertainty becomes the selection criterion for the data that trains the next version.
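The selection step can be sketched as a filter plus a serializer. Everything here is illustrative: the `ScoredRow` shape, the assumption that the probability is the one the fine-tuned model assigns to the row's label, the threshold value, and the JSON Lines export format are not specified in the article.

```typescript
interface ScoredRow {
  input: string;
  label: string;
  probability: number; // assumed: probability the fine-tuned model assigns to `label`
}

// Keep only rows the model is unsure about.
function lowConfidenceRows(rows: ScoredRow[], threshold: number): ScoredRow[] {
  return rows.filter((row) => row.probability < threshold);
}

// Serialize the selected rows as JSON Lines, one labeled example per line.
function toJsonl(rows: ScoredRow[]): string {
  return rows
    .map((row) => JSON.stringify({ input: row.input, label: row.label }))
    .join("\n");
}

// Made-up rows and a made-up threshold of 0.5.
const scoredRows: ScoredRow[] = [
  { input: "Capital of Australia?", label: "Canberra", probability: 0.41 },
  { input: "Capital of France?", label: "Paris", probability: 0.97 },
  { input: "Capital of Japan?", label: "Tokyo", probability: 0.93 },
];
const exportText = toJsonl(lowConfidenceRows(scoredRows, 0.5));
```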

calibration · reliability diagram + per-topic ECE

[reliability diagram: axis ticks 0.00 to 1.00 and 0.0 to 0.8]

topic · base ECE · fine-tuned ECE
science · 0.120 · 0.060
history · 0.190 · 0.090
math · 0.080 · 0.040
coding · 0.210 · 0.110
medicine · 0.310 · 0.170
law · 0.270 · 0.190

left bar = base ECE per bucket, right bar = fine-tuned. green = under-confident, red = over-confident vs perfect diagonal.

Training as the start of the investigation

Mechanistic interpretability has traditionally studied models after they exist. The training inspect system changes where the investigation starts. A signal that fires at step 61 about a dead layer at L6 is most usefully followed up by opening the fine-tuned checkpoint in the Model Inspector once training completes, going directly to L6, and running the causal trace to see whether that layer still contributes to the model's outputs.

The same logic applies to the model diff results. A suppression score that rises from base to fine-tuned opens a data investigation: the training inspector can open the dataset in the Data Inspector and run the toxicity and PII modules against the columns most likely to produce suppressive signal.

The SAE feature diff provides the entry point for the mechanistic side of that investigation. Once you know which features shifted most and at which layers, you can navigate the Model Inspector directly to those features, check their benchmark scores, run them through the logit lens, and steer them to confirm their role. The diff turns the post-training model inspection from an open-ended search into a targeted inquiry.

Training finding · Follow-up

Dead layer signal · Open checkpoint in Model Inspector → go to that layer → run causal trace to confirm whether it still contributes to outputs.

Suppression increase · Open training dataset in Data Inspector → run toxicity and PII modules → identify columns driving hedging signal.

Unexpected SAE shift · Navigate Model Inspector to the shifted feature → check benchmark scores → steer to confirm causal role before weight editing.

Calibration regression · Export low-confidence rows as labeled dataset → use as selection criterion for next training iteration.

The calibration panel adds a third path out of the training run. Low-confidence rows can be exported directly as a labeled dataset for the next iteration. The regression tracker closes the loop in the other direction, confirming the next iteration did not trade one weakness for another. Together they make the training session not the end of a workflow but the input to the next one.
