Aquin is the research company using interpretability to design intelligence.

Models today are largely black boxes. We can prompt them, fine-tune them, and add guardrails, but our understanding of what they actually learned and why they behave the way they do remains limited.

[diagram: prompt token → embedding → layers L0–L15 → output "Paris", with per-layer confidence and causal impact shaded from high to medium, low, and minimal]

attribution

Trace every response word back to the prompt tokens that caused it, and see how signal flows through layers to get there.

01 prompt highlighting
prompt: What is the capital of France?
response: The capital of France is Paris.
causal weight: low to high
causal mediation analysis
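
Attribution like this is typically computed with activation patching, the core operation of causal mediation analysis: run the model on a clean and a corrupted prompt, splice the clean activation into the corrupted run, and measure how much of the answer comes back. A minimal sketch assuming a Hugging Face GPT-2, not Aquin's stack; the layer and position choices are illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

# clean and corrupted prompts, chosen so token positions line up
clean   = tok("The capital of France is", return_tensors="pt").input_ids
corrupt = tok("The capital of Italy is",  return_tensors="pt").input_ids

layer = model.transformer.h[8]          # illustrative mid-stack layer

# 1. cache the clean run's hidden state at that layer
cache = {}
h1 = layer.register_forward_hook(lambda m, i, o: cache.update(h=o[0].detach()))
with torch.no_grad():
    model(clean)
h1.remove()

# 2. patch the cached activation into the corrupted run at the last position
def patch(m, i, o):
    hidden = o[0].clone()
    hidden[:, -1] = cache["h"][:, -1]
    return (hidden,) + o[1:]

h2 = layer.register_forward_hook(patch)
with torch.no_grad():
    logits = model(corrupt).logits[0, -1]
h2.remove()

# 3. how much of the clean answer does the patch restore?
paris = tok(" Paris").input_ids[0]
print("p(' Paris') after patch:", logits.softmax(-1)[paris].item())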
02 network digraph

Causal graph of how signal flows through layers. Thicker edges carry more weight. See which layers matter for any output.

[diagram: input → L4 → L8 → L12 → L14 → L16 → output]
edge weight = causal signal
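
The digraph itself is a weighted DAG whose edge weights come from patching effects like the sketch above. A minimal representation, assuming networkx; node names and weights are illustrative:

import networkx as nx

G = nx.DiGraph()
edges = [("input", "L4", 0.61), ("L4", "L8", 0.48),
         ("L8", "L12", 0.74), ("L12", "L14", 0.83),
         ("L14", "L16", 0.91), ("L16", "output", 0.95)]
G.add_weighted_edges_from(edges)   # weight = causal signal carried

# layers that matter most for the output: rank by total outgoing weight
rank = sorted(G.nodes, key=lambda n: -G.out_degree(n, weight="weight"))
print(rank)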

logit lens

See what the model thinks at every layer as it builds toward a final answer. Watch a vague token sharpen into a confident prediction.

prompt: "what is the capital of France?"
layer 1     the       12%
layer 4     capital   34%
layer 8     city      58%
layer 14    Paris     81%
layer 16    Paris     97%
token prediction per layer
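
The same readout can be reproduced in a few lines of PyTorch: project each layer's hidden state through the model's final LayerNorm and unembedding matrix. A minimal sketch assuming GPT-2 via Hugging Face transformers, not Aquin's internals; layer indices and probabilities will differ from the card above:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("What is the capital of France? The capital of France is",
          return_tensors="pt").input_ids

with torch.no_grad():
    out = model(ids, output_hidden_states=True)

# project every layer's last-position hidden state through the final
# LayerNorm and unembedding to read off the running "best guess"
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    probs = logits.softmax(-1)
    top = int(probs.argmax())
    print(f"layer {layer:2d}: {tok.decode(top)!r} {probs[top]:.0%}")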

weight editing

Edit a weight directly. See the delta and the response change side by side. No retraining, no guessing — the diff is immediate and traceable.

04 deltas applied
layer 14   +0.42   L14 · MLP W_out [2048,11]   factual recall strengthened
layer 8    -0.18   L8 · attn head 3 · V        hedging language reduced
layer 12   +0.07   L12 · MLP W_in [512,2048]   confidence tone increased
rank-one edits, no retraining
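
Each delta above is a rank-one update: an outer product added to a single weight matrix. A minimal sketch of the operation (the helper name and example scale are illustrative, not Aquin's API):

import torch

def rank_one_edit(W: torch.Tensor, u: torch.Tensor, v: torch.Tensor, scale: float) -> torch.Tensor:
    # W' = W + scale * u v^T: shifts how input direction v maps onto
    # output direction u, leaving the rest of the matrix untouched.
    return W + scale * torch.outer(u, v)

# applied in place to one MLP output projection, e.g. (GPT-2-style path):
# W = model.transformer.h[14].mlp.c_proj.weight.data
# W.copy_(rank_one_edit(W, u, v, 0.42))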
05 response diff
What is the capital of Australia?
before
I believe it might be Sydney, though I'm not entirely certain.
after
Canberra is the capital of Australia.
Summarise the French Revolution.
before
A period of significant change, possibly around the late 1700s, involving various political shifts.
after
The French Revolution (1789-1799) overthrew the monarchy and established a republic.
same prompt, different weights

diff

Connect any two checkpoints (base vs fine-tuned, fp16 vs int4) and see exactly which weights shifted, which features got compressed, and what behaviour changed.

[diagram: base → delta → fine-tuned, changed weights highlighted]
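
At its core, a checkpoint diff is a per-tensor comparison of two state dicts. A sketch with assumed file names; the relative norm makes shifts comparable across tensors of different sizes:

import torch

base  = torch.load("base.pt",      map_location="cpu")
tuned = torch.load("finetuned.pt", map_location="cpu")

# relative L2 shift per tensor; the epsilon guards zero-norm buffers
deltas = {
    name: ((tuned[name] - base[name]).norm() / (base[name].norm() + 1e-8)).item()
    for name in base
    if name in tuned and base[name].is_floating_point()
}

# the ten most-shifted tensors tell you where fine-tuning actually happened
for name, d in sorted(deltas.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{d:7.4f}  {name}")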

benchmarks

Three suites built into Aquin. Run them on any checkpoint, edit, or quantization pass. Know immediately whether a change made the model better or worse.

EditBench

edit fidelity

Surgical precision. Does the edit change only what you intended?

edit success        94%
side-effect score   97%
generalisation      81%
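
These three numbers decompose naturally: did the target prompt change, did paraphrases of it move too, and were unrelated prompts left alone. A sketch of that scoring with an assumed generate function; this is the standard decomposition, not necessarily EditBench's exact harness:

def edit_fidelity(generate, target, paraphrases, unrelated, expected):
    # edit success: the edited behaviour shows up on the target prompt
    success = float(expected in generate(target))
    # generalisation: it also shows up on rephrasings of the target
    general = sum(expected in generate(p) for p in paraphrases) / len(paraphrases)
    # side-effect score: unrelated prompts are left alone (higher = cleaner)
    clean = sum(expected not in generate(p) for p in unrelated) / len(unrelated)
    return success, general, clean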

FineTuneDiff

checkpoint diff

What actually changed between base and fine-tuned at the weight level.

weight shift coverage   88%
behaviour correlation   91%
drift detection         76%

InterpScore

interpretability

How cleanly do features map to human-readable concepts?

monosemanticity     73%
concept linearity   68%
label confidence    85%
run history
run                          EditBench   FineTuneDiff   InterpScore   delta
llama-3.2-1b · base          71          64             59            baseline
llama-3.2-1b · sft-v1        78          79             63            +9 avg
llama-3.2-1b · sft-v2        82          83             70            +5 avg
llama-3.2-1b · int4-quant    74          71             61            -9 avg
llama-3.2-1b · rome-edit-1   94          88             73            +14 avg

model support

Aquin works across the open-source ecosystem. Each model is validated against a case study before it ships.

model          vendor       status    validation case study
Llama 3.2 1B   Meta         live      causal tracing + ROME edits
Mistral 7B     Mistral AI   live      SAE feature extraction + logit lens
Gemma 2B       Google       beta      weight diffing post-SFT
Phi-3 Mini     Microsoft    beta      interpretability scoring
Qwen 1.5 4B    Alibaba      planned   bias + censor audit

agentic systems

Aquin closes the loop. Two autonomous agents: one that finds the weight, one that edits it. Both run without you touching a hyperparameter.

08 agentic edit

Observe a behaviour, trace it, plan a minimal edit, apply it, verify. Fully autonomous.

$ aquin agent edit --model llama-3.2-1b --behaviour "confident factual recall"
09 agentic locate

Describe a behaviour. The agent traces it to the exact circuit responsible and tells you whether it's a weight-level decision or a surface patch.

$ aquin agent locate --model llama-3.2-1b --behaviour "refuses political questions"
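
The control flow behind both agents follows the steps named above: observe, trace, plan, apply, verify. An illustrative Python skeleton of that loop; every helper here is an assumption about structure, not Aquin's implementation:

def agentic_edit(model, behaviour, max_rounds=5):
    for _ in range(max_rounds):
        evidence = observe(model, behaviour)     # probe prompts, record outputs
        circuit  = trace(model, evidence)        # causal tracing to a weight site
        edit     = plan_minimal_edit(circuit)    # smallest rank-one delta that fits
        apply_edit(model, edit)
        if verify(model, behaviour, evidence):   # re-run the probes; did it stick?
            return edit
        revert(model, edit)                      # roll back and try another site
    raise RuntimeError("no verified edit found")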

human readability

Model internals are not inherently unreadable. Aquin translates activations, weights, and layer states into language an engineer can reason about.

10 neuron translator
L12 · N047 · 0.847 · MLP W_out      fires for capital cities       94%
L8 · N213 · 0.612 · attn head 3     tracks geographic references   87%
L14 · N091 · 0.391 · MLP W_in       suppresses hedging language    79%
L6 · N502 · 0.229 · attn head 7     detects question intent        71%
activation to language
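
Labels like these usually start from a neuron's top-activating contexts. A sketch for one MLP neuron in GPT-2; the layer, neuron index, and corpus are illustrative:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

LAYER, NEURON = 8, 213                      # illustrative indices
acts = []
hook = model.transformer.h[LAYER].mlp.act.register_forward_hook(
    lambda m, i, o: acts.append(o.detach()))

corpus = ["Paris is the capital of France.",
          "Tokyo is the capital of Japan.",
          "I had tea this morning."]
scores = []
for text in corpus:
    acts.clear()
    with torch.no_grad():
        model(**tok(text, return_tensors="pt"))
    scores.append((acts[0][0, :, NEURON].max().item(), text))
hook.remove()

# the contexts that fire the neuron hardest suggest its label
for score, text in sorted(scores, reverse=True):
    print(f"{score:6.2f}  {text}")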
11 internals vs labels
weight                      raw      label
L14 · MLP W_out [2048,11]   0.847    capital city associations
L8 · attn head 3 · V        -0.312   geographic suppression
L12 · MLP W_in [512,2048]   0.601    factual recall trigger
L6 · attn head 7 · Q        0.229    question parsing
raw weights mapped to behaviour

factual checks

Most models ship as black boxes. You have no way to know what they learned to suppress, amplify, or distort. Aquin surfaces it.

12 bias detection

Trace which features consistently skew outputs along political, demographic, or cultural lines. See the weight, not just the symptom.

political lean:   left ↔ right
sentiment skew:   negative ↔ positive
demographic:      group A ↔ group B
traced to layer activations
13 censor audit

Find what the model refuses to say and why. Identify suppression circuits. See whether refusals are weight-level decisions or surface-level RLHF patches.

medical dosage      suppressed
political figures   softened
competitor names    suppressed
historical events   unfiltered
weight-level origin mapped

isolated models

Every user gets their own model instance per tab. State never bleeds between sessions. Work on Llama in one tab, Mistral in another, no interference.

session: user_a01
model: Llama 3.2 1B
active task: causal tracing
state isolated from all other sessions
per tab · per user id · no bleed
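
The pattern reduces to keying each loaded model off the (user, tab) pair so no two sessions ever share a weight tensor. An illustrative registry sketch, not Aquin's serving code:

from dataclasses import dataclass

@dataclass
class Session:
    user_id: str
    tab_id: str
    model: object                        # each session owns its own instance

_registry: dict[tuple[str, str], Session] = {}

def get_session(user_id: str, tab_id: str, load_model) -> Session:
    key = (user_id, tab_id)
    if key not in _registry:
        # a fresh model copy per tab: state cannot bleed across sessions
        _registry[key] = Session(user_id, tab_id, model=load_model())
    return _registry[key]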

bulk editing

Apply a set of edits across multiple layers in a single operation. Queue them, run them, verify the aggregate delta.

suppress hedging language       L7-L10 · 14 weights   done
strengthen factual recall       L12-L15 · 9 weights   done
reduce demographic bias         L4-L8 · 22 weights    running
remove competitor suppression   L9 · 3 weights        queued
batch rank-one edits across layers
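
Batching queues the same outer-product updates shown under weight editing and applies them in one pass. An illustrative loop; the dataclass and the GPT-2-style layer path are assumptions:

from dataclasses import dataclass
import torch

@dataclass
class QueuedEdit:
    layer: int
    u: torch.Tensor      # output direction
    v: torch.Tensor      # input direction
    scale: float

def apply_batch(model, queue):
    with torch.no_grad():
        for e in queue:
            W = model.transformer.h[e.layer].mlp.c_proj.weight
            W += e.scale * torch.outer(e.u, e.v)   # rank-one update in place

# verify the aggregate delta afterwards, e.g. by re-running probe prompts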

aipedia

A living, community-indexed knowledge base of model features. Every behaviour, every circuit, every weight pattern. Searchable. Citable. Growing.

search
feature                  model          layer   circuit               confidence
capital city recall      Llama 3.2 1B   L14     MLP W_out [2048,11]   94%
hedging language         Llama 3.2 1B   L8      attn head 3 · V       87%
geographic association   Mistral 7B     L11     MLP W_in [512,2048]   81%
refusal circuit          Gemma 2B       L9      attn head 7 · Q       76%
tags: capital cities · hedging · refusal circuits · geographic · RLHF artifacts

Not sure if Aquin is right for you?
