designing intelligence
Full-stack AI observability: trace training data provenance, inspect model weights to find where specific behaviors and knowledge are stored, and edit them directly, without fine-tuning or retraining.
attribution
Every response token traces back to the prompt tokens that caused it. Watch the signal flow through each layer until the answer locks in.
At L1 the model guesses "the". By L8 it's converging on "city". At L16, Paris is locked at 97% — the exact moment the answer forms.
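The layer-by-layer convergence can be sketched with a logit-lens-style probe (a hedged illustration of the idea, not Aquin's actual pipeline): project each layer's hidden state through the unembedding matrix and watch the target token's probability rise.

```python
import numpy as np

def layerwise_probe(hidden_states, W_unembed, target_id):
    """For each layer's hidden state, project to vocab logits and
    return the softmax probability of the target token."""
    probs = []
    for h in hidden_states:              # one vector per layer
        logits = W_unembed @ h           # [vocab]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        probs.append(float(p[target_id]))
    return probs

# Toy demo: synthetic hidden states that align more with the
# target token's unembedding row at each successive layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(50, 16))            # vocab=50, d_model=16
target = 7
states = [0.2 * rng.normal(size=16) + alpha * W[target]
          for alpha in (0.0, 0.5, 1.5)]  # progressively "locking in"
trace = layerwise_probe(states, W, target)
assert trace[0] < trace[-1]              # probability rises across layers
```

The real system attributes through attention and MLP paths rather than a single readout, but the per-layer probability trace is the same shape of signal.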
diff
Connect any two checkpoints. See which weights shifted, what behavior each shift caused, and which training dataset row is responsible.
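The first step of a checkpoint diff can be sketched in a few lines (names and structure here are illustrative, not Aquin's API): compare two weight dictionaries and rank tensors by how much they moved.

```python
import numpy as np

def diff_checkpoints(base, tuned):
    """Rank tensors by the L2 norm of their weight delta, largest first."""
    deltas = {name: float(np.linalg.norm(tuned[name] - base[name]))
              for name in base}
    return sorted(deltas.items(), key=lambda kv: -kv[1])

# Toy checkpoints: only one tensor was touched between them.
base  = {"L12.mlp.W_out": np.zeros((4, 4)), "L8.attn.q": np.zeros((4, 4))}
tuned = {"L12.mlp.W_out": np.full((4, 4), 0.5), "L8.attn.q": np.zeros((4, 4))}
ranked = diff_checkpoints(base, tuned)
assert ranked[0][0] == "L12.mlp.W_out"   # the edited tensor tops the list
```

Mapping each shifted tensor back to a behavior and a training row is the harder part; this sketch only covers the "which weights shifted" stage.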
| license | jurisdiction | opt-out | synthetic | status |
|---|---|---|---|---|
| CC BY-SA 4.0 | Global | no | no | clean |
| unknown | US | partial | no | review |
| OpenAI ToS | US | n/a | yes | flagged |
| NLM ToS | US | no | no | clean |
| derived | EU | unknown | no | flagged |
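A provenance table like this lends itself to simple programmatic gating. A hypothetical sketch (field names mirror the table; the records and the review rule are illustrative assumptions):

```python
records = [
    {"license": "CC BY-SA 4.0", "jurisdiction": "Global", "opt_out": "no",
     "synthetic": "no", "status": "clean"},
    {"license": "OpenAI ToS", "jurisdiction": "US", "opt_out": "n/a",
     "synthetic": "yes", "status": "flagged"},
    {"license": "unknown", "jurisdiction": "US", "opt_out": "partial",
     "synthetic": "no", "status": "review"},
]

def needs_review(rec):
    """Flag rows that are not cleanly licensed for training."""
    return rec["status"] != "clean" or rec["license"] == "unknown"

flagged = [r for r in records if needs_review(r)]
assert len(flagged) == 2
```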
human readability
Model internals are not inherently unreadable. Every activation, weight, and layer state is translated into language, with examples showing exactly when each feature fires.
| weight | raw | label |
|---|---|---|
| L14 · MLP W_out [2048,11] | 0.847 | capital city associations |
| L8 · attn head 3 · V | -0.312 | geographic suppression |
| L12 · MLP W_in [512,2048] | 0.601 | factual recall trigger |
| L6 · attn head 7 · Q | 0.229 | question parsing |
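Whether a labeled feature "fires" on a given input can be sketched as a projection onto the labeled weight direction (an assumption about how such labels might be checked, not the product's method):

```python
import numpy as np

def feature_activation(hidden, direction):
    """Cosine similarity of a hidden state with a labeled weight direction."""
    return float(hidden @ direction /
                 (np.linalg.norm(hidden) * np.linalg.norm(direction)))

rng = np.random.default_rng(1)
direction = rng.normal(size=2048)        # e.g. one column of an MLP W_in
firing    = 3.0 * direction + rng.normal(size=2048)   # carries the feature
neutral   = rng.normal(size=2048)                     # unrelated input
assert feature_activation(firing, direction) > 0.9
assert abs(feature_activation(neutral, direction)) < 0.2
```

A high score on inputs that mention capital cities, and a near-zero score elsewhere, is what would justify a label like "capital city associations".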
factual checks
Most models ship as black boxes. You have no way to know what they learned to suppress, amplify, or distort. Aquin surfaces it.
Trace which features consistently skew outputs along political, demographic, or cultural lines. See the weight, not just the symptom.
Find what the model refuses to say and why. Identify suppression circuits. See whether refusals are weight-level decisions or surface-level RLHF patches.
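One way to test whether a refusal lives in the weights rather than in a surface patch is direction ablation: project a candidate suppression direction out of the hidden state and see whether the behavior changes. A hedged numpy sketch of the projection step:

```python
import numpy as np

def ablate(hidden, direction):
    """Remove the component of a hidden state along a candidate
    suppression direction (orthogonal projection)."""
    d = direction / np.linalg.norm(direction)
    return hidden - (hidden @ d) * d

rng = np.random.default_rng(2)
suppress = rng.normal(size=64)
h = rng.normal(size=64) + 2.0 * suppress   # state carrying the signal
h_clean = ablate(h, suppress)
# The suppression component is gone, up to float error:
assert abs(h_clean @ suppress) < 1e-8
```

If outputs change only when the ablated direction is re-injected, that is evidence the refusal is a weight-level circuit rather than an RLHF patch.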
evals
Three suites built in. Run them on any checkpoint, edit, or quantization pass. Every run logged, every delta tracked.
EditBench: Does the edit change only what you intended?
FineTuneDiff: What actually changed between base and fine-tuned, at the weight level?
InterpScore: How cleanly do features map to human-readable concepts?
| run | EditBench | FineTuneDiff | InterpScore | delta |
|---|---|---|---|---|
| llama-3.2-1b · base | 71 | 64 | 59 | baseline |
| llama-3.2-1b · sft-v1 | 78 | 79 | 63 | +9 avg |
| llama-3.2-1b · sft-v2 | 82 | 83 | 70 | +5 avg |
| llama-3.2-1b · int4-quant | 74 | 71 | 61 | −9 avg |
| llama-3.2-1b · rome-edit-1 | 94 | 88 | 73 | +14 avg |
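The delta column is a mean score change across the three suites between consecutive runs. A sketch of that bookkeeping (the rounding convention is an assumption), checked against the sft-v1 row:

```python
def avg_delta(prev, curr):
    """Mean score change across suites, rounded to the nearest point."""
    return round(sum(c - p for p, c in zip(prev, curr)) / len(prev))

base   = [71, 64, 59]   # EditBench, FineTuneDiff, InterpScore
sft_v1 = [78, 79, 63]
assert avg_delta(base, sft_v1) == 9   # matches the "+9 avg" row
```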
agentic system
An autonomous interpretability copilot. Tell it what you want to understand. It runs the full pipeline, chains tools, and explains what the UI is showing — in real time.
aipedia
A living, community-indexed knowledge base of model features. Every behavior, every circuit, every weight pattern — searchable and citable.
| feature | model | layer | circuit | confidence |
|---|---|---|---|---|
| capital city recall | Llama 3.2 1B | L14 | MLP W_out [2048,11] | 94% |
| hedging language | Llama 3.2 1B | L8 | attn head 3 · V | 87% |
| geographic association | Mistral 7B | L11 | MLP W_in [512,2048] | 81% |
| refusal circuit | Gemma 2B | L9 | attn head 7 · Q | 76% |
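An index like this can be queried programmatically. A hypothetical shape for such lookups (the records mirror the table; the query interface is an illustration, not a documented API):

```python
features = [
    {"feature": "capital city recall", "model": "Llama 3.2 1B",
     "layer": "L14", "circuit": "MLP W_out [2048,11]", "confidence": 0.94},
    {"feature": "refusal circuit", "model": "Gemma 2B",
     "layer": "L9", "circuit": "attn head 7 · Q", "confidence": 0.76},
]

def search(query, min_conf=0.0):
    """Substring match over feature names, filtered by confidence."""
    return [f for f in features
            if query in f["feature"] and f["confidence"] >= min_conf]

assert search("recall", min_conf=0.9)[0]["model"] == "Llama 3.2 1B"
assert search("refusal", min_conf=0.9) == []   # below the threshold
```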
weight editing
Locate the exact MLP layer encoding a fact. Overwrite it with a rank-one update. No retraining. We're building the editor — this is the live experiment.
L12 carries 90.4% of the causal recovery signal. Red rings = above the 40% threshold.
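The rank-one update itself is compact: W' = W + u vᵀ, where v selects the key (the subject's representation) and u moves the output toward the new value. A minimal numpy sketch of the mechanism (not the full ROME algorithm, which also solves for u under a least-squares preservation constraint):

```python
import numpy as np

def rank_one_edit(W, key, target):
    """Return W + u v^T such that the edited matrix maps key -> target."""
    u = target - W @ key        # how far the current output misses
    v = key / (key @ key)       # scaled key, so the edit lands exactly
    return W + np.outer(u, v)

rng = np.random.default_rng(3)
W = rng.normal(size=(8, 16))
key = rng.normal(size=16)                 # representation of the subject
target = rng.normal(size=8)               # desired output for the fact
W_edit = rank_one_edit(W, key, target)
assert np.allclose(W_edit @ key, target)  # the fact is rewritten

# Rank-one locality: inputs orthogonal to the key are untouched.
other = rng.normal(size=16)
other -= (other @ key) / (key @ key) * key
assert np.allclose(W_edit @ other, W @ other)
```

The locality check at the end is what the EditBench suite above measures at scale: the edit should change the targeted mapping and as little else as possible.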
Not sure if Aquin is right for you?
