Inspection: LLMs (non-SAE)

Tools that inspect LLM behavior through activations, attention, and weight health without decomposing the residual stream into SAE features. Requires LLM mode: load an LLM with aquin load model before running any command below.

Prerequisite: aquin login · aquin load model gpt2-small (or pythia-70m, llama-3.2-1b, etc.)

4 commands

aquin check attention

agent tool: run_attention_routing

Runs a forward pass on the prompt and extracts per-head attention weight matrices for every layer. Each head receives a routing score summarizing how concentrated its attention is. --prompt is required.

Flag	Description
--prompt*	Input text to analyze.
--topk	Number of top heads to highlight (default: 5).
--check	Save attention-check.json and attention-check.png in the current directory.
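
The exact routing score formula is not documented here. As an illustration only, the sketch below scores concentration as one minus the normalized entropy of each attention row, averaged per head, and then picks the top-k heads. The function names and the entropy-based score are assumptions, not the tool's implementation.

```python
import math

def concentration(attn_rows):
    """Mean (1 - normalized entropy) over one head's attention rows.

    attn_rows: list of rows, each a probability distribution over key positions.
    Returns a value in [0, 1]: 1.0 when every query attends to a single key,
    0.0 when attention is uniform. (Hypothetical score, for illustration.)
    """
    scores = []
    for row in attn_rows:
        n = len(row)
        if n < 2:
            continue  # a single-key row has no spread to measure
        entropy = -sum(p * math.log(p) for p in row if p > 0)
        scores.append(1.0 - entropy / math.log(n))
    return sum(scores) / len(scores) if scores else 0.0

def top_heads(heads, k=5):
    """heads: dict mapping (layer, head) -> attention rows. Returns the k most concentrated."""
    ranked = sorted(((concentration(rows), key) for key, rows in heads.items()), reverse=True)
    return [(key, score) for score, key in ranked[:k]]
```

Here k plays the role of --topk: a head whose rows are one-hot ranks above a head whose rows are uniform.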

example

Tracks an attention routing card and uploads it to your CLI inbox when you are logged in.

aquin check layer

agent tool: run_layer_analysis

Measures activation stability across paraphrased prompts and out-of-distribution (OOD) similarity at each layer. Stability scores how much hidden states drift when the same meaning is phrased differently. OOD similarity compares in-domain vs OOD prompt activations to flag layers that collapse on unfamiliar input.

Flag	Description
--prompts	JSON array of paraphrases for stability PCA (≥2; if you pass 1, default probes are added automatically).
--in_domain_prompts	In-distribution prompts for OOD comparison.
--ood_prompts	Out-of-distribution prompts.
--topk	Number of top layers to report.
--check	Save layer-analysis-check.json and layer-analysis-check.png in the current directory.
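
To make the two per-layer metrics concrete, here is a rough sketch that assumes each prompt has already been reduced to one hidden-state vector per layer. It uses mean pairwise cosine similarity for stability and centroid cosine similarity for the OOD comparison. These are stand-ins chosen for simplicity; the command itself mentions PCA, and its actual scoring may differ. All function names are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def stability(vectors):
    """Mean pairwise cosine similarity among paraphrase vectors at one layer."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two paraphrase vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def centroid(vectors):
    """Element-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def ood_similarity(in_domain, ood):
    """Cosine similarity between the in-domain and OOD activation centroids."""
    return cosine(centroid(in_domain), centroid(ood))
```

Under this sketch, a layer with low stability drifts when the wording changes, and a layer whose OOD similarity differs sharply from its neighbors is a candidate for closer inspection.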


aquin check perturbation

agent tool: run_perturbation_sensitivity

Zeroes out (or adds Gaussian noise to) hidden channels one at a time and measures KL divergence between the perturbed output distribution and the clean baseline. Channels with high KL impact are sensitivity hotspots, useful for finding brittle representations.

Flag	Description
--prompt*	Input prompt (required).
--n_channels	Number of channels to perturb (default: 32).
--method	dropout or gaussian (default: dropout).
--check	Save perturbation-check.json and perturbation-check.png in the current directory.
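
The dropout variant can be pictured with a toy model. The sketch below assumes a single hidden vector feeding a linear readout, zeroes one channel at a time, and reports KL(clean || perturbed) per channel. The readout, the KL direction, and the function names are assumptions made for illustration; a real run perturbs channels inside the loaded LLM.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for distributions where q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def readout(hidden, weights):
    """Toy linear readout: logits[j] = sum_i hidden[i] * weights[i][j]."""
    n_out = len(weights[0])
    return [sum(h * row[j] for h, row in zip(hidden, weights)) for j in range(n_out)]

def channel_sensitivity(hidden, weights):
    """KL impact of zeroing each hidden channel, one at a time."""
    clean = softmax(readout(hidden, weights))
    scores = []
    for i in range(len(hidden)):
        perturbed = list(hidden)
        perturbed[i] = 0.0  # dropout-style perturbation of channel i
        scores.append(kl_divergence(clean, softmax(readout(perturbed, weights))))
    return scores
```

Channels whose score is far above the rest are the hotspots described above. A gaussian variant would add noise to perturbed[i] instead of zeroing it.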


aquin check weight

agent tool: check_weights

Scans all weight tensors for trojan/backdoor signatures (kurtosis spikes, outlier density, singular-value ratio) and runs SVD rank analysis across Q/K/V/O/MLP matrices. Flags collapsed or suspicious layers before you trust a checkpoint.

Flag	Description
--collapse_threshold	SVD rank ratio below which a layer is flagged collapsed (default: 0.01).
--check	Save check-weights-check.json and check-weights-check.png in the current directory.
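
Two of the named signatures, kurtosis spikes and outlier density, are simple statistics over a flattened weight tensor. The sketch below shows one plausible way to compute them. The 4-sigma cutoff and the function names are assumptions, the scanner's real thresholds are not documented here, and the SVD rank analysis is left out.

```python
import math

def _mean_and_variance(values):
    """Population mean and variance of a flat list of weights."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (about 0 for Gaussian-like weights)."""
    mean, var = _mean_and_variance(values)
    if var == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in values) / len(values)
    return m4 / (var * var) - 3.0

def outlier_density(values, n_sigma=4.0):
    """Fraction of weights more than n_sigma standard deviations from the mean.

    n_sigma=4.0 is a hypothetical cutoff chosen for this sketch.
    """
    mean, var = _mean_and_variance(values)
    std = math.sqrt(var)
    if std == 0:
        return 0.0
    return sum(1 for v in values if abs(v - mean) > n_sigma * std) / len(values)
```

A tensor with a few extreme entries among otherwise small weights shows both a large excess kurtosis and a nonzero outlier density, which is the kind of pattern the scan description refers to.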

example

Renders trojan scan and rank health cards side-by-side on the web.