AquinLabs

Inspection (SAE): LLMs

Sparse autoencoder (SAE) tools for language models. These commands decompose residual-stream activations into interpretable features, run full attribution pipelines, steer features at inference time, and benchmark feature quality. They require LLM mode plus a pulled SAE that matches your model and layer.

Prerequisite:
aquin load --model gpt2-small && aquin pull sae gpt2-small-l8
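To illustrate what "decomposing residual-stream activations into features" involves, here is a minimal sketch of the standard ReLU SAE formulation. Feature activations are f = ReLU(W_enc (x - b_dec) + b_enc), and the reconstruction is x_hat = b_dec + sum_i f_i * d_i, where d_i is feature i's decoder direction. This formulation is an assumption: the pulled SAE may use a different architecture (for example TopK or JumpReLU). The names `sae_encode` and `sae_decode` are hypothetical and are not part of Aquin.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def sae_encode(x, W_enc, b_enc, b_dec):
    """Feature activations f = ReLU(W_enc @ (x - b_dec) + b_enc).

    W_enc has one row per feature (n_features x d_model).
    Assumes a plain ReLU SAE (hypothetical helper, not Aquin's API).
    """
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    pre = [sum(w * c for w, c in zip(row, centered)) + b
           for row, b in zip(W_enc, b_enc)]
    return relu(pre)


def sae_decode(f, W_dec, b_dec):
    """Reconstruction x_hat = b_dec + sum_i f_i * d_i.

    W_dec has one row per feature; row i is feature i's decoder direction d_i.
    """
    x_hat = list(b_dec)
    for fi, d in zip(f, W_dec):
        for j, dj in enumerate(d):
            x_hat[j] += fi * dj
    return x_hat


# Toy example: 2-dimensional residual stream, 2 features.
f = sae_encode([0.8, 0.3], [[1.0, 0.0], [0.0, 1.0]], [0.0, -0.5], [0.0, 0.0])
x_hat = sae_decode(f, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Each nonzero entry of f is a feature that fires on this activation. The decoder rows d_i are the same directions that the feature-logits, feature-neighbors, and steering commands operate on.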

7 commands

aquin inspect

agent tool: run_full_inspection

Runs the full attribution pipeline in one pass: generates the model response, runs causal mediation analysis per token and layer, decomposes the top SAE features, renders the logit lens at each layer, and builds the circuit graph. This is the primary entry point for LLM inspection. The complete inspection card is synced to the web.

Flags:
--prompt (required): Input prompt (the model completes from here).
--model: Override the active model.
--layer: SAE layer (default: from the pulled SAE config).

aquin inspect is a CLI-only shortcut. In aquin chat, ask instead: Run full inspection on "…"
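The logit-lens step of the pipeline can be pictured with a minimal sketch. The residual stream after each layer is passed through a final LayerNorm and the unembedding matrix, and the highest-scoring token is read off. This sketch assumes a parameter-free LayerNorm and a toy unembedding matrix. Real models apply a learned gain and bias, and Aquin's own rendering may differ. `layer_norm` and `logit_lens` are hypothetical names.

```python
import math


def layer_norm(x, eps=1e-5):
    # Parameter-free LayerNorm; real models also apply a learned gain and bias.
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x) / len(x)
    return [(a - mean) / math.sqrt(var + eps) for a in x]


def logit_lens(residuals, W_U, vocab):
    """Decode the top token from each layer's residual vector.

    residuals: one d_model vector per layer.
    W_U: unembedding as d_model rows of vocab_size entries (hypothetical layout).
    """
    top_tokens = []
    for x in residuals:
        h = layer_norm(x)
        logits = [sum(h[j] * W_U[j][t] for j in range(len(h)))
                  for t in range(len(vocab))]
        best = max(range(len(vocab)), key=lambda t: logits[t])
        top_tokens.append(vocab[best])
    return top_tokens


# Toy example: two "layers" whose residuals point toward different tokens.
tokens = logit_lens([[2.0, 0.0], [0.0, 2.0]],
                    [[1.0, -1.0], [-1.0, 1.0]],
                    ["cat", "dog"])
```

Reading the top token layer by layer shows where in the network a prediction first becomes decodable.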

aquin feature-logits

agent tool: get_feature_logits

Projects one SAE feature's decoder direction through the unembedding matrix and returns the tokens it most promotes and most suppresses. It answers the question: if I amplify this feature, which tokens become more likely?

Flags:
--feature (required): SAE feature index.
--prompt: Optional prompt context for activation weighting.
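The projection above is a dot product between the decoder direction d_i and each column of the unembedding matrix W_U. The most positive scores are the promoted tokens and the most negative are the suppressed ones. Here is a minimal sketch. It assumes W_U is stored as d_model rows of vocab_size entries and ignores any final LayerNorm. `feature_logits` is a hypothetical helper, not Aquin's API.

```python
def feature_logits(d, W_U, vocab, k=3):
    """Score every token by d @ W_U and return the top-k promoted and suppressed.

    d: decoder direction of one SAE feature (length d_model).
    W_U: d_model rows, each with vocab_size entries.
    """
    scores = [sum(d[j] * W_U[j][t] for j in range(len(d)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t], reverse=True)
    promoted = [(vocab[t], scores[t]) for t in order[:k]]
    # Lowest k scores, most negative first.
    suppressed = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return promoted, suppressed


promoted, suppressed = feature_logits(
    [1.0, 0.0],
    [[2.0, 0.5, -1.0, 0.0], [0.0, 1.0, 1.0, 1.0]],
    ["Paris", "London", "the", "of"],
    k=2,
)
```

In this toy example, amplifying the feature raises the logit of "Paris" most and lowers "the" most.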

aquin feature-neighbors

agent tool: get_feature_neighbors

Returns the nearest-neighbor SAE features by decoder cosine similarity. Use it to find redundant, related, or polysemous features near a target feature index.

Flags:
--feature (required): SAE feature index.
--top_k: Number of neighbors (default: 5).
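Nearest neighbors by decoder cosine similarity can be sketched as follows. Compare the target feature's decoder row against every other row, then keep the top_k most similar. `feature_neighbors` is a hypothetical name, and the matrix is a toy stand-in for a real SAE decoder.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # guard against all-zero rows


def feature_neighbors(W_dec, feature, top_k=5):
    """Return (index, cosine) pairs for the top_k decoder rows closest to `feature`."""
    target = W_dec[feature]
    sims = [(i, cosine(target, d)) for i, d in enumerate(W_dec) if i != feature]
    sims.sort(key=lambda p: p[1], reverse=True)
    return sims[:top_k]


W_dec = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
neighbors = feature_neighbors(W_dec, 0, top_k=2)
```

A cosine close to 1 (feature 1 here) suggests a near-duplicate direction. A value near 0 means the features are unrelated in decoder space.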

aquin steer

agent tool: run_steer_and_show

Adds a multiple of one SAE feature's decoder direction to the residual stream at the SAE layer during the forward pass, then shows the baseline and steered outputs side by side. If --feature_label is omitted, the label is resolved automatically from causal labeling.

Flags:
--prompt (required): Input prompt.
--feature_idx (required): SAE feature to steer.
--feature_label: Human-readable label (auto-resolved if omitted).
--strength: Steering multiplier (default: 1.0).
--max_new_tokens: Generation length.
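The steering step is x' = x + strength * d_i. Here is a minimal sketch that applies it to a single residual vector and compares baseline and steered next-token probabilities. As a simplification, the sketch reads the distribution straight off a toy unembedding instead of running the remaining layers. A real implementation would add the vector with a forward hook at the SAE layer, presumably at every token position. `steer` and `next_token_probs` are hypothetical names.

```python
import math


def steer(residual, decoder_dir, strength=1.0):
    """x' = x + strength * d_i for one residual-stream vector at the SAE layer."""
    return [x + strength * d for x, d in zip(residual, decoder_dir)]


def next_token_probs(residual, W_U):
    """Softmax over residual @ W_U (toy shortcut: skips the remaining layers)."""
    logits = [sum(residual[j] * W_U[j][t] for j in range(len(residual)))
              for t in range(len(W_U[0]))]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


W_U = [[1.0, -1.0], [0.0, 0.0]]
baseline = next_token_probs([0.0, 0.0], W_U)
steered = next_token_probs(steer([0.0, 0.0], [1.0, 0.0], strength=2.0), W_U)
```

Raising `strength` pushes the distribution further toward the tokens the feature promotes. Negative values push it away from them.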

aquin multi-steer

agent tool: run_multi_steer

Steers multiple SAE features simultaneously in a single forward pass. Pass the features to --features as a JSON array of {feature_idx, strength, label} objects.

Flags:
--prompt (required): Input prompt.
--features (required): JSON array, e.g. '[{"feature_idx":42,"strength":1.5}]'
--max_new_tokens: Generation length.
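Steering several features at once plausibly sums their scaled decoder directions: x' = x + sum_k strength_k * d_k. The sketch below parses a --features-style JSON array and applies that sum to one residual vector. Two points are assumptions, not documented behavior: that the steered directions are summed, and that a missing "strength" defaults to 1.0 (mirroring aquin steer). `multi_steer` is a hypothetical helper.

```python
import json


def multi_steer(residual, W_dec, features_json):
    """Apply x' = x + sum_k strength_k * d_{feature_idx_k} for each JSON entry."""
    steered = list(residual)
    for spec in json.loads(features_json):
        d = W_dec[spec["feature_idx"]]
        strength = spec.get("strength", 1.0)  # assumed default, as in aquin steer
        # "label" is only human-readable metadata and does not affect the math.
        for j, dj in enumerate(d):
            steered[j] += strength * dj
    return steered


W_dec = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
spec = '[{"feature_idx": 0, "strength": 1.5}, {"feature_idx": 2, "strength": -0.5, "label": "demo"}]'
result = multi_steer([0.0, 0.0], W_dec, spec)
```

Because the contributions add, a negative strength on one feature can partly cancel another feature that shares its direction.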

aquin benchmark

agent tool: run_benchmarks_on_top_feature

Runs InterpScore, Feature Purity, and MUI (Monosemanticity Under Intervention) on a single SAE feature. Together these scores quantify how interpretable and causally coherent the feature is.

Flags:
--feature_idx (required): SAE feature index.

This is distinct from behavioral evals (audit, red-team): rather than evaluating model behavior, it scores the interpretability of one feature.

aquin umap

agent tool: ensure_umap_loaded

Loads the UMAP projection of all SAE decoder directions and opens the UMAP Explorer panel on the web. Requires a precomputed UMAP file for the pulled SAE.
