Inspection (SAE): LLMs

Sparse autoencoder tools for language models. Decomposes residual-stream activations into interpretable features, runs full attribution pipelines, steers features at inference time, and benchmarks feature quality. Use sae-stats for multi-layer batch exports and degradation heatmaps. After fine-tuning, use Checkpoint SAE (/docs/checkpoint-sae) for diff, temp train, and align on real checkpoints.

Prerequisite: aquin login · aquin load model gpt2-small · aquin load sae gpt2-small-l8

7 commands

aquin trace

agent tool: run_full_inspection

Full attribution pipeline in one pass: generates the model response, runs causal mediation analysis per token and layer, decomposes the top SAE features, renders the logit lens per layer, and builds the circuit graph. This is the primary LLM inspection entry point. Tracks the complete inspection card and uploads it to your CLI inbox when logged in.

Flag	Description
--prompt*	Input prompt (model completes from here).
--layer*	SAE layer with a downloaded checkpoint (aquin load sae <model>-l<n>). If omitted, prints the layers available on disk and the commands to pull others.
--check	Save trace-check.json and trace-check.png in the current directory.
--umap	Load SAE UMAP projection after the run and open the explorer panel.
--model	Override active model.
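The logit-lens step of the trace pipeline can be illustrated with a minimal pure-Python sketch. It assumes toy per-layer residual vectors and an unembedding matrix W_U (d_model × vocab), applies a plain layer norm without learned gain or bias (a simplification of the model's final layer norm), and reads off the top token at each layer. The function names are hypothetical and this is not how aquin implements the step.

```python
import math


def layer_norm(x, eps=1e-5):
    """Center and scale a vector to unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def logit_lens(residuals, W_U, vocab):
    """Return the top-scoring token for each layer's residual vector.

    residuals: one d_model-length vector per layer (toy data here).
    W_U: d_model x vocab_size unembedding matrix as a list of rows.
    """
    top_tokens = []
    for resid in residuals:
        h = layer_norm(resid)
        # logits[j] = sum_i h[i] * W_U[i][j]
        logits = [sum(h[i] * W_U[i][j] for i in range(len(h))) for j in range(len(vocab))]
        best = max(range(len(vocab)), key=lambda j: logits[j])
        top_tokens.append(vocab[best])
    return top_tokens


# Toy example: two layers, d_model = 2, a two-token vocabulary.
vocab = ["cat", "dog"]
W_U = [[1.0, 0.0], [0.0, 1.0]]
residuals = [[1.0, -1.0], [-1.0, 1.0]]
print(logit_lens(residuals, W_U, vocab))  # ['cat', 'dog']
```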

CLI-only shortcut. In aquin chat, ask: Run full inspection on "…"

aquin feature logit

agent tool: get_feature_logits

Projects one SAE feature's decoder direction through the unembedding matrix and returns the top tokens promoted and suppressed. Answers: if I amplify this feature, which tokens become more likely?

Flag	Description
--feature*	SAE feature index.
--topk	Number of tokens to show (default: 10).
--prompt	Optional prompt context for activation weighting.
--check	Save feature-logits-check.json and feature-logits-check.png in the current directory.
--umap	Load SAE UMAP projection after the run and open the explorer panel.
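The projection behind feature logit amounts to a matrix-vector product followed by a sort. A minimal sketch, assuming the decoder direction is a d_model-length list and W_U is a d_model × vocab matrix; feature_logits is a hypothetical helper, and the activation weighting from --prompt is not modeled.

```python
def feature_logits(decoder_dir, W_U, vocab, topk=10):
    """Project one decoder direction through W_U; return promoted and suppressed tokens.

    decoder_dir: d_model-length decoder row for a single SAE feature.
    W_U: d_model x vocab_size unembedding matrix as a list of rows.
    """
    d_model = len(decoder_dir)
    logits = [sum(decoder_dir[i] * W_U[i][j] for i in range(d_model)) for j in range(len(vocab))]
    # Indices sorted from most promoted (largest logit) to most suppressed.
    order = sorted(range(len(vocab)), key=lambda j: logits[j], reverse=True)
    promoted = [(vocab[j], logits[j]) for j in order[:topk]]
    # Most negative logit first.
    suppressed = [(vocab[j], logits[j]) for j in reversed(order[-topk:])]
    return promoted, suppressed


vocab = ["a", "b", "c"]
W_U = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]
promoted, suppressed = feature_logits([1.0, 1.0], W_U, vocab, topk=1)
print(promoted, suppressed)  # [('b', 2.0)] [('c', -1.0)]
```

A positive score means amplifying the feature raises that token's logit; a negative score means it lowers it.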

aquin feature neighbor

agent tool: get_feature_neighbors

Returns the nearest-neighbor SAE features by decoder cosine similarity. Use it to find redundant, related, or polysemous features near a target index.

Flag	Description
--feature*	SAE feature index.
--topk	Number of neighbors (default: 8).
--check	Save feature-neighbors-check.json and feature-neighbors-check.png in the current directory.
--umap	Load SAE UMAP projection after the run and open the explorer panel.
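Decoder cosine similarity can be sketched as follows, assuming W_dec holds one decoder row per SAE feature (n_features × d_model). feature_neighbors is a hypothetical helper that excludes the target itself and ranks the remaining features by cosine.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def feature_neighbors(W_dec, feature, topk=8):
    """Rank the other features by decoder cosine similarity to W_dec[feature]."""
    target = W_dec[feature]
    scores = [(j, cosine(target, row)) for j, row in enumerate(W_dec) if j != feature]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topk]


W_dec = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [-1.0, 0.0]]
for idx, sim in feature_neighbors(W_dec, feature=0, topk=2):
    print(idx, round(sim, 3))  # prints "1 0.707", then "2 0.0"
```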

aquin steer

agent tool: run_steer_and_show

Adds a scaled multiple of one SAE feature's decoder direction (or a saved LAT file) to the residual stream at the SAE layer during the forward pass. With --save, exports that decoder direction as a reusable LAT JSON file instead of running generation.

Flag	Description
--prompt	Input prompt. Required unless exporting with --save.
--feature_idx	SAE feature to steer (omit when using --vector).
--save	Export feature_idx as a LAT JSON file and exit.
--vector	Path to LAT JSON from steer --save.
--feature_label	Human-readable label (auto-resolved if omitted).
--probe_id	Experiment key from feature locate --persist (default: deception_feature).
--strength	Steering multiplier (default: 20.0).
--layer	Hook layer (default: model or vector metadata).
--max_new_tokens	Generation length.
--umap	Load SAE UMAP projection after steering (web explorer).
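The steering update is an addition in residual space. A minimal sketch, assuming the hook adds strength × decoder_dir to every token position at the SAE layer and uses the direction without renormalizing it (this page does not say whether aquin normalizes it). steer_residual is hypothetical.

```python
def steer_residual(resid_stream, decoder_dir, strength=20.0):
    """Add strength * decoder_dir to the residual vector at every token position.

    resid_stream: list of d_model-length vectors, one per token position.
    """
    for pos in resid_stream:
        if len(pos) != len(decoder_dir):
            raise ValueError("decoder direction and residual width differ")
    return [[r + strength * d for r, d in zip(pos, decoder_dir)] for pos in resid_stream]


stream = [[0.0, 1.0], [2.0, 0.0]]
print(steer_residual(stream, [1.0, -1.0], strength=2.0))  # [[2.0, -1.0], [4.0, -2.0]]
```

A negative --strength would push activations away from the feature instead of toward it.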

LAT workflow: feature locate --persist → steer --save → steer --vector. Exporting tracks a steerVector card and uploads it to your CLI inbox when logged in.

aquin multi-steer

agent tool: run_multi_steer

Steers multiple SAE features simultaneously in a single forward pass. Pass a JSON array of {feature_idx, strength, label} objects.

Flag	Description
--prompt*	Input prompt.
--features*	JSON array, e.g. '[{"feature_idx":42,"strength":1.5}]'
--max_new_tokens	Generation length.
--umap	Load SAE UMAP projection after steering (web explorer).
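The sketch below builds a --features value with json.dumps, quotes it for the shell with shlex.quote, and applies the combined update to a toy residual vector. It assumes label is optional (as in the flag example) and that per-feature updates simply sum; multi_steer, W_dec, and the label text are hypothetical.

```python
import json
import shlex


def multi_steer(resid, W_dec, features_json):
    """Add strength * decoder row for every feature listed in the JSON array."""
    features = json.loads(features_json)
    out = list(resid)
    for f in features:
        row = W_dec[f["feature_idx"]]
        out = [o + f["strength"] * x for o, x in zip(out, row)]
    return out


features_arg = json.dumps([
    {"feature_idx": 0, "strength": 1.5},
    {"feature_idx": 1, "strength": -2.0, "label": "hypothetical label"},
])
# Shell-safe quoting for the --features flag value.
print("aquin multi-steer --prompt 'Hello' --features " + shlex.quote(features_arg))

# Toy residual vector and decoder rows (d_model = 2).
print(multi_steer([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], features_arg))  # [1.5, -2.0]
```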

aquin benchmark

agent tool: run_benchmarks_on_top_feature

Runs InterpScore, Feature Purity, and MUI (Monosemanticity Under Intervention) on a single SAE feature. Quantifies how interpretable and causally coherent the feature is.

Flag	Description
--feature_idx*	SAE feature index.
--check	Save benchmark-check.json and benchmark-check.png in the current directory.
--umap	Load SAE UMAP projection after scoring (web explorer).

Distinct from behavioral evals (audit, red-team). This scores one feature's interpretability.

aquin sae-stats

agent tool: run_sae_stats

Batch export of multi-layer SAE statistics over a probe dataset. Computes per-layer mean L0 sparsity, top-k firing features, and a probe×layer heatmap for degradation profiles and stressor comparisons. Works for any layer with a pulled SAE checkpoint.

Flag	Description
--prompts*	JSON/JSONL probe file. Each row: text (or prompt) plus optional id, stressor, lang, quant_run_id.
--layers	Layer spec: all (default) or comma-separated, e.g. 9 or 0,9,15.
--topk	Top features per layer (default: 10).
--save	Write full schema_version=1 JSON export to this path.
--check	Save sae-stats-check.json and sae-stats-check.png in the current directory.
--umap	Load SAE UMAP projection after the export (web explorer).
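Per-layer mean L0 and top-k firing features can be sketched from SAE activations. This assumes each probe is summarized by one activation vector per layer (a real export likely works per token), counts a feature as firing when its activation is positive, ranks top features by firing frequency, and fills the probe×layer heatmap with per-probe L0. This page does not confirm any of these choices, and layer_stats is hypothetical.

```python
from collections import Counter


def layer_stats(acts, topk=10):
    """Compute mean L0, top-k firing features, and a probe x layer L0 heatmap.

    acts: {layer: {probe_id: [SAE feature activations for that probe]}}.
    """
    layer_profile, top_features, heatmap = {}, {}, {}
    for layer, per_probe in acts.items():
        fires = Counter()
        l0_values = []
        for probe_id, vec in per_probe.items():
            active = [i for i, a in enumerate(vec) if a > 0]  # indices of firing features
            l0_values.append(len(active))
            fires.update(active)
            heatmap.setdefault(probe_id, {})[layer] = len(active)
        layer_profile[layer] = sum(l0_values) / len(l0_values)
        top_features[layer] = [i for i, _ in fires.most_common(topk)]
    return layer_profile, top_features, heatmap


acts = {9: {"h1": [0.0, 1.2, 0.3], "h2": [0.0, 0.5, 0.0]}}
profile, top, heat = layer_stats(acts, topk=1)
print(profile, top, heat)  # {9: 1.5} {9: [1]} {'h1': {9: 2}, 'h2': {9: 1}}
```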

Probe schema: [{"id":"h1","text":"...","stressor":"baseline","lang":"en"}].

Export (schema_version=1 JSON): layer_stats (per-layer top_features), layer_profile (mean_l0/sparsity), heatmap (probe×layer).

Tracks SaeStatsCard and uploads it to your CLI inbox when logged in. Prefer --umap on feature commands instead of a standalone umap verb.
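A probe file matching the probe schema can be written and read back with the standard library. The sketch writes JSONL to a temporary directory, requires text or prompt on each row, and copies prompt into text on load. The prompt strings and helper names are hypothetical placeholders.

```python
import json
import os
import tempfile


def write_probes(path, probes):
    """Write probe rows as JSONL, one object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in probes:
            if "text" not in row and "prompt" not in row:
                raise ValueError(f"probe {row.get('id')!r} needs text or prompt")
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


def read_probes(path):
    """Read JSONL probes back, copying prompt into text when text is absent."""
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                row = json.loads(line)
                row.setdefault("text", row.get("prompt"))
                rows.append(row)
    return rows


probes = [
    {"id": "h1", "text": "The capital of France is", "stressor": "baseline", "lang": "en"},
    {"id": "h2", "prompt": "Die Hauptstadt von Frankreich ist", "stressor": "baseline", "lang": "de"},
]
path = os.path.join(tempfile.mkdtemp(), "probes.jsonl")
write_probes(path, probes)
loaded = read_probes(path)
print(len(loaded), loaded[1]["text"])
```

The resulting file could then be passed as, for example, aquin sae-stats --prompts probes.jsonl --layers 0,9,15 --save stats.json.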