SAE training

Collect activations from labeled probe sets (resid_post for LLMs, hidden states for embedding models) and train temporary sparse autoencoders (SAEs). Training data can be activations from base or fine-tuned checkpoint weights, or saved activation shards. This works with catalog models and supported families (GPT-2, Pythia, Llama, Qwen, BERT-style encoders) after aquin load model. Catalog models can use aquin load sae to pull a public dictionary; family models typically need aquin sae train first. Use aquin activations capture for probe-scale exports and reuse them with sae train --activations, or train directly from a corpus (which writes _acts_layer<N>/ for later reuse). Runs are tracked locally and auto-uploaded to your CLI inbox when you are logged in.

Prerequisites: aquin login · aquin load model llama-3.2-1b

3 commands

aquin activations capture

agent tool: run_capture_activations

Run a batch of probes through the loaded model, capture activations per layer, and write a manifest plus per-layer tensor shards. The optional --checkpoint flag patches in fine-tuned weights (an LLM .pt, or an embedding-model .pt or HF directory).

Flag	Description
--dir*	Output directory (manifest.json + metadata.json + layers/layer_<N>.pt).
--prompts	Optional JSON/JSONL probes. Omit to auto-generate with --count and --topic.
--count	Number of probes when auto-generating (default: 6, max: 64).
--topic	Theme for generated probes (default: general knowledge…).
--layers	Comma-separated layer indices or all (default: all).
--checkpoint	Fine-tuned checkpoint (.pt state dict or HF directory; omit for base model).
--position	Token pooling: last token (default) or mean over tokens.
--granularity	prompt (default) or token. Token mode writes token_spans.jsonl.
--balance	Balance probes evenly across metadata groups.
--group	Metadata field used with --balance (optional).
--encode-sae	Also write SAE feature vectors for --sae-layer (requires a public SAE pulled with aquin load sae).
--sae-layer	Layer for --encode-sae (default: model SAE layer).
--name	Capture label in manifest (default: checkpoint stem or base).
--check	Save capture-activations-check.json and capture-activations-check.png in cwd.
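The --layers flag accepts either all or a comma-separated list, and --dir stores one shard per captured layer as layers/layer_<N>.pt. The sketch below shows that mapping. parse_layers and layer_shard_paths are hypothetical helpers, not aquin internals, and the de-duplication, sorting, and range check are assumptions about sensible behavior rather than documented semantics.

```python
from pathlib import Path


def parse_layers(spec: str, n_layers: int) -> list[int]:
    """Expand a --layers value ("all" or e.g. "0,4,8") into layer indices.

    Hypothetical helper; de-duplicating and sorting are assumptions.
    """
    if spec.strip().lower() == "all":
        return list(range(n_layers))
    layers = sorted({int(part) for part in spec.split(",") if part.strip()})
    for layer in layers:
        if not 0 <= layer < n_layers:
            raise ValueError(f"layer {layer} out of range for a {n_layers}-layer model")
    return layers


def layer_shard_paths(out_dir: str, layers: list[int]) -> list[Path]:
    # One tensor shard per captured layer: <dir>/layers/layer_<N>.pt
    return [Path(out_dir) / "layers" / f"layer_{n}.pt" for n in layers]


print(parse_layers("8, 0,4", n_layers=16))  # [0, 4, 8]
print([p.as_posix() for p in layer_shard_paths("caps", [0, 4])])
```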


For LLMs, activations are TransformerLens resid_post; for embedding models, hidden_states per layer. The run appears in your inbox as an activationCapture card (probe count, layers, metadata chips). With --encode-sae (after aquin load sae), the capture also writes sae/sae_layer_N.pt. Token granularity writes concatenated token tensors plus token_spans.jsonl.
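To make the two granularities concrete: in prompt mode each probe's per-token activations are reduced to one vector (--position last or mean), while in token mode every token row is kept, concatenated across probes, and a span records which rows belong to which probe. This is a minimal sketch on plain lists. The span field names (probe, start, end) are hypothetical and may not match aquin's actual token_spans.jsonl schema, and the mean ignores padding because each probe is assumed to be unpadded.

```python
import json


def pool(token_acts: list[list[float]], position: str = "last") -> list[float]:
    """Prompt mode: reduce one probe's per-token activations to a single vector."""
    if position == "last":
        return list(token_acts[-1])
    if position == "mean":
        n = len(token_acts)  # assumes no padding tokens in token_acts
        return [sum(col) / n for col in zip(*token_acts)]
    raise ValueError(f"unknown position: {position!r}")


def concat_with_spans(probes: list[list[list[float]]]):
    """Token mode: stack all token rows and record each probe's [start, end) slice."""
    rows, spans, start = [], [], 0
    for i, token_acts in enumerate(probes):
        rows.extend(token_acts)
        # Hypothetical span fields; check the real token_spans.jsonl before relying on them
        spans.append({"probe": i, "start": start, "end": start + len(token_acts)})
        start += len(token_acts)
    return rows, spans


probes = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
print(pool(probes[0], "last"))  # [3.0, 4.0]
print(pool(probes[0], "mean"))  # [2.0, 3.0]
rows, spans = concat_with_spans(probes)
print("\n".join(json.dumps(s) for s in spans))
```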

aquin sae train

agent tool: run_sae_train

Train a temporary SAE on activations streamed from a corpus (run through the base model or a fine-tuned checkpoint) or loaded from a saved activation directory. LLMs use TransformerLens resid_post; embedding models use mean-pooled hidden states. Use --quick for smoke tests. Output goes to ~/.aquin/sae/user/<model>/… or ~/.aquin/sae/user/embed-<model>/…
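For orientation, the sketch below shows the objective a sparse autoencoder is commonly trained on: a ReLU encoder producing sparse features f, a linear decoder producing a reconstruction, and a loss of reconstruction error plus an L1 penalty on f. This is the standard formulation from the interpretability literature, not a description of aquin's trainer; the architecture details, normalization, and the l1_coeff value here are assumptions, and a real run optimizes these weights over many activation batches.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sae_forward(x, w_enc, b_enc, w_dec, b_dec):
    """Encode x into sparse features f, then decode f back into x_hat.

    w_enc has n_features rows of length d_model; w_dec has d_model rows
    of length n_features (so its columns are the decoder directions).
    """
    centered = [a - b for a, b in zip(x, b_dec)]
    f = relu([a + b for a, b in zip(matvec(w_enc, centered), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(w_dec, f), b_dec)]
    return f, x_hat


def sae_loss(x, f, x_hat, l1_coeff=1e-3):
    # Mean squared reconstruction error plus an L1 penalty that pushes most features to zero
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + l1_coeff * sum(abs(a) for a in f)


# Toy sizes: d_model = 2, n_features = 3
x = [1.0, -1.0]
w_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
w_dec = [[1.0, 0.0, -0.5], [0.0, 1.0, 0.5]]
f, x_hat = sae_forward(x, w_enc, [0.0, 0.0, 0.0], w_dec, [0.0, 0.0])
print(f, x_hat)  # [1.0, 0.0, 0.0] [1.0, 0.0]
print(sae_loss(x, f, x_hat))
```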

Flag	Description
--model*	Loaded model slug or family Hugging Face id (must match aquin load model).
--layer*	Hook layer index.
--checkpoint	Fine-tuned checkpoint .pt (omit for base-model SAE).
--activations	Reuse saved activations: _acts_layer<N> from a prior train, chunk_*.pt dir, or activations capture output (manifest + layers/). Skips forward passes.
--balance	When training from a capture dir, import a balanced subset of probes first.
--group	Metadata field used with --balance (optional).
--quick	Shorter training run (~100k tokens).
--corpus	JSON/JSONL text corpus (default: streamed OpenWebText). Ignored with --activations.
--name	Tag for output directory under ~/.aquin/sae/user/.
--save	Explicit output .pt path.
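The --balance flag (here and on activations capture) evens out probes across a metadata group. The page does not say how aquin balances, so this sketch assumes one common strategy: downsample every group to the size of the smallest one, with a fixed seed so the subset is reproducible. The balance function and the "label" field in the usage line are hypothetical.

```python
import random
from collections import defaultdict


def balance(probes, group_field, seed=0):
    """Downsample every metadata group to the size of the smallest group.

    Assumed strategy for illustration; probes missing the field fall into a None group.
    """
    if not probes:
        return []
    groups = defaultdict(list)
    for probe in probes:
        groups[probe.get(group_field)].append(probe)
    n = min(len(members) for members in groups.values())
    rng = random.Random(seed)  # fixed seed keeps the chosen subset reproducible
    balanced = []
    for key in sorted(groups, key=str):  # stable group order, tolerant of None keys
        balanced.extend(rng.sample(groups[key], n))
    return balanced


probes = [
    {"text": "a", "label": "honest"},
    {"text": "b", "label": "honest"},
    {"text": "c", "label": "honest"},
    {"text": "d", "label": "deceptive"},
]
print(len(balance(probes, "label")))  # 2
```

Downsampling discards data; an alternative design is to oversample minority groups, which keeps every probe but repeats some of them.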


Typical flow: activations capture on labeled probes (optional) → sae train (with or without --activations) → aquin load sae --user <name> → aquin trace. Corpus collection writes _acts_layer<N>/chunk_*.pt plus manifest.json for reuse. For checkpoint diffs on fine-tuned models, see Checkpoint SAE (/docs/checkpoint-sae).
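If you inspect or post-process a saved _acts_layer<N>/ directory yourself, keep the chunks in numeric order: a plain lexical sort puts chunk_10.pt before chunk_2.pt. This sketch assumes chunk files are named chunk_<integer>.pt without zero padding, which this page does not confirm; ordered_chunks is a hypothetical helper and only lists paths (loading the .pt tensors would need PyTorch).

```python
import re
from pathlib import Path


def ordered_chunks(acts_dir):
    """Return chunk_*.pt files sorted by numeric index rather than lexically.

    Assumes names like chunk_0.pt, chunk_1.pt, ..., chunk_10.pt; other files
    (for example manifest.json) are ignored.
    """
    def index(path):
        match = re.fullmatch(r"chunk_(\d+)\.pt", path.name)
        return int(match.group(1)) if match else -1

    chunks = [p for p in Path(acts_dir).glob("chunk_*.pt") if index(p) >= 0]
    return sorted(chunks, key=index)


# Example: for path in ordered_chunks("_acts_layer8"): print(path.name)
```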

aquin sae align

agent tool: run_sae_align

Match decoder directions between two SAE checkpoints (typically a public SAE and a temporary, user-trained one) using the Hungarian algorithm. Prints the mean cosine similarity of matched pairs and the weakest and strongest pairs. The run is tracked as a saeAlign card and uploaded to your CLI inbox when you are logged in. Use --save to also write the alignment map as JSON.
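The sketch below shows what a one-to-one alignment by cosine similarity computes: a similarity matrix between the two sets of decoder directions, the matching that maximizes total similarity, and then the mean, weakest, and strongest matched pairs. To stay dependency-free it brute-forces all permutations, which finds the same optimal total as the Hungarian algorithm but is only feasible for a handful of features (SciPy's linear_sum_assignment is the usual tool at real scale). The align function is illustrative, not aquin's implementation, and it assumes non-zero vectors and no more directions in A than in B.

```python
import math
from itertools import permutations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def align(dirs_a, dirs_b):
    """Optimal one-to-one matching of decoder directions by cosine similarity.

    Brute force over permutations (same optimal total as the Hungarian
    algorithm); requires len(dirs_a) <= len(dirs_b) and non-zero vectors.
    """
    sims = [[cosine(a, b) for b in dirs_b] for a in dirs_a]
    best = max(
        permutations(range(len(dirs_b)), len(dirs_a)),
        key=lambda perm: sum(sims[i][j] for i, j in enumerate(perm)),
    )
    # (similarity, index in A, index in B), weakest first
    pairs = sorted((sims[i][j], i, j) for i, j in enumerate(best))
    mean_cos = sum(s for s, _, _ in pairs) / len(pairs)
    return mean_cos, pairs[0], pairs[-1]  # mean, weakest pair, strongest pair


mean_cos, weakest, strongest = align([[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [3.0, 4.0]])
print(round(mean_cos, 3), weakest, strongest)
```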

Flag	Description
--sae-a*	First SAE .pt (e.g. public ~/.aquin/sae/<model>/sae_layer8.pt).
--sae-b*	Second SAE .pt (e.g. temp ~/.aquin/sae/user/.../sae_layer8.pt).
--save	Write full pairs map JSON.
--max-features	Maximum number of features to align (default: all).


Probe format

JSONL rows can carry metadata (for example honest/deceptive, language, or cohort labels), which is preserved in metadata.json and manifest.json:

capture_probes.jsonl
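A probe file in JSONL form is one JSON object per line. The sketch below writes and reads such a file with the standard library. The field names prompt, label, and language are hypothetical placeholders, since the exact schema aquin expects is not shown here; check it before relying on these keys.

```python
import json
from collections import Counter


def write_probes(path, rows):
    # One JSON object per line (JSONL)
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")


def read_probes(path):
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical field names: "prompt" is the probe text; "label" and "language" are metadata
probes = [
    {"prompt": "The capital of France is", "label": "honest", "language": "en"},
    {"prompt": "I definitely did not eat the cake.", "label": "deceptive", "language": "en"},
]
# write_probes("capture_probes.jsonl", probes)
# print(Counter(row["label"] for row in read_probes("capture_probes.jsonl")))
```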

Typical workflow

capture → train → align

For post-training diffs and decoder alignment, see Checkpoint SAE. For external training metrics only, see Training watch.