SAE training
Collect activations from labeled probe sets (LLM resid_post or embedding hidden states) and train temporary sparse autoencoders (SAEs) on checkpoint weights. Both commands work in LLM and embedding mode once a model has been loaded with aquin load. Use capture-activations to persist activation vectors together with their metadata; it also syncs an activationCapture card to the web orchestrator.
2 commands
aquin capture-activations
agent tool: run_capture_activations
Run a batch of probes through the loaded session model (from aquin load), capture activations per layer, and write a manifest plus tensor shards. The optional --checkpoint flag patches the LLM weights before capture.
| Flag | Description |
|---|---|
| --output* | Output directory (manifest.json + layers/layer_<N>.pt). |
| --prompts | Optional JSON/JSONL probes. Omit to auto-generate with --count and --topic. |
| --count | Number of probes when auto-generating (default: 6, max: 64). |
| --topic | Theme for generated probes (default: general knowledge…). |
| --layers | Comma-separated layer indices or all (default: all). |
| --checkpoint | Fine-tuned .pt checkpoint (omit for base model). |
| --position | Pool tokens: last (default) or mean. |
| --encode-sae | Also write SAE feature vectors for --sae-layer (requires pulled public SAE). |
| --sae-layer | Layer for --encode-sae (default: model SAE layer). |
| --name | Capture label in manifest (default: checkpoint stem or base). |
| --output-json | Print result JSON to stdout. |
In LLM mode, activations come from TransformerLens resid_post; in embedding mode, they come from hidden_states per layer. The web card is activationCapture (probe count, layers, metadata chips). With --encode-sae, and after aquin pull sae, the capture also writes sae/sae_layer_N.pt.
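The output layout described above can be checked after a run with a short script. This is a minimal sketch that assumes only the file names stated in this section (manifest.json, layers/layer_<N>.pt, and the optional sae/sae_layer_N.pt); the helper names `expected_capture_files` and `missing_capture_files` are hypothetical and not part of aquin.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def expected_capture_files(
    output_dir: str,
    layers: Iterable[int],
    sae_layer: Optional[int] = None,
) -> List[Path]:
    """List the files a capture is expected to write, per the layout above.

    Hypothetical helper: it only mirrors the documented file names and
    does not read or validate the manifest contents.
    """
    root = Path(output_dir)
    files = [root / "manifest.json"]
    # One tensor shard per captured layer.
    files += [root / "layers" / f"layer_{n}.pt" for n in layers]
    # Only present when the capture was run with --encode-sae.
    if sae_layer is not None:
        files.append(root / "sae" / f"sae_layer_{sae_layer}.pt")
    return files


def missing_capture_files(
    output_dir: str,
    layers: Iterable[int],
    sae_layer: Optional[int] = None,
) -> List[Path]:
    """Return the expected files that do not exist on disk."""
    return [
        p
        for p in expected_capture_files(output_dir, layers, sae_layer)
        if not p.exists()
    ]
```

Calling `missing_capture_files("captures/run1", [0, 5])` after a capture of layers 0 and 5 should return an empty list if the run wrote everything it was asked to.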
aquin sae train
agent tool: run_sae_train
Train a temporary SAE on activations streamed from a corpus or from checkpoint weights. The command first runs the internal collect_activations step (chunked .pt files), then trains the decoder. Use --quick for smoke tests. Output: ~/.aquin/sae/user/<model>/<name>/sae_layer<N>.pt.
| Flag | Description |
|---|---|
| --model* | Catalog model slug. |
| --layer* | Hook layer index. |
| --checkpoint | Fine-tuned checkpoint .pt (omit for base-model SAE). |
| --quick | Shorter training run (~100k tokens). |
| --corpus | JSON/JSONL text corpus (default: streamed OpenWebText). |
| --name | Tag for output directory under ~/.aquin/sae/user/. |
| --output | Explicit output .pt path. |
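To locate a trained SAE from a script, the default output path given above can be reconstructed from the model slug, the --name tag, and the layer index. The following is a sketch under that assumption; `default_sae_path` is a hypothetical helper, and the slug and tag in the usage note are placeholders.

```python
from pathlib import Path


def default_sae_path(model: str, name: str, layer: int) -> Path:
    """Build ~/.aquin/sae/user/<model>/<name>/sae_layer<N>.pt.

    Hypothetical helper that mirrors the documented default location.
    It does not apply when --output is used to set an explicit path.
    """
    return (
        Path.home()
        / ".aquin"
        / "sae"
        / "user"
        / model
        / name
        / f"sae_layer{layer}.pt"
    )
```

For example, `default_sae_path("my-model", "finetuned", 12).exists()` can be used to check whether a run with those placeholder values has finished writing its output.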
Typical flow: capture-activations on labeled probes → sae train on full corpus → sae align vs public SAE. See Checkpoint SAE (/docs/checkpoint-sae) for diff and align.
Probe format
JSONL rows can carry metadata, such as honest/deceptive, language, or cohort labels, which is preserved in manifest.json.
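A probe file of this kind can be generated with a few lines of Python. This is an illustrative sketch only: the key names `prompt`, `label`, `language`, and `cohort` are assumptions chosen to match the labels mentioned above, not a confirmed schema, so check them against your aquin version before use.

```python
import json
from typing import Dict, Iterable

# Hypothetical probe rows; the key names are assumptions, not a confirmed schema.
PROBES = [
    {"prompt": "What is the capital of France?", "label": "honest", "language": "en", "cohort": "A"},
    {"prompt": "Insist that the capital of France is Rome.", "label": "deceptive", "language": "en", "cohort": "A"},
    {"prompt": "Quelle est la capitale de la France ?", "label": "honest", "language": "fr", "cohort": "B"},
]


def probes_to_jsonl(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize probe rows as JSONL: one JSON object per line."""
    # ensure_ascii=False keeps non-English prompts readable in the file.
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)


if __name__ == "__main__":
    # Write the file that would be passed to --prompts.
    with open("probes.jsonl", "w", encoding="utf-8") as f:
        f.write(probes_to_jsonl(PROBES))
```

The resulting probes.jsonl would then be passed to capture-activations through --prompts.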
Typical workflow
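The flow can be sketched as a short script that assembles the commands using only the flags documented on this page. It assumes a model has already been loaded with aquin load; the paths, model slug, tag, and layer index are hypothetical placeholders, and the script prints the commands rather than running them. The sae align step is left out because its flags are covered in Checkpoint SAE, not here.

```python
import shlex
from typing import List

# Hypothetical values; substitute your own paths, slug, tag, and layer.
PROBES = "probes.jsonl"
CAPTURE_DIR = "captures/run1"
CHECKPOINT = "checkpoints/finetuned.pt"
MODEL_SLUG = "my-model"
SAE_NAME = "finetuned"
LAYER = 12


def workflow_commands() -> List[List[str]]:
    """Return the capture, smoke-test, and full-training commands in order."""
    capture = [
        "aquin", "capture-activations",
        "--output", CAPTURE_DIR,
        "--prompts", PROBES,
        "--checkpoint", CHECKPOINT,
        "--layers", "all",
        "--position", "last",
    ]
    train = [
        "aquin", "sae", "train",
        "--model", MODEL_SLUG,
        "--layer", str(LAYER),
        "--checkpoint", CHECKPOINT,
        "--name", SAE_NAME,
    ]
    # Smoke test first with --quick, then the full run without it.
    smoke = train + ["--quick"]
    return [capture, smoke, train]


# Print shell-ready command lines; run them manually or via subprocess.
for argv in workflow_commands():
    print(shlex.join(argv))
```

Printing instead of executing keeps the sketch safe to run anywhere; swapping the print for `subprocess.run(argv, check=True)` would execute each step in order.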
For post-training diff and decoder alignment, see Checkpoint SAE. For external training metrics only, see Training watch.
