
The Sandbox
Aquin Labs · April 2026
Training from the dashboard, without writing a line of code
The standard fine-tuning workflow requires at minimum a training script, a dataset file, a hyperparameter configuration, and a machine with a GPU. Those are reasonable requirements when the training objective is well-understood and the data is already clean. They are significant barriers when the goal is exploratory: testing whether a small dataset can push a model toward a specific behavior, or seeing what happens to calibration when the rank is doubled, or asking whether the agent can generate and validate training data for a niche topic before you invest time writing it yourself.
The sandbox training system removes those barriers. It runs LoRA fine-tuning on Llama 3.2 1B Instruct directly from the Aquin dashboard: you edit the dataset in the browser, configure every hyperparameter through a panel with inline explanations, and start the run with a single button. The training executes on Aquin's GPU infrastructure. The full live monitoring pipeline kicks in automatically, so every step event, every signal, and the full post-training model diff and SAE feature diff arrive in the dashboard exactly as they would from an SDK-connected run.
The agent can participate in every part of that workflow. It reads the current dataset and configuration, generates new training examples directly, tunes hyperparameters based on what it knows about the task, and explains the reasoning behind its suggestions. The loop from question to configured run can happen entirely in the chat panel.
sandbox pipeline · from dataset to diff
the full monitoring stack runs automatically on every sandbox run. no additional setup.
Building and editing the training dataset
The sandbox uses instruction-response pairs as its training format, which maps directly to the template the training worker applies to each row before tokenization. Each pair becomes a structured string with an instruction block and a response block, passed to the tokenizer at the configured maximum sequence length. The format is fixed; the content is entirely yours.
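The exact template is internal to the training worker, but the idea can be sketched in a few lines. The block markers below are illustrative placeholders, not the sandbox's actual format:

```python
def format_pair(instruction: str, response: str) -> str:
    """Render one dataset row as a single training string.

    The "### Instruction:" / "### Response:" markers are illustrative;
    the sandbox applies its own fixed template server-side before
    tokenization at the configured maximum sequence length.
    """
    return (
        "### Instruction:\n"
        f"{instruction.strip()}\n\n"
        "### Response:\n"
        f"{response.strip()}"
    )

row = {"instruction": "Name the capital of France.", "response": "Paris."}
text = format_pair(row["instruction"], row["response"])
```

Whatever the concrete markers are, the shape is the same: one structured string per row, instruction block first, response block second.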
The dataset panel gives you six ways to populate and modify rows. You can edit directly in the inline table, upload a JSONL file, paste multiple JSONL lines at once and have the parser fill the rows automatically, or ask the agent to generate examples for a given topic. The inline editor lets you change any cell in any row at any time, and deleted rows take effect immediately without a separate save step.
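The paste path is worth a sketch: pasted text is split into lines, each line parsed as a JSON object, and the resulting rows fill the table. The key names below match the pair format described above; the validation details are an assumption, not the panel's actual parser:

```python
import json

def parse_jsonl(pasted: str) -> list[dict]:
    """Parse pasted JSONL text into dataset rows, skipping blank lines.

    Assumes each non-blank line is an object with "instruction" and
    "response" keys. Malformed lines raise with a line number so the
    user can fix them before the rows are filled in.
    """
    rows = []
    for n, line in enumerate(pasted.splitlines(), start=1):
        if not line.strip():
            continue
        obj = json.loads(line)  # raises ValueError on malformed JSON
        if not {"instruction", "response"} <= obj.keys():
            raise ValueError(f"line {n}: missing instruction/response")
        rows.append(obj)
    return rows

pasted = (
    '{"instruction": "Define LoRA.", "response": "Low-rank adaptation."}\n'
    '\n'
    '{"instruction": "Expand SAE.", "response": "Sparse autoencoder."}'
)
rows = parse_jsonl(pasted)
```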
Version tracking is built into the confirmation flow. When you confirm a dataset, it receives a version number. That number is stamped against the run when training starts, so the monitoring panel can always tell you exactly which version of the dataset produced which run. If you edit the dataset after a completed run, the badge changes to flag the mismatch between the confirmed version and the version used in the last run, so you know a new run is needed before the dataset change takes effect in the model.
dataset ingestion paths
version tracking · confirm stamps a version, badge reflects mismatch
version stamps on confirm. badge shows whether the current dataset matches what the last run used.
Full hyperparameter control
Every configurable parameter in the LoRA training loop is exposed in the algo panel. The panel groups parameters into three sections: LoRA core settings that control the adapter architecture, optimization settings that control how the adapter is trained, and batching settings that control how data is fed to the optimizer. Each parameter has a valid range enforced at the input, a current default, and a tooltip that explains what it does and when to change it.
The target modules selector is worth calling out specifically. It controls which attention projections receive LoRA adapters. The default is Q and V, which is the standard configuration for most instruction-tuning tasks. Adding K and O lets the adapter reshape attention patterns more completely. Adding the MLP projections (gate, up, down) extends adaptation into the feed-forward layers, which is useful when the task requires changes to factual associations rather than just generation style. The panel renders each projection as a toggle, so the selection is visual and does not require knowing the projection key names.
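In PEFT-style terms, the three toggle states map to target-module lists like the following. The module names follow the Llama naming convention; the mapping is a sketch of the idea, not the sandbox's internal representation:

```python
# Llama-style projection names for each selector state (illustrative).
QV_DEFAULT = ["q_proj", "v_proj"]          # standard instruction tuning
FULL_ATTENTION = ["q_proj", "k_proj", "v_proj", "o_proj"]
WITH_MLP = FULL_ATTENTION + ["gate_proj", "up_proj", "down_proj"]

# With Hugging Face PEFT, the selection would land in a config like:
#   from peft import LoraConfig
#   config = LoraConfig(r=8, lora_alpha=16, target_modules=WITH_MLP)
```

The toggles spare you from memorizing these key names, but this is what they resolve to under the hood of any LoRA implementation.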
Gradient accumulation deserves its own note. The training worker processes each dataset row as its own forward pass, so the real batch size is always one. Gradient accumulation simulates a larger batch by accumulating gradients across N steps before updating the optimizer. Setting accum steps to 4 on a 40-row dataset produces the same optimizer updates as a batch size of 4 would. The panel displays the effective batch size derived from the accum steps value, so the relationship is visible without calculation.
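A bare-bones sketch makes the equivalence concrete. The optimizer and gradients here are simplified to plain SGD on scalars; the point is only the accumulate-then-step structure:

```python
def sgd_updates(grads, accum_steps, lr=0.1):
    """Apply SGD with gradient accumulation; return the parameter
    value after each optimizer update.

    Averaging the buffered gradients over `accum_steps` micro-batches
    reproduces the update a true batch of that size would make.
    """
    param = 0.0
    buf = 0.0
    updates = []
    for i, g in enumerate(grads, start=1):
        buf += g                     # per-row backward pass
        if i % accum_steps == 0:
            param -= lr * (buf / accum_steps)  # one optimizer update
            buf = 0.0
            updates.append(param)
    return updates

# 40 per-row gradients with accum_steps=4 -> 10 optimizer updates,
# matching what a batch size of 4 would produce.
updates = sgd_updates([1.0] * 40, accum_steps=4)
```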
presets · fast / balanced / quality
Fast: Low rank, no regularization, one epoch. Right for rapid iteration where you want to confirm data format and pipeline before committing to a longer run.
Balanced: Default settings. Good quality-to-speed tradeoff for most instruction-tuning tasks on datasets of 20 to 200 examples.
Quality: Higher rank, warmup, more target modules, longer sequences, gradient accumulation for a larger effective batch. Slower but more thorough adaptation. Use when data quality is high and overfitting is not a concern.
presets apply a coherent configuration across all parameters at once. individual fields remain editable after applying a preset.
full parameter reference · key · range · default · description
The agent as a training co-pilot
The training agent has read and write access to both the dataset and the hyperparameter configuration before the run starts. It can read the current rows, generate new ones and append them immediately, restructure the dataset, and patch any subset of hyperparameters in a single call. Those changes take effect in the UI the moment the tool call resolves, so you can watch the dataset panel update and the algo config panel reflect new values as the agent works.
The practical workflow this enables is conversational configuration. You can describe the task, the domain, and roughly how many examples you want, and the agent will generate a coherent dataset and suggest a configuration tuned for that task size and domain complexity. For a 30-row factual reinforcement task, it will likely suggest a low rank, low learning rate, and no dropout. For a 100-row instruction-following task with diverse phrasing, it might suggest adding K and O to the target modules, increasing gradient accumulation, and setting a warmup step count to stabilize early training.
The agent does not start training runs autonomously. It prepares the configuration and explains its reasoning, and the run starts when you press the button. That boundary is intentional: the preparation is something the agent can do with high confidence based on the task description, but the decision to commit GPU time is yours. After a run completes, the agent can read the resulting metrics from the training snapshot, evaluate whether the configuration produced healthy training dynamics, and suggest what to change for the next run. The loop from task description to configured run to post-run analysis to next-run configuration is entirely conversational.
agent tool surface · sandbox mode
algo_set_config accepts a partial patch. only the provided fields change. unspecified fields keep their current values.
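The partial-patch semantics amount to a shallow merge with validation. A minimal sketch, assuming the config is a flat key-value mapping (the tool's actual implementation is not shown here):

```python
def apply_config_patch(current: dict, patch: dict) -> dict:
    """Merge a partial hyperparameter patch into the current config.

    Only the fields present in `patch` change; everything else keeps
    its current value. Unknown keys are rejected rather than silently
    added, mirroring a panel where every parameter is predeclared.
    """
    unknown = patch.keys() - current.keys()
    if unknown:
        raise KeyError(f"unknown hyperparameters: {sorted(unknown)}")
    return {**current, **patch}

config = {"rank": 8, "learning_rate": 2e-4, "epochs": 1}
config = apply_config_patch(config, {"rank": 16})
# rank changes; learning_rate and epochs are untouched
```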
The Sandbox versus SDK mode
The training tab offers two modes: sandbox, which runs from the dashboard and uses Aquin's infrastructure, and SDK, which connects to your own training process via a Python SDK and an API key. The two modes are distinct but feed into the same monitoring system. A sandbox run and an SDK-connected run produce the same step events, the same model diff, the same SAE feature diff, and the same calibration panel.
The practical division is by use case. The Sandbox is for exploration: small datasets, quick iteration, tasks where you want to see a result without setting up infrastructure. SDK is for production: large datasets, custom training loops with framework-specific optimizations, distributed training, or runs that need to execute on hardware you control. The two modes coexist in the same tab; switching is a single selection at the start of the session.
The agent works in both modes, though with different tool availability. In sandbox mode it has access to the full dataset and configuration tool surface. In SDK mode it can still read training metrics and interact with the monitoring panels, but dataset and config tools are unavailable since those are controlled by your training script. The mode determines which tools are active; the agent adapts accordingly.
What happens after you start a run
When a sandbox run starts, the frontend opens a streaming connection to the training API. The training worker runs in a thread on Aquin's GPU infrastructure and emits step events as the run progresses. Those events flow through the same ingestion pipeline that SDK runs use: each step event triggers the signal engine, updates the loss and gradient charts, and feeds the signals panel. There is no polling and no page refresh. The dashboard updates on each event, typically within a few hundred milliseconds of the training step completing.
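On the consuming side, the push model reduces to dispatching each event as it arrives. The event shapes below are hypothetical, chosen to illustrate the structure rather than document the training API's wire format:

```python
import json

def handle_event_stream(lines, on_step, on_signal):
    """Dispatch streamed training events to dashboard handlers.

    `lines` stands in for the streaming connection; each element is
    one JSON event. There is no polling loop: every event is handled
    the moment it arrives, which is what keeps the charts and signals
    panel within a few hundred milliseconds of the training step.
    """
    for raw in lines:
        event = json.loads(raw)
        if event["type"] == "step":
            on_step(event["step"], event["loss"])  # loss/gradient charts
        elif event["type"] == "signal":
            on_signal(event["name"])               # signals panel

stream = [
    '{"type": "step", "step": 1, "loss": 2.5}',
    '{"type": "signal", "name": "grad_spike"}',
]
steps, signals = [], []
handle_event_stream(stream, lambda s, l: steps.append((s, l)), signals.append)
```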
The monitoring panel locks dataset and config editing while training is running. The version badge is frozen at the version used to start the run. The hyperparameter panel grays out all inputs. The intent is to preserve a clear record of what produced the run: the dataset version and configuration that were active when you pressed the button are exactly what the training worker received, and nothing can be changed mid-run to create ambiguity about that.
When the run completes, the model diff arrives as a streaming event and renders in the monitoring panel. The SAE feature diff follows, then the calibration panel. Those three together give you the behavioral comparison, the internal representation changes, and the confidence calibration shift that your dataset and configuration produced. The agent can read all three from the training snapshot and narrate what they mean for the next iteration, closing the loop from run to diagnosis to adjusted configuration.
