Evals: Embedding

Behavioral probes for embedding models. The primary eval is custom Q&A with cosine-similarity scoring instead of keyword overlap. Embedding-specific retrieval evals live under Inspection (embed-retrieval, embed-sae-faithfulness). Requires embedding mode.

Prerequisite: aquin load --model gte-small

1 command

aquin eval

agent tool: run_custom_eval

Custom eval for embedding models: encodes each prompt and its reference answer, then scores the pair by cosine similarity instead of keyword overlap. Use it for semantic-match tasks (paraphrase detection, retrieval-style Q&A).
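The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not Aquin's implementation: the function names `cosine_similarity` and `score_pairs` are hypothetical, the toy vectors stand in for real model embeddings, and treating a score equal to the threshold as a pass is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def score_pairs(prompt_vecs, reference_vecs, threshold=0.5):
    """Score each (prompt, reference) embedding pair against a threshold.

    Assumption: a pair passes when similarity >= threshold.
    """
    results = []
    for p, r in zip(prompt_vecs, reference_vecs):
        sim = cosine_similarity(p, r)
        results.append({"similarity": sim, "passed": sim >= threshold})
    return results


if __name__ == "__main__":
    # Toy 2-D vectors in place of real embeddings.
    prompts = [[1.0, 0.0], [1.0, 0.0]]
    references = [[1.0, 1.0], [0.0, 1.0]]
    for row in score_pairs(prompts, references):
        print(row)
```

Unlike keyword overlap, this score depends only on the direction of the two vectors, so a paraphrase with no shared words can still pass if the model embeds it close to the reference.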

Flag                  Description
--name*               Eval name.
--prompts*            JSON array of query strings.
--reference_answers*  JSON array of target strings.
--threshold           Cosine similarity pass threshold (default: 0.5).
example

Same command as LLM eval; scoring backend switches automatically based on loaded model type.
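Because --prompts and --reference_answers take JSON arrays, quoting them by hand in a shell is easy to get wrong. The sketch below assembles an invocation from the flags documented above, using only Python's standard library. The helper `build_eval_command` is hypothetical, the eval name and strings are made-up sample values, and the one-to-one pairing check between the two arrays is an assumption inferred from the description, not documented behavior.

```python
import json
import shlex


def build_eval_command(name, prompts, reference_answers, threshold=None):
    """Return an argv list for `aquin eval` using the documented flags."""
    # Assumption: each prompt is paired with exactly one reference answer.
    if len(prompts) != len(reference_answers):
        raise ValueError("prompts and reference_answers must pair up one-to-one")
    argv = [
        "aquin", "eval",
        "--name", name,
        "--prompts", json.dumps(prompts),
        "--reference_answers", json.dumps(reference_answers),
    ]
    # --threshold is left off unless given, so the documented default applies.
    if threshold is not None:
        argv += ["--threshold", str(threshold)]
    return argv


if __name__ == "__main__":
    argv = build_eval_command(
        name="paraphrase-check",  # sample value
        prompts=["How do I reset my password?"],
        reference_answers=["Steps to change a forgotten password"],
        threshold=0.6,
    )
    # shlex.join produces a shell-safe string for copy and paste.
    print(shlex.join(argv))
```

Passing the list to `subprocess.run(argv)` avoids shell quoting entirely; the joined string is only for pasting into a terminal.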