Aquin SDK

Train a model on Aquin, generate an API key, and call it from anywhere: Python, JavaScript, curl. A few lines of code. No ML infrastructure required.

You trained a model. Now what? Most platforms stop there — you get a model file and a bill. Aquin goes further. Every trained model gets a live inference endpoint and an API key system, so you can actually use what you built.

install in seconds

The SDK is available on both PyPI and npm. Install whichever fits your stack:

bash
pip install aquin

bash
npm install aquin

generate your api key

Once you have a completed training run, go to the API Keys tab in your Aquin dashboard. Select the model you want to expose, hit Generate, and copy the key immediately — it's shown exactly once and never stored in plaintext.

Keys look like this: aq-m_xxxxxxxxxxxxxxxx

You can generate multiple keys per model, revoke any of them at any time, and track usage per key — calls made and credits spent. Each key has its own independent TPM and RPM limits you can configure from the dashboard.

call your model — python

Three lines. Initialize the client, call complete, get a response:

python
from aquin import AquinClient

client = AquinClient("aq-m_your_key_here")
res = client.complete("What is LoRA fine-tuning?")

print(res.text)
print(f"tokens in: {res.tokens_in}, out: {res.tokens_out}")

Use system_prompt to give your model a persona or set of instructions that applies to every message:

python
from aquin import AquinClient

client = AquinClient("aq-m_your_key_here")
res = client.complete(
    "What countries is this approved in?",
    system_prompt="You are a knowledgeable medical information specialist. Always recommend consulting a healthcare provider for personal medical decisions."
)

print(res.text)

Need async? There's a native async method too:

python
import asyncio
from aquin import AquinClient

async def main():
    client = AquinClient("aq-m_your_key_here")
    res = await client.acomplete(
        "Summarize this contract:",
        system_prompt="You are a legal assistant. Be precise and concise."
    )
    print(res.text)

asyncio.run(main())

call your model — javascript / typescript

typescript
import { AquinClient } from "aquin";

const client = new AquinClient("aq-m_your_key_here");
const res = await client.complete("What is LoRA fine-tuning?");

console.log(res.text);
console.log(`tokens in: ${res.tokens_in}, out: ${res.tokens_out}`);

Pass a system_prompt in the options object:

typescript
import { AquinClient } from "aquin";

const client = new AquinClient("aq-m_your_key_here");
const res = await client.complete("What countries is this approved in?", {
  system_prompt: "You are a knowledgeable medical information specialist. Always recommend consulting a healthcare provider for personal medical decisions.",
});

console.log(res.text);

Works in Node.js, Next.js, any modern JS runtime. The SDK uses the native fetch API — no extra dependencies.

or just use curl

No SDK needed. Every model endpoint is a plain HTTPS POST:

bash
curl -X POST https://www.aquin.app/api/infer \
  -H "Authorization: Bearer aq-m_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is LoRA?", "max_tokens": 256}'

With a system prompt:

bash
curl -X POST https://www.aquin.app/api/infer \
  -H "Authorization: Bearer aq-m_your_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "What is LoRA?",
    "messages": [
      { "role": "system", "content": "You are an expert ML researcher. Be concise." },
      { "role": "user", "content": "What is LoRA?" }
    ],
    "max_tokens": 256
  }'

Response:

json
{
  "text": "LoRA (Low-Rank Adaptation) is a fine-tuning technique...",
  "tokens_in": 6,
  "tokens_out": 42
}

parameters

Every completion call accepts these parameters:

  • prompt — the input text. required.
  • system_prompt — an optional instruction prepended as the first message. sets the model's persona or behavior for the entire conversation.
  • max_tokens — maximum tokens to generate. default 512.
  • temperature — controls randomness 0.0–1.0. default 0.7.
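Under the hood, every call is a JSON POST with these fields, mirroring the curl examples above. Here is a minimal sketch of assembling that request body; the field names follow this page's documentation, but treat the exact schema as an assumption to verify against the live API:

```python
import json

def build_payload(prompt, system_prompt=None, max_tokens=512, temperature=0.7):
    """Assemble the JSON body for a completion request, using the
    parameter names and defaults documented above."""
    body = {"prompt": prompt, "max_tokens": max_tokens, "temperature": temperature}
    if system_prompt is not None:
        # Mirrors the curl example: the system prompt becomes the first message.
        body["messages"] = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ]
    return json.dumps(body)
```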

rate limits

Each API key has two independent rate limits, both configurable from the dashboard after key creation:

  • RPM — requests per minute. default 60, range 10–600. limits how many calls can be made in a rolling 60-second window.
  • TPM — tokens per minute. default 30K, range 5K–60K. limits total token throughput (input + output) in a rolling 60-second window.

When either limit is exceeded, the API returns a 429 with a Retry-After: 60 header and a retry_after_ms field in the body. Both limits reset on a rolling 60-second basis, not on the clock minute.

json
// 429 response body
{
  "error": "Rate limit exceeded",
  "retry_after_ms": 60000
}
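A simple client-side pattern is to honor retry_after_ms before retrying. This helper is a generic sketch: it retries any callable that raises an error carrying a retry_after_ms attribute, which is how the SDK's RateLimitError exposes the value:

```python
import time

def call_with_retry(fn, max_attempts=3):
    """Retry fn() when it raises an error carrying retry_after_ms,
    sleeping for the server-suggested interval between attempts."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            retry_ms = getattr(exc, "retry_after_ms", None)
            if retry_ms is None or attempt == max_attempts - 1:
                raise  # not a rate-limit error, or attempts exhausted
            time.sleep(retry_ms / 1000)
```

Then wrap any call, e.g. `call_with_retry(lambda: client.complete("Hello"))`.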

error handling

The SDK raises typed exceptions so you can handle specific failures without inspecting raw HTTP status codes:

python
from aquin import AquinClient
from aquin import InvalidKeyError, InsufficientCreditsError, RateLimitError, InferenceError

client = AquinClient("aq-m_your_key_here")

try:
    res = client.complete("Hello")
except InvalidKeyError:
    print("Key is invalid or has been revoked")
except InsufficientCreditsError:
    print("Top up your credits at aquin.app")
except RateLimitError as e:
    print(f"Rate limited — retry in {e.retry_after_ms}ms")
except InferenceError:
    print("Model inference failed — try again")

The full list of exceptions:

  • AquinError — base class for all errors
  • InvalidKeyError — key is wrong or revoked
  • InsufficientCreditsError — owner account is out of credits
  • RateLimitError — RPM or TPM limit hit; check e.retry_after_ms
  • ModelNotFoundError — model hasn't finished training
  • InferenceError — GPU server error

billing

API calls are charged to the model owner's credit balance based on inference time — the same credit system used for training. One credit equals one minute of compute. A typical API call takes 1–5 seconds, so costs are fractions of a credit per call.

You can monitor exactly how many calls each key has made and how many credits have been spent from the API Keys tab in your dashboard.

security

API keys are hashed with SHA-256 before storage. The raw key is never saved: not in the database, not in logs, not anywhere. If you lose it, generate a new one. You can revoke keys instantly from the dashboard.
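Hash-then-compare verification looks roughly like this. This is an illustrative sketch of the general technique, not Aquin's actual server code:

```python
import hashlib
import hmac

def hash_key(raw_key):
    # Only this digest is persisted; the raw key is discarded
    # after being shown to the user exactly once.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

def verify_key(presented_key, stored_digest):
    # Constant-time comparison avoids leaking digest prefixes via timing.
    return hmac.compare_digest(hash_key(presented_key), stored_digest)
```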

The inference VM is never directly exposed. Every request routes through Aquin's API layer where authentication, billing, and rate limiting are enforced before the model is ever touched.

what's next

Streaming responses, per-token billing, and a RAG API key for querying your training datasets directly are all coming. The SDK will stay in sync — version 1.x of the SDK will always talk to the v1 API.

pip install aquin. npm install aquin. build something.
