You trained a model. Now what? Most platforms stop there — you get a model file and a bill. Aquin goes further. Every trained model gets a live inference endpoint and an API key system, so you can actually use what you built.
install in seconds
The SDK is available on both PyPI and npm. Install whichever fits your stack:
pip install aquin
npm install aquin

generate your api key
Once you have a completed training run, go to the API Keys tab in your Aquin dashboard. Select the model you want to expose, hit Generate, and copy the key immediately — it's shown exactly once and never stored in plaintext.
Keys look like this: aq-m_xxxxxxxxxxxxxxxx
You can generate multiple keys per model, revoke any of them at any time, and track usage per key — calls made and credits spent. Each key has its own independent TPM and RPM limits you can configure from the dashboard.
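The per-key bookkeeping can be pictured as a small record (an illustrative sketch only; the field names and types are assumptions, not Aquin's actual schema — the defaults match the documented rate limits):

```python
from dataclasses import dataclass

@dataclass
class ApiKeyRecord:
    """Hypothetical per-key record mirroring what the dashboard tracks."""
    model_id: str
    rpm_limit: int = 60         # requests per minute (dashboard default)
    tpm_limit: int = 30_000     # tokens per minute (dashboard default)
    calls_made: int = 0         # usage: total calls on this key
    credits_spent: float = 0.0  # usage: credits charged to the owner
    revoked: bool = False       # revocation takes effect immediately

key = ApiKeyRecord(model_id="my-fine-tuned-model")
```

Because limits and usage live on the key, not the model, each key can be tuned and revoked independently.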
call your model — python
Three lines. Initialize the client, call complete, get a response:
from aquin import AquinClient
client = AquinClient("aq-m_your_key_here")
res = client.complete("What is LoRA fine-tuning?")
print(res.text)
print(f"tokens in: {res.tokens_in}, out: {res.tokens_out}")

Use system_prompt to give your model a persona or set of instructions that applies to every message:
from aquin import AquinClient
client = AquinClient("aq-m_your_key_here")
res = client.complete(
    "What countries is this approved in?",
    system_prompt="You are a knowledgeable medical information specialist. Always recommend consulting a healthcare provider for personal medical decisions."
)
print(res.text)

Need async? There's a native async method too:
import asyncio
from aquin import AquinClient
async def main():
    client = AquinClient("aq-m_your_key_here")
    res = await client.acomplete(
        "Summarize this contract:",
        system_prompt="You are a legal assistant. Be precise and concise."
    )
    print(res.text)

asyncio.run(main())

call your model — javascript / typescript
import { AquinClient } from "aquin";
const client = new AquinClient("aq-m_your_key_here");
const res = await client.complete("What is LoRA fine-tuning?");
console.log(res.text);
console.log(`tokens in: ${res.tokens_in}, out: ${res.tokens_out}`);

Pass a system_prompt in the options object:
import { AquinClient } from "aquin";
const client = new AquinClient("aq-m_your_key_here");
const res = await client.complete("What countries is this approved in?", {
  system_prompt: "You are a knowledgeable medical information specialist. Always recommend consulting a healthcare provider for personal medical decisions.",
});
console.log(res.text);

Works in Node.js, Next.js, and any modern JS runtime. The SDK uses the native fetch API — no extra dependencies.
or just use curl
No SDK needed. Every model endpoint is a plain HTTPS POST:
curl -X POST https://www.aquin.app/api/infer \
-H "Authorization: Bearer aq-m_your_key_here" \
-H "Content-Type: application/json" \
-d '{"prompt": "What is LoRA?", "max_tokens": 256}'

With a system prompt:
curl -X POST https://www.aquin.app/api/infer \
-H "Authorization: Bearer aq-m_your_key_here" \
-H "Content-Type: application/json" \
-d '{
  "prompt": "What is LoRA?",
  "messages": [
    { "role": "system", "content": "You are an expert ML researcher. Be concise." },
    { "role": "user", "content": "What is LoRA?" }
  ],
  "max_tokens": 256
}'

Response:
{
  "text": "LoRA (Low-Rank Adaptation) is a fine-tuning technique...",
  "tokens_in": 6,
  "tokens_out": 42
}

parameters
Every completion call accepts these parameters:
- prompt — the input text. required.
- system_prompt — an optional instruction prepended as the first message. sets the model's persona or behavior for the entire conversation.
- max_tokens — maximum tokens to generate. default 512.
- temperature — controls randomness, 0.0–1.0. default 0.7.
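Putting the table together, a request body with the documented defaults filled in could be assembled like this (a sketch — the exact payload shape the SDK sends is an assumption, and build_payload is a hypothetical helper, not part of the SDK):

```python
def build_payload(prompt, system_prompt=None, max_tokens=512, temperature=0.7):
    """Assemble an inference request body using the documented defaults."""
    if not prompt:
        raise ValueError("prompt is required")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    payload = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    # system_prompt is optional; omit it entirely when unset
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    return payload
```

Calling build_payload("What is LoRA?") yields a body with max_tokens 512 and temperature 0.7, matching the defaults above.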
rate limits
Each API key has two independent rate limits, both configurable from the dashboard after key creation:
- RPM — requests per minute. default 60, range 10–600. limits how many calls can be made in a rolling 60-second window.
- TPM — tokens per minute. default 30K, range 5K–60K. limits total token throughput (input + output) in a rolling 60-second window.
When either limit is exceeded the API returns a 429 with a Retry-After: 60 header and a retry_after_ms field in the body. Both limits reset on a rolling 60-second basis — not on the clock minute.
// 429 response body
{
  "error": "Rate limit exceeded",
  "retry_after_ms": 60000
}

error handling
The SDK raises typed exceptions so you can handle specific failures without inspecting raw HTTP status codes:
from aquin import AquinClient
from aquin import InvalidKeyError, InsufficientCreditsError, RateLimitError, InferenceError
client = AquinClient("aq-m_your_key_here")
try:
    res = client.complete("Hello")
except InvalidKeyError:
    print("Key is invalid or has been revoked")
except InsufficientCreditsError:
    print("Top up your credits at aquin.app")
except RateLimitError as e:
    print(f"Rate limited — retry in {e.retry_after_ms}ms")
except InferenceError:
    print("Model inference failed — try again")

The full list of exceptions:
- AquinError — base class for all errors
- InvalidKeyError — key is wrong or revoked
- InsufficientCreditsError — owner account is out of credits
- RateLimitError — RPM or TPM limit hit; check e.retry_after_ms
- ModelNotFoundError — model hasn't finished training
- InferenceError — GPU server error
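Since RateLimitError carries the server-suggested delay, a retry wrapper falls out naturally. Here's a sketch; a stand-in exception class is defined locally so the snippet is self-contained — in real code you would import RateLimitError from aquin instead:

```python
import time

class RateLimitError(Exception):
    """Stand-in for aquin's RateLimitError, which carries retry_after_ms."""
    def __init__(self, retry_after_ms=60_000):
        super().__init__("Rate limit exceeded")
        self.retry_after_ms = retry_after_ms

def complete_with_retry(call, max_retries=3):
    """Run call(), sleeping for the server-suggested delay on each 429."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(e.retry_after_ms / 1000)
```

Usage would be something like complete_with_retry(lambda: client.complete("Hello")).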
billing
API calls are charged to the model owner's credit balance based on inference time — the same credit system used for training. One credit equals one minute of compute. A typical API call takes 1–5 seconds, so costs are fractions of a credit per call.
You can monitor exactly how many calls each key has made and how many credits have been spent from the API Keys tab in your dashboard.
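Since one credit equals one minute of compute, a call's cost is just its inference time divided by 60 (assuming billing is strictly proportional to inference seconds):

```python
def call_cost_in_credits(inference_seconds: float) -> float:
    """One credit per minute of compute, billed proportionally."""
    return inference_seconds / 60.0

# a typical 3-second call:
# call_cost_in_credits(3.0) → 0.05 credits
```

At that rate, one credit covers roughly 12 to 60 typical calls in the 1–5 second range.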
security
API keys are hashed with SHA-256 before storage. The raw key is never saved — not in our database, not in logs, not anywhere. If you lose it, generate a new one. Revoke keys instantly from the dashboard.
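The store-the-hash pattern is easy to picture (an illustrative sketch of the general technique, not Aquin's actual implementation):

```python
import hashlib
import hmac

def hash_key(raw_key: str) -> str:
    """What gets stored: the SHA-256 digest, never the raw key."""
    return hashlib.sha256(raw_key.encode()).hexdigest()

def verify_key(presented_key: str, stored_hash: str) -> bool:
    """On each request, hash the presented key and compare digests."""
    # compare_digest runs in constant time, avoiding timing leaks
    return hmac.compare_digest(hash_key(presented_key), stored_hash)
```

Because only the digest is stored, a database leak exposes no usable keys, and losing a raw key is unrecoverable by design.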
The inference VM is never directly exposed. Every request routes through Aquin's API layer where authentication, billing, and rate limiting are enforced before the model is ever touched.
what's next
Streaming responses, per-token billing, and a RAG API key for querying your training datasets directly are all coming. The SDK will stay in sync — version 1.x of the SDK will always talk to the v1 API.
pip install aquin. npm install aquin. build something.
