Docs
Two routes. One header. Done.
The TES proxy is wire-compatible with the Anthropic Messages and OpenAI Chat Completions APIs. Point your existing client at it, pass X-TES-Token as a default header, and you're done. This page is the complete reference.
Quickstart
The integration footprint is two environment variables and one default header. Your existing call sites do not change.
# Pick one base URL for the upstream you call.
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
OPENAI_BASE_URL=https://llm.api.pentatonic.com/v1
# Your existing upstream key — TES forwards it unchanged.
ANTHROPIC_API_KEY=sk-ant-...
# or:
OPENAI_API_KEY=sk-...
# Your TES token — issued at /signup.
TES_API_KEY=tes_<clientId>_<random>

The Anthropic SDK appends /v1/messages to its base URL, so set ANTHROPIC_BASE_URL to https://llm.api.pentatonic.com (no path). The OpenAI SDK appends /chat/completions to its base URL, so set OPENAI_BASE_URL to https://llm.api.pentatonic.com/v1 (note the /v1).
Then call the model as normal:
import Anthropic from "@anthropic-ai/sdk";
// Two env vars + one default header is the entire integration.
const client = new Anthropic({
baseURL: process.env.ANTHROPIC_BASE_URL, // https://llm.api.pentatonic.com
defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});
const response = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
messages: [{ role: "user", content: "Refactor the auth middleware" }],
});
// Same response shape. Token count drops because TES injected the relevant
// context as a system preamble before the model ran.

Providers
The deployed proxy at llm.api.pentatonic.com exposes two routes today.
| Route | Forwards to | Use this when… |
|---|---|---|
| POST /v1/messages | api.anthropic.com | You're using the Anthropic SDK or anything that speaks Anthropic Messages. |
| POST /v1/chat/completions | api.openai.com | You're using the OpenAI SDK or any OpenAI-compatible client. |
| GET /health | — | Liveness check. Returns { ok: true }. |
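If you're not using an SDK, the OpenAI-compatible route is a plain fetch away. A minimal sketch, assuming Node 18+ for the global fetch; the model name and helper names are illustrative, not part of any shipped client:

```typescript
// Build the three headers the /v1/chat/completions route expects.
// Header names come from the reference above; this helper is illustrative.
function buildChatHeaders(tesToken: string, openaiKey: string): Record<string, string> {
  return {
    "Content-Type": "application/json",
    "X-TES-Token": tesToken,                // your TES tenant token
    "Authorization": `Bearer ${openaiKey}`, // forwarded to api.openai.com unchanged
  };
}

// Fire a request against the proxy (assumes Node 18+ global fetch).
async function chat(tesToken: string, openaiKey: string, content: string) {
  const res = await fetch("https://llm.api.pentatonic.com/v1/chat/completions", {
    method: "POST",
    headers: buildChatHeaders(tesToken, openaiKey),
    body: JSON.stringify({
      model: "gpt-4o", // any model your OpenAI key can access
      messages: [{ role: "user", content }],
    }),
  });
  return res.json();
}
```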
Other upstreams (MiniMax, vLLM, llama.cpp, your own inference) are supported on the Enterprise tier as dedicated upstream routing — the worker forwards /v1/chat/completions to a configured base URL per tenant. Talk to us. For self-hosted setups you can clone the SDK and point the worker's ANTHROPIC_BASE_URL / OPENAI_BASE_URL env vars at any compatible endpoint.
Authentication
Two credentials per request: your TES tenant token, and your upstream provider key. The proxy never forwards your TES token to the upstream, and never forwards your upstream key to TES.
Heads-up: API keys only — not subscription auth
Anthropic and OpenAI bind their subscription OAuth tokens (the kind Claude Code or Cursor mints from a Claude Pro / Team / Enterprise subscription) to the originating client. Forwarding those through any third-party proxy will return a 401 at the upstream's edge — by their design, not ours. Use a workspace API key for proxy traffic. Subscription users who want memory injection in Claude Code should use the Claude Code plugin instead — it goes direct to the upstream and only routes the memory-search calls through TES.
X-TES-Token: tes_<clientId>_<random>
Identifies your tenant. Token state (active, revoked, rotated) is enforced by TES on every memory call — the proxy itself stores nothing. Get a token at /signup or rotate from your dashboard.
x-api-key: <your-anthropic-key>
Required when calling /v1/messages. Forwarded to api.anthropic.com unchanged. Rate limits, billing, and errors remain your direct relationship with Anthropic.
Authorization: Bearer <your-openai-key>
Required when calling /v1/chat/completions. Forwarded to api.openai.com unchanged.
Request headers
| Header | Required | Value | Notes |
|---|---|---|---|
| X-TES-Token | Required | tes_<clientId>_<random> | Issued at /signup. Identifies your tenant. TES authoritatively validates and authorises every memory call — the proxy itself is stateless. |
| x-api-key | Required for /v1/messages | Your Anthropic key | Forwarded to api.anthropic.com unchanged. Rate limits, billing, and errors are still your direct relationship with Anthropic. |
| Authorization | Required for /v1/chat/completions | Bearer <openai-key> | Forwarded to api.openai.com unchanged. |
| anthropic-version | Optional | 2023-06-01 (default) | Whatever you set is forwarded to Anthropic. Pin a specific schema if your client requires it. |
| X-TES-Mode | Optional | passthrough | Skip retrieval entirely on this request. The proxy still emits a CHAT_TURN event so you have a comparison turn in your dashboard. Useful for A/B'ing the bill. |
Response headers
In addition to whatever the upstream provider returns, the proxy attaches the following headers so you can audit what happened.
| Header | Value | Notes |
|---|---|---|
| X-TES-Provider | anthropic \| openai | Which upstream the proxy routed to. Always set. |
| X-TES-Memories-Injected | <integer> | How many memories the proxy folded into the system preamble for this request. 0 if the retrieval found nothing useful or the request used passthrough mode. |
| X-TES-Skipped | <reason> | Present only when memory injection didn't happen. Reason values listed below. The upstream call itself is unaffected — your request still went through and you got a real model response. |
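A small helper that turns the three audit headers into a structured record can make them easy to log. A sketch; the TesAudit shape and function name are illustrative, not part of the SDK:

```typescript
interface TesAudit {
  provider: string | null;   // "anthropic" or "openai"
  memoriesInjected: number;  // 0 when nothing was injected
  skipped: string | null;    // reason string, or null when injection ran
}

// `headers` is anything with a get() method; the fetch Headers object qualifies.
function readTesAudit(headers: { get(name: string): string | null }): TesAudit {
  return {
    provider: headers.get("X-TES-Provider"),
    memoriesInjected: Number(headers.get("X-TES-Memories-Injected") ?? "0"),
    skipped: headers.get("X-TES-Skipped"),
  };
}
```

After a fetch against the proxy, readTesAudit(res.headers) tells you whether the preamble fired on that request.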
X-TES-Skipped reasons
| Reason | When you'll see it |
|---|---|
| passthrough_mode | You sent X-TES-Mode: passthrough. |
| no_user_message | The request had no user message to retrieve against (e.g. system-only prompt). |
| tes_timeout | TES memory search exceeded the per-request timeout (currently 10s while the SDK ships an ANN-prefilter; target ceiling 800ms). |
| tes_unreachable | Network error reaching the TES tenant. |
| tes_http_<status> | TES returned a non-2xx HTTP status (e.g. tes_http_503). |
| tes_graphql:<reason> | TES returned a GraphQL error envelope. Reason is truncated. |
| tes_api_base_not_configured | Internal config error. Reach out — should never reach prod. |
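If you alert on skip reasons, the structured variants (tes_http_<status> and tes_graphql:<reason>) are easy to parse out. A hypothetical helper, assuming only the reason strings listed above:

```typescript
type SkipInfo =
  | { kind: "passthrough" | "no_user_message" | "timeout" | "unreachable" | "config" }
  | { kind: "http"; status: number }
  | { kind: "graphql"; reason: string }
  | { kind: "unknown"; raw: string };

// Map an X-TES-Skipped header value to a structured reason.
// Reason strings come from the table above; the SkipInfo shape is illustrative.
function parseSkipReason(raw: string): SkipInfo {
  if (raw === "passthrough_mode") return { kind: "passthrough" };
  if (raw === "no_user_message") return { kind: "no_user_message" };
  if (raw === "tes_timeout") return { kind: "timeout" };
  if (raw === "tes_unreachable") return { kind: "unreachable" };
  if (raw === "tes_api_base_not_configured") return { kind: "config" };
  const http = raw.match(/^tes_http_(\d+)$/);
  if (http) return { kind: "http", status: Number(http[1]) };
  if (raw.startsWith("tes_graphql:")) {
    return { kind: "graphql", reason: raw.slice("tes_graphql:".length) };
  }
  return { kind: "unknown", raw };
}
```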
Errors
Errors raised by the proxy itself (auth, validation, routing) use this shape:
{
"error": {
"message": "missing_tes_token"
}
}

Errors raised by the upstream provider (model not found, rate limit, invalid key) are forwarded with the upstream's status code and body unchanged. The proxy never rewrites the upstream's error envelope.
| Status | Message | Cause |
|---|---|---|
| 401 | missing_tes_token | X-TES-Token header was absent. |
| 401 | invalid_tes_token_format | Token didn't start with tes_. |
| 401 | invalid_tes_token_layout | Token didn't match tes_<clientId>_<random>. |
| 401 | invalid_client_id_in_token | ClientId portion contained characters outside [a-zA-Z0-9-]. |
| 400 | missing_x_api_key (Anthropic upstream key required) | Hitting /v1/messages without x-api-key. |
| 400 | missing_authorization_bearer (OpenAI upstream key required) | Hitting /v1/chat/completions without Authorization: Bearer … |
| 400 | Invalid JSON body | Request body wasn't parseable JSON. |
| 404 | Not Found | Path other than /v1/messages, /v1/chat/completions, or /health. |
| 405 | Method Not Allowed | GET on a POST route, or anything other than POST/GET. |
| 502 | Upstream provider unreachable: <message> | Network error reaching api.anthropic.com or api.openai.com. The upstream's own non-2xx is forwarded verbatim with the upstream's status. |
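The four 401s can all be checked client-side before a request goes out. A sketch that mirrors the documented rules; the helper is illustrative, and the proxy's own validation remains authoritative:

```typescript
// Pre-flight check mirroring the proxy's documented 401 rules.
// Returns the error message the proxy would respond with, or null if the
// token looks valid. The proxy may apply stricter checks than this sketch.
function checkTesToken(token: string | undefined): string | null {
  if (!token) return "missing_tes_token";
  if (!token.startsWith("tes_")) return "invalid_tes_token_format";
  const parts = token.split("_");
  if (parts.length < 3 || !parts[1] || !parts[2]) return "invalid_tes_token_layout";
  if (!/^[a-zA-Z0-9-]+$/.test(parts[1])) return "invalid_client_id_in_token";
  return null;
}
```

Running this before each request turns a guaranteed 401 into a local error you can surface immediately.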
Passthrough mode
Send X-TES-Mode: passthrough on any request and the proxy will skip the retrieval step and forward your request to the upstream untouched. The CHAT_TURN event still fires so you have a comparison row in your dashboard.
curl https://llm.api.pentatonic.com/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "X-TES-Token: $TES_API_KEY" \
-H "X-TES-Mode: passthrough" \
-d '{ "model": "claude-sonnet-4-5", "max_tokens": 1024,
"messages": [{"role":"user","content":"Hello"}] }'
# Response will include: X-TES-Skipped: passthrough_mode

Useful for A/B'ing your invoice — flip half your traffic to passthrough for a day, compare the two columns in the dashboard.
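One way to run that comparison is a deterministic per-user coin flip, so a given user stays in the same arm across requests. A sketch; the FNV-1a hash choice and helper names are illustrative:

```typescript
// Deterministically assign a user to the passthrough arm of an A/B test.
// FNV-1a keeps the assignment stable across requests; ratio 0.5 = half the traffic.
function inPassthroughArm(userId: string, ratio = 0.5): boolean {
  let h = 0x811c9dc5;
  for (let i = 0; i < userId.length; i++) {
    h ^= userId.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return (h % 1000) / 1000 < ratio;
}

// Per-request headers: add X-TES-Mode only for users in the experiment arm.
function tesHeaders(tesToken: string, userId: string): Record<string, string> {
  return {
    "X-TES-Token": tesToken,
    ...(inPassthroughArm(userId) ? { "X-TES-Mode": "passthrough" } : {}),
  };
}
```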
Streaming
Both routes support SSE streaming. Set stream: true in the request body exactly as you would calling the upstream provider directly. The proxy forwards bytes immediately — there is no buffering on the response path.
const stream = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
messages: [{ role: "user", content: "Stream this" }],
stream: true,
});
for await (const event of stream) {
// identical to streaming directly from Anthropic
}

The CHAT_TURN ingest fires once after the upstream stream closes (collected in parallel via a TransformStream), so it never blocks the response.
Limits & timeouts
Memory-search timeout
Currently 10s in production while the SDK ships an ANN-prefilter-then-rerank refactor — cold-tenant searches against live TES can hit 5–10s on tenants without an HNSW index. Target ceiling is 800ms once that lands. If TES doesn't respond in time the proxy sets X-TES-Skipped: tes_timeout and forwards your request unmodified — your call still goes through, just without the preamble.
Retrieval shape
Top-6 memories with similarity ≥ 0.55 are folded into the system preamble. Tunables (limit, minScore) ship in the SDK; the deployed proxy uses these defaults.
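The selection rule amounts to a similarity floor plus a cap. A sketch of those defaults, illustrative only and not the proxy's actual implementation:

```typescript
interface Memory { text: string; score: number } // score = similarity in [0, 1]

// Keep the highest-scoring memories at or above minScore, capped at limit.
// Defaults mirror the documented ones: limit 6, minScore 0.55.
function selectMemories(hits: Memory[], limit = 6, minScore = 0.55): Memory[] {
  return hits
    .filter((m) => m.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```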
Tier quotas
Free tier: 1M proxied input tokens / month. Pro: $0.50 per 1M proxied input tokens, $20/mo minimum. See /pricing for the full table.
Soft-fail contract
Any TES-side error is non-fatal to your request. The proxy sets X-TES-Skipped with a reason and forwards the upstream call as if TES weren't there. We never break your request because TES is unhealthy.
JS SDK — wrap your client, get memory injection
If you'd rather not change a base URL, the JS SDK can wrap your existing Anthropic or OpenAI client and do retrieval + preamble injection in-process before forwarding to the upstream. Same behaviour as the proxy — same retrieval, same injection format, same fail-soft semantics — just running inside your app instead of at our edge.
Memory injection is on by default from SDK 0.5.8 — you don't write any memory code.
import Anthropic from "@anthropic-ai/sdk";
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
const tes = new TESClient({
endpoint: "https://your-co.api.pentatonic.com",
clientId: "your-co",
apiKey: process.env.TES_API_KEY,
});
// Wrap your existing client. Memory retrieval + injection happen
// automatically before each .messages.create call.
const anthropic = tes.wrap(new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }));
const res = await anthropic.messages.create({
model: "claude-sonnet-4-5",
messages: [{ role: "user", content: "What did we decide about JWT expiry?" }],
});
// res is the unmodified Anthropic response shape — your code is unchanged.

Disable per-call with tes.wrap(client, { memory: false }). Override search knobs with memoryOpts: { limit, minScore, timeoutMs } — defaults match the proxy (6 results, 0.55 min score, 800ms timeout).
Why three integration paths if they all do the same thing?
The retrieval + injection logic lives in one shared module (@pentatonic-ai/ai-agent-sdk/memory), so the proxy, this SDK wrapper, and the OpenClaw plugin all produce the same preamble for the same query. Pick the path that fits your auth model — they're functionally identical.
Self-hosted SDK
If you don't want to use the hosted proxy, the same retrieval logic ships in the open-source SDK. Run a local memory stack (PostgreSQL + pgvector + Ollama in Docker), and your agent gets persistent memory and the same auto-retrieval behaviour without ever calling out to Pentatonic.
# Set up local memory — PostgreSQL + pgvector + Ollama in Docker.
# Pulls embedding + chat models, writes config, no account required.
npx @pentatonic-ai/ai-agent-sdk memory

Or use the SDK directly to wrap your existing OpenAI / Anthropic / Workers AI client and emit observability events without any proxy in the loop. See /sdk for the wrap API and provider matrix.
Claude Code plugin
If you use Claude Code, install the plugin to get persistent memory and automatic session capture without touching any code. The plugin works against either the hosted TES platform or a local memory stack you run yourself.
/plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
/plugin install tes-memory@pentatonic-ai
/tes-memory:tes-setup

Setup writes config to ~/.claude/tes-memory.local.md (or ~/.claude-pentatonic/tes-memory.local.md on aliased installs). Hooks fire on SessionStart, UserPromptSubmit, PostToolUse, Stop, and SessionEnd — every turn is searchable. Full reference at /sdk/claude-code.
FAQ
Does TES change my response shape?
No. The proxy returns the upstream provider's bytes unmodified. Buffered responses are forwarded as a single body; streaming responses are forwarded chunk-by-chunk via SSE with no reordering.
What happens if TES is unhealthy?
Your request is forwarded to the upstream provider unmodified, with X-TES-Skipped describing the reason. We never break a customer request because TES is down — you get the upstream behaviour you were already paying for.
Where does the injected context come from?
From your tenant's memory layer. The proxy calls semanticSearchMemories on TES with the last user message, and folds the top hits into the system preamble. Cold-start tenants get nothing useful injected until you've populated the memory — see the SDK below for ways to feed memory in.
Can I use the proxy with MiniMax, vLLM, or another OpenAI-compatible upstream?
Today the deployed proxy at llm.api.pentatonic.com forwards /v1/messages to api.anthropic.com and /v1/chat/completions to api.openai.com. Other upstreams are on the Enterprise tier as a dedicated routing config — talk to us. The proxy is OSS, so for self-hosted use you can point ANTHROPIC_BASE_URL / OPENAI_BASE_URL env vars at any compatible endpoint.
Does the proxy log my prompts?
Partially. The last user message is sent to TES for the memory search, and ingest writes a CHAT_TURN analytics event (the user message and assistant text) against your tenant. The full request body is forwarded only to your upstream provider; TES doesn't store the upstream response body in plain form.
Can I rotate my TES token?
Yes — from your dashboard. Old tokens are invalidated immediately because TES is the sole authority on token state, not the proxy.
Why retrieve and inject instead of caching?
Prompt caching only helps when the same exact prefix repeats. Retrieval-then-inject works on every turn — including the first turn of a new session — because it reaches into your durable memory layer rather than relying on prefix locality.
Can I use my Claude Pro / Team / Cursor / Codex subscription key?
No — and the failure is on the upstream's side, not ours. Anthropic and OpenAI bind subscription OAuth tokens to the device that minted them. Routing them through any third-party proxy returns a 401 at the upstream's edge. Use a workspace API key, or install the Claude Code plugin which goes direct to the upstream and only sends the memory search to TES.
Ship it
Two env vars away from a smaller bill.
Free tier covers 1M proxied input tokens per month. No credit card.