Docs

Two routes. One header. Done.

The TES proxy is wire-compatible with the Anthropic Messages and OpenAI Chat Completions APIs. Point your existing client at it, pass X-TES-Token as a default header, and you're done. This page is the complete reference.

Quickstart

The integration footprint is two environment variables and one default header. Your existing call sites do not change.

.env
# Pick one base URL for the upstream you call.
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
OPENAI_BASE_URL=https://llm.api.pentatonic.com/v1

# Your existing upstream key — TES forwards it unchanged.
ANTHROPIC_API_KEY=sk-ant-...
# or:
OPENAI_API_KEY=sk-...

# Your TES token — issued at /signup.
TES_API_KEY=tes_<clientId>_<random>

The Anthropic SDK appends /v1/messages to its base URL, so set ANTHROPIC_BASE_URL to https://llm.api.pentatonic.com (no path). The OpenAI SDK appends /chat/completions to its base URL, so set OPENAI_BASE_URL to https://llm.api.pentatonic.com/v1 (note the /v1).

Then call the model as normal

import Anthropic from "@anthropic-ai/sdk";

// Two env vars + one default header is the entire integration.
const client = new Anthropic({
  baseURL: process.env.ANTHROPIC_BASE_URL,        // https://llm.api.pentatonic.com
  defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});

const response = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Refactor the auth middleware" }],
});

// Same response shape. Token count drops because TES injected the relevant
// context as a system preamble before the model ran.

Providers

The deployed proxy at llm.api.pentatonic.com exposes two routes today.

  • POST /v1/messages — forwards to api.anthropic.com. Use this when you're using the Anthropic SDK or anything that speaks Anthropic Messages.
  • POST /v1/chat/completions — forwards to api.openai.com. Use this when you're using the OpenAI SDK or any OpenAI-compatible client.
  • GET /health — no upstream; liveness check. Returns { ok: true }.

Other upstreams (MiniMax, vLLM, llama.cpp, your own inference) are supported on the Enterprise tier as dedicated upstream routing — the worker forwards /v1/chat/completions to a configured base URL per tenant. Talk to us. For self-hosted setups you can clone the SDK and point the worker's ANTHROPIC_BASE_URL / OPENAI_BASE_URL env vars at any compatible endpoint.

Authentication

Two credentials per request: your TES tenant token, and your upstream provider key. The proxy never forwards your TES token to the upstream, and never forwards your upstream key to TES.

Heads-up: API keys only — not subscription auth

Anthropic and OpenAI bind their subscription OAuth tokens (the kind Claude Code or Cursor mints from a Claude Pro / Team / Enterprise subscription) to the originating client. Forwarding those through any third-party proxy will return a 401 at the upstream's edge — by their design, not ours. Use a workspace API key for proxy traffic. Subscription users who want memory injection in Claude Code should use the Claude Code plugin instead — it goes direct to the upstream and only routes the memory-search calls through TES.

  • X-TES-Token: tes_<clientId>_<random>

    Identifies your tenant. Token state (active, revoked, rotated) is enforced by TES on every memory call — the proxy itself stores nothing. Get a token at /signup or rotate from your dashboard.

  • x-api-key: <your-anthropic-key>

    Required when calling /v1/messages. Forwarded to api.anthropic.com unchanged. Rate limits, billing, and errors remain your direct relationship with Anthropic.

  • Authorization: Bearer <your-openai-key>

    Required when calling /v1/chat/completions. Forwarded to api.openai.com unchanged.
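The split between the two credential schemes can be sketched as a small request builder. This is an illustrative sketch using the web-standard Request type (Node 18+), not part of any SDK; buildProxyRequest is a hypothetical helper name.

```typescript
// Hypothetical helper (not part of any SDK): build a Request for either
// proxy route with the credential header that upstream expects.
type Route = "messages" | "chat";

function buildProxyRequest(
  route: Route,
  body: object,
  upstreamKey: string,
  tesToken: string,
): Request {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    "X-TES-Token": tesToken, // consumed by the proxy, never forwarded upstream
  };
  let path: string;
  if (route === "messages") {
    headers["x-api-key"] = upstreamKey;             // Anthropic scheme
    headers["anthropic-version"] = "2023-06-01";
    path = "/v1/messages";
  } else {
    headers["Authorization"] = `Bearer ${upstreamKey}`; // OpenAI scheme
    path = "/v1/chat/completions";
  }
  return new Request(`https://llm.api.pentatonic.com${path}`, {
    method: "POST",
    headers,
    body: JSON.stringify(body),
  });
}
```

The same two-header shape is what the SDK snippets above produce under the hood: one header for the proxy, one for the upstream.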

Request headers

  • X-TES-Token — required. Value: tes_<clientId>_<random>. Issued at /signup; identifies your tenant. TES authoritatively validates and authorises every memory call — the proxy itself is stateless.
  • x-api-key — required for /v1/messages. Value: your Anthropic key. Forwarded to api.anthropic.com unchanged; rate limits, billing, and errors are still your direct relationship with Anthropic.
  • Authorization — required for /v1/chat/completions. Value: Bearer <openai-key>. Forwarded to api.openai.com unchanged.
  • anthropic-version — optional. Value: 2023-06-01 (default). Whatever you set is forwarded to Anthropic; pin a specific schema version if your client requires it.
  • X-TES-Mode — optional. Value: passthrough. Skips retrieval entirely on this request; the proxy still emits a CHAT_TURN event so you have a comparison turn in your dashboard. Useful for A/B-testing the bill.

Response headers

In addition to whatever the upstream provider returns, the proxy attaches the following headers so you can audit what happened.

  • X-TES-Provider — anthropic | openai. Which upstream the proxy routed to. Always set.
  • X-TES-Memories-Injected — <integer>. How many memories the proxy folded into the system preamble for this request; 0 if retrieval found nothing useful or the request used passthrough mode.
  • X-TES-Skipped — <reason>. Present only when memory injection didn't happen; reason values are listed below. The upstream call itself is unaffected — your request still went through and you got a real model response.
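Reading the audit headers off a response is a one-liner per field. A minimal sketch using the web-standard Headers type; auditTesHeaders is a hypothetical helper, and the header names are the ones documented above.

```typescript
// Hypothetical helper: pull the proxy's audit headers off any response.
function auditTesHeaders(headers: Headers): {
  provider: string | null; // "anthropic" | "openai"; always set by the proxy
  injected: number;        // memories folded into the system preamble
  skipped: string | null;  // null unless injection was skipped
} {
  return {
    provider: headers.get("X-TES-Provider"),
    injected: Number(headers.get("X-TES-Memories-Injected") ?? 0),
    skipped: headers.get("X-TES-Skipped"),
  };
}
```

Wired into the quickstart example, this is how you'd confirm per-request that injection actually happened before attributing a token-count drop to TES.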

X-TES-Skipped reasons

  • passthrough_mode — you sent X-TES-Mode: passthrough.
  • no_user_message — the request had no user message to retrieve against (e.g. a system-only prompt).
  • tes_timeout — the TES memory search exceeded the per-request timeout (currently 10s while the SDK ships an ANN-prefilter; target ceiling 800ms).
  • tes_unreachable — network error reaching the TES tenant.
  • tes_http_<status> — TES returned a non-2xx HTTP status (e.g. tes_http_503).
  • tes_graphql:<reason> — TES returned a GraphQL error envelope; the reason is truncated.
  • tes_api_base_not_configured — internal configuration error. Reach out — this should never reach production.
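The reason values form a small grammar: five fixed strings plus two parameterised forms. If you want to branch on them (alert on tes_http_5xx, ignore passthrough_mode), a classifier sketch might look like this — classifySkip and SkipInfo are hypothetical names, not SDK exports.

```typescript
// Hypothetical classifier for X-TES-Skipped values, per the list above.
type SkipInfo =
  | { kind: "passthrough_mode" | "no_user_message" | "tes_timeout"
          | "tes_unreachable" | "tes_api_base_not_configured" }
  | { kind: "tes_http"; status: number }   // tes_http_<status>
  | { kind: "tes_graphql"; reason: string } // tes_graphql:<reason>
  | { kind: "unknown"; raw: string };

function classifySkip(raw: string): SkipInfo {
  const http = raw.match(/^tes_http_(\d+)$/);
  if (http) return { kind: "tes_http", status: Number(http[1]) };
  if (raw.startsWith("tes_graphql:")) {
    return { kind: "tes_graphql", reason: raw.slice("tes_graphql:".length) };
  }
  switch (raw) {
    case "passthrough_mode":
    case "no_user_message":
    case "tes_timeout":
    case "tes_unreachable":
    case "tes_api_base_not_configured":
      return { kind: raw };
    default:
      return { kind: "unknown", raw }; // future reasons won't break callers
  }
}
```

The unknown branch matters: treating unrecognised reasons as non-fatal keeps your handler forward-compatible if new reasons ship.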

Errors

Errors raised by the proxy itself (auth, validation, routing) use this shape:

{
  "error": {
    "message": "missing_tes_token"
  }
}

Errors raised by the upstream provider (model not found, rate limit, invalid key) are forwarded with the upstream's status code and body unchanged. The proxy never rewrites the upstream's error envelope.

  • 401 missing_tes_token — the X-TES-Token header was absent.
  • 401 invalid_tes_token_format — the token didn't start with tes_.
  • 401 invalid_tes_token_layout — the token didn't match tes_<clientId>_<random>.
  • 401 invalid_client_id_in_token — the clientId portion contained characters outside [a-zA-Z0-9-].
  • 400 missing_x_api_key (Anthropic upstream key required) — /v1/messages was called without x-api-key.
  • 400 missing_authorization_bearer (OpenAI upstream key required) — /v1/chat/completions was called without Authorization: Bearer …
  • 400 Invalid JSON body — the request body wasn't parseable JSON.
  • 404 Not Found — a path other than /v1/messages, /v1/chat/completions, or /health.
  • 405 Method Not Allowed — GET on a POST route, or any method other than POST/GET.
  • 502 Upstream provider unreachable: <message> — network error reaching api.anthropic.com or api.openai.com. An upstream's own non-2xx response is forwarded verbatim with the upstream's status.
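The four token-shape 401s above map to a simple sequence of checks. A hedged sketch of what an equivalent client-side pre-flight might look like — validateTesToken is hypothetical and not the proxy's actual implementation; the real layout rules may differ in detail.

```typescript
// Hypothetical mirror of the token-shape checks behind the 401s above.
// Returns the error message the proxy would send, or null if the shape is ok.
function validateTesToken(token: string | null): string | null {
  if (!token) return "missing_tes_token";
  if (!token.startsWith("tes_")) return "invalid_tes_token_format";
  const parts = token.split("_"); // tes_<clientId>_<random>
  if (parts.length < 3 || !parts[1] || !parts[2]) return "invalid_tes_token_layout";
  if (!/^[a-zA-Z0-9-]+$/.test(parts[1])) return "invalid_client_id_in_token";
  return null; // shape is valid; TES still decides active/revoked state
}
```

Note that passing these checks only means the shape is right — whether the token is active, revoked, or rotated is decided by TES on the memory call, not by the proxy.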

Passthrough mode

Send X-TES-Mode: passthrough on any request and the proxy will skip the retrieval step and forward your request to the upstream untouched. The CHAT_TURN event still fires so you have a comparison row in your dashboard.

curl https://llm.api.pentatonic.com/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "X-TES-Token: $TES_API_KEY" \
  -H "X-TES-Mode: passthrough" \
  -d '{ "model": "claude-sonnet-4-5", "max_tokens": 1024,
        "messages": [{"role":"user","content":"Hello"}] }'

# Response will include: X-TES-Skipped: passthrough_mode

Useful for A/B'ing your invoice — flip half your traffic to passthrough for a day, compare the two columns in the dashboard.
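"Flip half your traffic" works best when the split is deterministic, so a given session stays in one arm for the whole window. A sketch using a stable hash — shouldPassthrough is a hypothetical helper, not part of the SDK.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: deterministically assign a session to the passthrough
// arm, so each session sees consistent behaviour during the A/B window.
function shouldPassthrough(sessionId: string, fraction = 0.5): boolean {
  const digest = createHash("sha256").update(sessionId).digest();
  // First 4 bytes of the digest as an unsigned int, mapped to [0, 1).
  const bucket = digest.readUInt32BE(0) / 0x1_0000_0000;
  return bucket < fraction;
}

// Usage: add the header only for sessions in the passthrough arm.
const headers: Record<string, string> = { "X-TES-Token": process.env.TES_API_KEY ?? "" };
if (shouldPassthrough("session-123")) headers["X-TES-Mode"] = "passthrough";
```

Hashing the session id (rather than flipping a coin per request) keeps each conversation entirely in one column of the dashboard comparison.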

Streaming

Both routes support SSE streaming. Set stream: true in the request body exactly as you would calling the upstream provider directly. The proxy forwards bytes immediately — there is no buffering on the response path.

const stream = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Stream this" }],
  stream: true,
});

for await (const event of stream) {
  // identical to streaming directly from Anthropic
}

The CHAT_TURN ingest fires once after the upstream stream closes (collected in parallel via a TransformStream), so it never blocks the response.
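The collect-in-parallel trick mentioned above can be sketched as a pass-through TransformStream that copies chunks into a side buffer as they flow, then fires a callback on flush. This is a hypothetical illustration of the pattern (collectWhileForwarding is not an SDK export), shown here with web-standard streams available in Node 18+.

```typescript
// Hypothetical sketch of forward-while-collecting: chunks reach the client
// untouched and undelayed, while a side buffer accumulates them for the
// post-stream ingest step.
function collectWhileForwarding(
  upstream: ReadableStream<Uint8Array>,
  onDone: (body: string) => void,
): ReadableStream<Uint8Array> {
  const chunks: Uint8Array[] = [];
  const tap = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      chunks.push(chunk);        // side-channel copy
      controller.enqueue(chunk); // forwarded immediately — no buffering
    },
    flush() {
      // Fires only after the upstream stream closes, so it never blocks
      // the response path.
      onDone(Buffer.concat(chunks).toString("utf8"));
    },
  });
  return upstream.pipeThrough(tap);
}
```

Because flush runs after the writable side closes, the ingest callback sees the complete body while the client has already received every byte.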

Limits & timeouts

  • Memory-search timeout

    Currently 10s in production while the SDK ships an ANN-prefilter-then-rerank refactor — cold-tenant searches against live TES can hit 5–10s on tenants without an HNSW index. Target ceiling is 800ms once that lands. If TES doesn't respond in time the proxy sets X-TES-Skipped: tes_timeout and forwards your request unmodified — your call still goes through, just without the preamble.

  • Retrieval shape

    Top-6 memories with similarity ≥ 0.55 are folded into the system preamble. Tunables (limit, minScore) ship in the SDK; the deployed proxy uses these defaults.

  • Tier quotas

    Free tier: 1M proxied input tokens / month. Pro: $0.50 per 1M proxied input tokens, $20/mo minimum. See /pricing for the full table.

  • Soft-fail contract

    Any TES-side error is non-fatal to your request. The proxy sets X-TES-Skipped with a reason and forwards the upstream call as if TES weren't there. We never break your request because TES is unhealthy.
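The soft-fail contract amounts to racing the memory search against a deadline and converting any failure into a skip reason instead of an exception. A hypothetical sketch — searchWithDeadline is illustrative, not the proxy's code:

```typescript
// Hypothetical sketch of the soft-fail contract: any TES-side failure
// becomes a skip reason, and the upstream call proceeds regardless.
type MemoryResult =
  | { memories: string[]; skipped?: undefined }
  | { memories: []; skipped: string };

async function searchWithDeadline(
  search: () => Promise<string[]>,
  timeoutMs: number,
): Promise<MemoryResult> {
  let timer: NodeJS.Timeout | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("tes_timeout")), timeoutMs);
  });
  try {
    const memories = await Promise.race([search(), deadline]);
    return { memories };
  } catch (err) {
    // Timeout, network error, or TES error: degrade to "no preamble",
    // never to a failed request.
    return { memories: [], skipped: err instanceof Error ? err.message : "tes_unreachable" };
  } finally {
    clearTimeout(timer); // don't leave the deadline timer running
  }
}
```

The caller then forwards the upstream request either way, setting X-TES-Skipped from the skipped field when it's present.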

JS SDK — wrap your client, get memory injection

If you'd rather not change a base URL, the JS SDK can wrap your existing Anthropic or OpenAI client and do retrieval + preamble injection in-process before forwarding to the upstream. Same behaviour as the proxy — same retrieval, same injection format, same fail-soft semantics — just running inside your app instead of at our edge.

Memory injection is on by default from SDK 0.5.8 — you don't write any memory code.

Wrap an Anthropic client
import Anthropic from "@anthropic-ai/sdk";
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";

const tes = new TESClient({
  endpoint: "https://your-co.api.pentatonic.com",
  clientId: "your-co",
  apiKey:   process.env.TES_API_KEY,
});

// Wrap your existing client. Memory retrieval + injection happen
// automatically before each .messages.create call.
const anthropic = tes.wrap(new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY }));

const res = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  messages: [{ role: "user", content: "What did we decide about JWT expiry?" }],
});

// res is the unmodified Anthropic response shape — your code is unchanged.

Disable per-call with tes.wrap(client, { memory: false }). Override search knobs with memoryOpts: { limit, minScore, timeoutMs } — defaults match the proxy (6 results, 0.55 min score, 800ms timeout).

Why three integration paths if they all do the same thing?

The retrieval + injection logic lives in one shared module (@pentatonic-ai/ai-agent-sdk/memory), so the proxy, this SDK wrapper, and the OpenClaw plugin all produce the same preamble for the same query. Pick the path that fits your auth model — they're functionally identical.

Self-hosted SDK

If you don't want to use the hosted proxy, the same retrieval logic ships in the open-source SDK. Run a local memory stack (PostgreSQL + pgvector + Ollama in Docker), and your agent gets persistent memory and the same auto-retrieval behaviour without ever calling out to Pentatonic.

Local memory stack
# Set up local memory — PostgreSQL + pgvector + Ollama in Docker.
# Pulls embedding + chat models, writes config, no account required.
npx @pentatonic-ai/ai-agent-sdk memory

Or use the SDK directly to wrap your existing OpenAI / Anthropic / Workers AI client and emit observability events without any proxy in the loop. See /sdk for the wrap API and provider matrix.

Claude Code plugin

If you use Claude Code, install the plugin to get persistent memory and automatic session capture without touching any code. The plugin works against either the hosted TES platform or a local memory stack you run yourself.

In Claude Code
/plugin marketplace add Pentatonic-Ltd/ai-agent-sdk
/plugin install tes-memory@pentatonic-ai
/tes-memory:tes-setup

Setup writes config to ~/.claude/tes-memory.local.md (or ~/.claude-pentatonic/tes-memory.local.md on aliased installs). Hooks fire on SessionStart, UserPromptSubmit, PostToolUse, Stop, and SessionEnd — every turn is searchable. Full reference at /sdk/claude-code.

FAQ

  • Does TES change my response shape?

    No. The proxy returns the upstream provider's bytes unmodified. Buffered responses are forwarded as a single body; streaming responses are forwarded chunk-by-chunk via SSE with no reordering.

  • What happens if TES is unhealthy?

    Your request is forwarded to the upstream provider unmodified, with X-TES-Skipped describing the reason. We never break a customer request because TES is down — you get the upstream behaviour you were already paying for.

  • Where does the injected context come from?

    From your tenant's memory layer. The proxy calls semanticSearchMemories on TES with the last user message, and folds the top hits into the system preamble. Cold-start tenants get nothing useful injected until you've populated the memory — see the SDK below for ways to feed memory in.

  • Can I use the proxy with MiniMax, vLLM, or another OpenAI-compatible upstream?

    Today the deployed proxy at llm.api.pentatonic.com forwards /v1/messages to api.anthropic.com and /v1/chat/completions to api.openai.com. Other upstreams are on the Enterprise tier as a dedicated routing config — talk to us. The proxy is OSS, so for self-hosted use you can point ANTHROPIC_BASE_URL / OPENAI_BASE_URL env vars at any compatible endpoint.

  • Does the proxy log my prompts?

    The last user message is sent to TES for the memory search. The full request body is forwarded to your upstream — TES doesn't store the upstream response body in plain form. Ingest writes a CHAT_TURN event for analytics with the user message and assistant text against your tenant.

  • Can I rotate my TES token?

    Yes — from your dashboard. Old tokens are invalidated immediately because TES is the sole authority on token state, not the proxy.

  • Why retrieve and inject instead of caching?

    Prompt caching only helps when the same exact prefix repeats. Retrieval-then-inject works on every turn — including the first turn of a new session — because it reaches into your durable memory layer rather than relying on prefix locality.

  • Can I use my Claude Pro / Team / Cursor / Codex subscription key?

    No — and the failure is on the upstream's side, not ours. Anthropic and OpenAI bind subscription OAuth tokens to the device that minted them. Routing them through any third-party proxy returns a 401 at the upstream's edge. Use a workspace API key, or install the Claude Code plugin which goes direct to the upstream and only sends the memory search to TES.

Ship it

Two env vars away from a smaller bill.

Free tier covers 1M proxied input tokens per month. No credit card.