MIT · v0.4.x · JS + Python · Runs locally

The library behind the proxy.
Wrap, observe, retrieve.

If the env-var-swap proxy isn't enough — you need fine-grained retrieval control, you want to run the memory layer locally, or you're building observability on top of an LLM client the proxy doesn't speak — install the SDK directly. Same retrieval shape, in your process.

Just want a smaller bill? Skip the SDK and use the proxy — two env vars, one default header.

Quick start — local mode, no account needed
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
import Anthropic from "@anthropic-ai/sdk";

// Local mode — no API key. Memory stack runs on your machine.
const tes = new TESClient();

// One line — wrap your existing client.
const ai = tes.wrap(new Anthropic());

const res = await ai.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Hello" }],
});

// Captured locally:
//   CHAT_TURN { tokens, model, ... }
//   TOOL_USE { name, args, result }
//   Session linkage across turns
//   4-layer memory: episodic, semantic, procedural, working

Install

One package, four entry points

npm / pip

Library install. Wrap your existing OpenAI / Anthropic / Workers AI client, emit observability events, manage sessions manually if you need to.

npx … init

Interactive setup for the hosted TES platform. Creates a tenant, verifies your email, mints an API key, and prints the three env vars you need.
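The three variables are the same ones the hosted-mode examples below consume (the output formatting here is illustrative):

export TES_ENDPOINT="https://<clientId>.api.pentatonic.com"
export TES_CLIENT_ID="..."
export TES_API_KEY="..."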

npx … memory

Runs the full memory stack locally — PostgreSQL + pgvector + Ollama in Docker. No account, no API key, no egress. Pulls embedding + chat models, writes config, and you're done.

npm install @pentatonic-ai/ai-agent-sdk
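For Python, assuming the PyPI name mirrors the npm package:

pip install pentatonic-ai-agent-sdk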

Two modes

Local for privacy. Hosted for scale.

The same SDK runs against a local Docker stack or against the hosted TES platform. Switch by changing config — the call sites don't change.

Local memory

Free forever

PostgreSQL + pgvector + Ollama on your own machine. A Raspberry Pi 5 with 8 GB of RAM is enough for the default models.

  • 4-layer memory — episodic, semantic, procedural, working
  • Multi-signal retrieval (vector + BM25 + recency + frequency; sketched below)
  • HyDE query expansion
  • Decay and consolidation cycles
  • Wrap any LLM client — OpenAI, Anthropic, Workers AI
  • Token usage, tool calls, sessions
  • MIT licensed, no telemetry, no egress
View on GitHub
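To make "multi-signal" concrete, here is one illustrative way to blend the four signals into a score. This is a sketch, not the SDK's actual weights or normalization:

// Illustrative only: one possible blend of the four retrieval signals.
// The SDK's real weights and normalization are not published here.
function blendScore(s: {
  vector: number;     // cosine similarity, normalized to 0..1
  bm25: number;       // BM25 keyword score, normalized to 0..1
  recency: number;    // time-decayed age, 0..1 (newer is higher)
  frequency: number;  // access frequency, normalized to 0..1
}): number {
  return 0.5 * s.vector + 0.3 * s.bm25 + 0.1 * s.recency + 0.1 * s.frequency;
}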

Hosted TES platform

Per token

Same SDK pointed at the hosted platform. Higher-dimensional embeddings, conversation analytics, team-wide shared memory. Same per-token billing as the proxy.

  • Everything in local, plus:
  • NV-Embed-v2 (4096-dim) embeddings + HNSW index
  • Conversation analytics + dead-end detection
  • Team-wide shared memory + admin dashboard
  • Multi-tenancy with per-client data residency
  • Same keys and per-token billing as the proxy — no parallel credentials
Get an API key

Local — no config

// No API key needed — runs locally.
const tes = new TESClient();
const ai = tes.wrap(new OpenAI());

Hosted — three env vars

const tes = new TESClient({
  endpoint: process.env.TES_ENDPOINT,
  clientId: process.env.TES_CLIENT_ID,
  apiKey: process.env.TES_API_KEY,
});
const ai = tes.wrap(new OpenAI());  // same call sites

Provider agnostic

Three providers, one wrap

tes.wrap() detects the client type and intercepts the right method. Switch providers without changing tracking code.

JavaScript / TypeScript

import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
import OpenAI from "openai";

const tes = new TESClient({
  endpoint: process.env.TES_ENDPOINT,    // https://<clientId>.api.pentatonic.com
  clientId: process.env.TES_CLIENT_ID,
  apiKey: process.env.TES_API_KEY,
});

const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });

const res = await ai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
// CHAT_TURN event emitted in the background — token usage, tool calls,
// session linkage — without changing the call site.

Python

import os
from pentatonic_ai_agent_sdk import TESClient
from openai import OpenAI

tes = TESClient(
    endpoint=os.environ["TES_ENDPOINT"],
    client_id=os.environ["TES_CLIENT_ID"],
    api_key=os.environ["TES_API_KEY"],
)

ai = tes.wrap(OpenAI(), session_id="conv-123")

res = ai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
Provider      Detection                         Intercepted method           Available in
OpenAI        client.chat.completions.create    chat.completions.create()    JS + Python
Anthropic     client.messages.create            messages.create()            JS + Python
Workers AI    client.run                        run()                        JS only

All other methods on the wrapped client pass through unchanged.
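For instance, an embeddings call on a wrapped OpenAI client is forwarded untouched (the call below is illustrative; no event is emitted for it):

// Not an intercepted method: goes straight to the underlying client.
const emb = await ai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Hello",
});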

Two patterns

Auto-wrap or manual sessions

tes.wrap() for zero boilerplate. tes.session() for full control over multi-turn loops. Both emit the same CHAT_TURN events.

Recommended

Auto-wrap

Pass your client to tes.wrap(). Tracking happens automatically — no changes to your existing call sites.

const tes = new TESClient({ endpoint, clientId, apiKey });
const ai = tes.wrap(new OpenAI());

const res = await ai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
// CHAT_TURN emitted in the background.
Full control

Manual session

Create a session for multi-round tool-calling loops where you want to attach metadata to individual turns.

const session = tes.session({
  sessionId: "conv-123",
  metadata: { userId: "u_1", channel: "api" },
});

// Call this after every LLM round so token usage accumulates across
// the loop (rawLLMResponse: the unmodified provider response).
session.record(rawLLMResponse);

await session.emitChatTurn({
  userMessage: "Find items under $50",
  assistantResponse: res.content[0].text,  // res: final response of the loop
});

console.log(session.totalUsage);
// { prompt_tokens: 142, completion_tokens: 87, ai_rounds: 3 }

Reference

Public surface

Zero runtime dependencies. Full source on GitHub.

new TESClient({ endpoint, clientId, apiKey, ... })

Create a client. endpoint/clientId/apiKey are required for hosted mode. Optional: userId, captureContent, maxContentLength, waitUntil.
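For example, with the optional fields filled in (the values and comments here are illustrative assumptions, not documented defaults):

const tes = new TESClient({
  endpoint: process.env.TES_ENDPOINT,
  clientId: process.env.TES_CLIENT_ID,
  apiKey: process.env.TES_API_KEY,
  userId: "u_1",           // attribute events to an end user
  captureContent: false,   // keep metadata, drop message bodies
  maxContentLength: 2048,  // cap captured content length
});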

tes.wrap(client, { sessionId?, metadata? })

Return a transparent proxy around your LLM client. Detects provider, intercepts the right method, emits CHAT_TURN events without changing your code.

tes.session({ sessionId?, metadata? })

Create a Session manually for full control over multi-round tool-calling loops.

session.record(rawResponse)

Normalize a raw provider response and accumulate token usage on the session.

session.emitChatTurn({ userMessage, assistantResponse, ... })

Emit a CHAT_TURN event with accumulated data, then reset for the next turn.

session.emitToolUse({ name, args, result? })

Emit a TOOL_USE event linked to the current session.
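For example (the tool name, args, and result are hypothetical):

await session.emitToolUse({
  name: "search_catalog",
  args: { query: "items under $50" },
  result: { matches: 12 },
});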

normalizeResponse(raw)

Standalone utility — converts any provider response into { content, model, usage, toolCalls }.
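For example, assuming the utility is exported from the package root:

import { normalizeResponse } from "@pentatonic-ai/ai-agent-sdk";

const n = normalizeResponse(res);  // res: any supported provider's raw response
console.log(n.model, n.usage, n.toolCalls);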

createMemorySystem({ db, embedding, llm })

Library entry point — run the local 4-layer memory stack against your own PostgreSQL + Ollama. Imported from @pentatonic-ai/ai-agent-sdk/memory.
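A minimal sketch. The { db, embedding, llm } keys come from the signature above; the shape of each config object, the connection string, and the model names are assumptions:

import { createMemorySystem } from "@pentatonic-ai/ai-agent-sdk/memory";

// Connection details and model names below are illustrative.
const memory = createMemorySystem({
  db: { connectionString: "postgres://localhost:5432/tes" },
  embedding: { provider: "ollama", model: "nomic-embed-text" },
  llm: { provider: "ollama", model: "llama3.1" },
});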

Open source

Start local. Scale with TES.

Install free, run locally on your own infra. When you need higher-dimensional embeddings, conversation analytics, and team-shared memory — point the same client at the hosted platform.