The library behind the proxy.
Wrap, observe, retrieve.
If the env-var-swap proxy isn't enough — you need fine-grained retrieval control, you want to run the memory layer locally, or you're building observability on top of an LLM client the proxy doesn't speak — install the SDK directly. Same retrieval shape, in your process.
Just want a smaller bill? Skip the SDK and use the proxy — two env vars, one default header.
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
import Anthropic from "@anthropic-ai/sdk";
// Local mode — no API key. Memory stack runs on your machine.
const tes = new TESClient();
// One line — wrap your existing client.
const ai = tes.wrap(new Anthropic());
const res = await ai.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 1024,
messages: [{ role: "user", content: "Hello" }],
});
// Captured locally:
// CHAT_TURN { tokens, model, ... }
// TOOL_USE { name, args, result }
// Session linkage across turns
// 4-layer memory: episodic, semantic, procedural, working
Install
One package, four entry points
npm / pip
Library install. Wrap your existing OpenAI / Anthropic / Workers AI client, emit observability events, manage sessions manually if you need to.
npx … init
Interactive setup for the hosted TES platform. Creates a tenant, verifies your email, mints an API key, and prints the three env vars you need.
npx … memory
Runs the full memory stack locally — PostgreSQL + pgvector + Ollama in Docker. No account, no API key, no egress. Pulls embedding + chat models, writes config, and you're done.
npm install @pentatonic-ai/ai-agent-sdk
Two modes
Local for privacy. Hosted for scale.
The same SDK runs against a local Docker stack or against the hosted TES platform. Switch by changing config — the call sites don't change.
Local memory
Free forever
PostgreSQL + pgvector + Ollama on your own machine. A Pi 5 with 8 GB RAM is enough for the default models.
- 4-layer memory — episodic, semantic, procedural, working
- Multi-signal retrieval (vector + BM25 + recency + frequency; sketched below)
- HyDE query expansion
- Decay and consolidation cycles
- Wrap any LLM client — OpenAI, Anthropic, Workers AI
- Token usage, tool calls, sessions
- MIT licensed, no telemetry, no egress
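For intuition, here is a minimal sketch of how a multi-signal retrieval score can combine those four signals. The weights, field names, and normalisation are illustrative assumptions, not the SDK's internals:
interface Candidate {
  vectorSim: number;    // cosine similarity from pgvector, assumed 0..1
  bm25: number;         // BM25 relevance, assumed pre-normalised to 0..1
  lastAccess: Date;     // feeds the recency signal
  accessCount: number;  // feeds the frequency signal
}
// Illustrative weights; the SDK's real formula is not shown here.
const WEIGHTS = { vector: 0.5, bm25: 0.25, recency: 0.15, frequency: 0.1 };
function score(c: Candidate, now = new Date()): number {
  const ageDays = (now.getTime() - c.lastAccess.getTime()) / 86_400_000;
  const recency = Math.exp(-ageDays / 30);                                  // exponential decay
  const frequency = Math.min(1, Math.log1p(c.accessCount) / Math.log(100)); // log-damped
  return (
    WEIGHTS.vector * c.vectorSim +
    WEIGHTS.bm25 * c.bm25 +
    WEIGHTS.recency * recency +
    WEIGHTS.frequency * frequency
  );
}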
Hosted TES platform
Per token
Same SDK pointed at the hosted platform. Higher-dimensional embeddings, conversation analytics, team-wide shared memory. Same per-token billing as the proxy.
- Everything in local, plus:
- NV-Embed-v2 (4096-dim) embeddings + HNSW index
- Conversation analytics + dead-end detection
- Team-wide shared memory + admin dashboard
- Multi-tenancy with per-client data residency
- Same token surface as the proxy — no parallel keys
Local — no config
// No API key needed — runs locally.
const tes = new TESClient();
const ai = tes.wrap(new OpenAI());
Hosted — three env vars
const tes = new TESClient({
endpoint: process.env.TES_ENDPOINT,
clientId: process.env.TES_CLIENT_ID,
apiKey: process.env.TES_API_KEY,
});
const ai = tes.wrap(new OpenAI()); // same call sites
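One way to flip between the two modes without touching call sites is to key the config off the environment. Treating a present TES_API_KEY as the hosted signal is a convention of this sketch, not something the SDK enforces:
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
import OpenAI from "openai";
// Hosted when the env vars are set, local Docker stack otherwise.
const tes = process.env.TES_API_KEY
  ? new TESClient({
      endpoint: process.env.TES_ENDPOINT,
      clientId: process.env.TES_CLIENT_ID,
      apiKey: process.env.TES_API_KEY,
    })
  : new TESClient();
const ai = tes.wrap(new OpenAI()); // identical call sites in both modes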
Provider agnostic
Three providers, one wrap
tes.wrap() detects the client type and intercepts the right method. Switch providers without changing tracking code.
JavaScript / TypeScript
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
import OpenAI from "openai";
const tes = new TESClient({
endpoint: process.env.TES_ENDPOINT, // https://<clientId>.api.pentatonic.com
clientId: process.env.TES_CLIENT_ID,
apiKey: process.env.TES_API_KEY,
});
const ai = tes.wrap(new OpenAI(), { sessionId: "conv-123" });
const res = await ai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
// CHAT_TURN event emitted in the background — token usage, tool calls,
// session linkage — without changing the call site.
Python
import os
from pentatonic_ai_agent_sdk import TESClient
from openai import OpenAI
tes = TESClient(
endpoint=os.environ["TES_ENDPOINT"],
client_id=os.environ["TES_CLIENT_ID"],
api_key=os.environ["TES_API_KEY"],
)
ai = tes.wrap(OpenAI(), session_id="conv-123")
res = ai.chat.completions.create(
model="gpt-4o",
messages=[{"role": "user", "content": "Hello"}],
)
| Provider | Detection | Intercepted method | Available in |
|---|---|---|---|
| OpenAI | client.chat.completions.create | chat.completions.create() | JS + Python |
| Anthropic | client.messages.create | messages.create() | JS + Python |
| Workers AI | client.run | run() | JS only |
All other methods on the wrapped client pass through unchanged.
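That pass-through behaviour falls out of a proxy-based design. Here is a minimal sketch of the duck-typing the table describes, shown for the Anthropic shape only (the OpenAI path nests one level deeper). Everything below is an illustration, not the SDK's source:
// Detect the provider from the properties listed in the table above.
function detectProvider(client: any): "openai" | "anthropic" | "workers-ai" | null {
  if (client?.chat?.completions?.create) return "openai";
  if (client?.messages?.create) return "anthropic";
  if (typeof client?.run === "function") return "workers-ai";
  return null;
}
// Intercept messages.create(); every other property passes through untouched.
function sketchWrap<T extends object>(client: T, onTurn: (res: unknown) => void): T {
  if (detectProvider(client) !== "anthropic") return client; // sketch covers one shape
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (prop === "messages" && value && typeof value.create === "function") {
        return new Proxy(value, {
          get(t, p) {
            if (p === "create") {
              return async (...args: unknown[]) => {
                const res = await t.create(...args);
                onTurn(res); // emit CHAT_TURN in the background
                return res;
              };
            }
            return Reflect.get(t, p);
          },
        });
      }
      return value;
    },
  });
}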
Two patterns
Auto-wrap or manual sessions
tes.wrap() for zero boilerplate. tes.session() for full control over multi-turn loops. Both emit the same CHAT_TURN events.
Auto-wrap
Pass your client to tes.wrap(). Tracking happens automatically — no changes to your existing call sites.
const tes = new TESClient({ endpoint, clientId, apiKey });
const ai = tes.wrap(new OpenAI());
const res = await ai.chat.completions.create({
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
});
// CHAT_TURN emitted in the background.
Manual session
Create a session for multi-round tool-calling loops where you want to attach metadata to individual turns.
const session = tes.session({
sessionId: "conv-123",
metadata: { userId: "u_1", channel: "api" },
});
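// rawLLMResponse and res below are placeholders: the unwrapped provider
// response from each round, and the final round's reply.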
session.record(rawLLMResponse);
await session.emitChatTurn({
userMessage: "Find items under $50",
assistantResponse: res.content[0].text,
});
console.log(session.totalUsage);
// { prompt_tokens: 142, completion_tokens: 87, ai_rounds: 3 }
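Putting record, emitToolUse, and emitChatTurn together, a multi-round tool loop might look like the sketch below. Only the session methods come from the SDK; runTool and the loop shape are illustrative assumptions:
import Anthropic from "@anthropic-ai/sdk";
import { TESClient } from "@pentatonic-ai/ai-agent-sdk";
// Hypothetical tool runner; stands in for your own dispatch logic.
declare function runTool(name: string, input: unknown): Promise<unknown>;
const anthropic = new Anthropic();
const tes = new TESClient();
const session = tes.session({ sessionId: "conv-123" });
const userMessage = "Find items under $50";
const messages: Anthropic.MessageParam[] = [{ role: "user", content: userMessage }];
// Pass your `tools` array to messages.create() in a real loop.
let res = await anthropic.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 1024,
  messages,
});
session.record(res); // accumulate this round's token usage
while (res.stop_reason === "tool_use") {
  const toolResults: Anthropic.ToolResultBlockParam[] = [];
  for (const block of res.content) {
    if (block.type !== "tool_use") continue;
    const result = await runTool(block.name, block.input);
    await session.emitToolUse({ name: block.name, args: block.input, result });
    toolResults.push({
      type: "tool_result",
      tool_use_id: block.id,
      content: JSON.stringify(result),
    });
  }
  messages.push({ role: "assistant", content: res.content });
  messages.push({ role: "user", content: toolResults });
  res = await anthropic.messages.create({ model: "claude-sonnet-4-5", max_tokens: 1024, messages });
  session.record(res);
}
// One CHAT_TURN for the whole loop, with usage summed across rounds.
const last = res.content[0];
await session.emitChatTurn({
  userMessage,
  assistantResponse: last?.type === "text" ? last.text : "",
});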
Reference
Public surface
Zero runtime dependencies. Full source on GitHub.
new TESClient({ endpoint, clientId, apiKey, ... })
Create a client. endpoint/clientId/apiKey are required for hosted mode. Optional: userId, captureContent, maxContentLength, waitUntil.
tes.wrap(client, { sessionId?, metadata? })
Return a transparent proxy around your LLM client. Detects provider, intercepts the right method, emits CHAT_TURN events without changing your code.
tes.session({ sessionId?, metadata? })
Create a Session manually for full control over multi-round tool-calling loops.
session.record(rawResponse)
Normalise a raw provider response and accumulate token usage on the session.
session.emitChatTurn({ userMessage, assistantResponse, ... })
Emit a CHAT_TURN event with accumulated data, then reset for the next turn.
session.emitToolUse({ name, args, result? })
Emit a TOOL_USE event linked to the current session.
normalizeResponse(raw)
Standalone utility — converts any provider response into { content, model, usage, toolCalls }.
createMemorySystem({ db, embedding, llm })
Library entry point — run the local 4-layer memory stack against your own pg + Ollama. Imported from @pentatonic-ai/ai-agent-sdk/memory.
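To make the memory entry point concrete, a minimal wiring sketch. Only createMemorySystem and its { db, embedding, llm } options appear in the reference above; the option shapes, connection string, and model names below are illustrative assumptions:
import { createMemorySystem } from "@pentatonic-ai/ai-agent-sdk/memory";
// Point the 4-layer memory stack at your own infrastructure.
const memory = createMemorySystem({
  db: { connectionString: "postgres://localhost:5432/tes" }, // pg + pgvector
  embedding: { model: "nomic-embed-text" },                  // served by local Ollama
  llm: { model: "llama3.1" },                                // used for HyDE expansion
});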
Where to go from here
Pick the surface that fits
Proxy quickstart
If you don't need fine-grained control, the env-var-swap proxy is the fastest path. Two routes, one header, done.
Read →
Claude Code plugin
Drop-in marketplace plugin that wires the SDK into Claude Code's hook system. Search and store memories without writing code.
Read →
OpenClaw plugin
Context-engine plugin for OpenClaw. Memory ingest, retrieval, decay, and consolidation fire on every lifecycle event — no agent rewrites.
Read →
Memory layer
What the SDK actually retrieves from. Episodic, semantic, procedural, working — bio-inspired layers with decay and consolidation.
Read →
Open source
Start local. Scale with TES.
Install free, run locally on your own infra. When you need higher-dimensional embeddings, conversation analytics, and team-shared memory — point the same client at the hosted platform.