AI agent memory infrastructure
that saves tokens and keeps context
Your agent starts cold every session — wasting tokens. With TES, your agent starts ready. Built on persistent memory across every source you use.
Start now with a single command:
npx @pentatonic-ai/ai-agent-sdk login
Why TES
Lower token bill. Same model. Same answer.
Three reasons the proxy approach earns its keep over the status quo.
01
Same SDK, same model
Keep your existing client, your model, your prompt code. The integration is one env var or one Claude Code plugin — no rewrite, no lock-in.
02
Memory injected up front
Before the model runs, TES pulls the context it would otherwise have to re-derive — and folds it into the system preamble. The answer is in front of the model.
03
You can see exactly what we did
Audited methodology. Reproducible benchmark. Per-workload split published. Compare two invoices to see the gap.
How it works
One env var. Same SDK. Half the bill.
You already pay for an LLM. You probably also pay for the same context to be re-derived every turn. TES intercepts the request, fetches the context the model would have asked for, injects it as a preamble, and forwards the call.
Intercept
Point your existing Anthropic, OpenAI, or MiniMax client at llm.api.pentatonic.com. We receive the request unchanged — same SDK, same model, same response shape.
# .env: point base URL at TES, add your token
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
TES_API_KEY=tes_<clientId>_<random>
// Wire it on your client (one extra line)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
defaultHeaders: { "X-TES-Token": process.env.TES_API_KEY },
});
await client.messages.create({ model, messages });
Retrieve & inject
TES retrieves the context your model would have re-derived this turn — files, prior memory, tool results — and injects it as a preamble. The answer is now in front of the model instead of being tool-called for.
// What TES does on the wire (you don't write this)
const context = await retrieve({
session: req.session,
files: req.referenced_files,
memory: req.user_memory,
});
req.messages = [preamble(context), ...req.messages];
// e.g. on a code-lookup turn from the published benchmark:
// → 10,050 tokens of tool/file work avoided
// → 17 follow-up tool calls eliminated
Forward & return
We forward to your chosen upstream — Anthropic, OpenAI, MiniMax, or your own endpoint — and return the response untouched. Your code reads the same shape it always has. The bill goes down.
// Response is identical to direct upstream call
{
id: "msg_01...",
model: "claude-sonnet-4",
content: [{ type: "text", text: "..." }],
usage: { input_tokens: 450 /* was 10,500 */ },
// overall benchmark median: 27.2% reduction.
// code-lookup category median: 95.6%. See /benchmarks.
}
Use cases
Who this is for
Any team paying a meaningful LLM bill where the model is re-deriving context it's already seen. Three workloads where retrieval-before-answer cuts token spend materially.
Stop paying to re-read the same repo every turn
Codex, Claude Code, Cursor, and every in-house coding agent burn most of their input tokens re-grepping and re-reading files the model already saw 30 seconds ago. TES caches that context the first time and injects it as a preamble on the next turn. 95% median input-token reduction on code-bound workloads in our benchmark. One env var. Same SDK. Same model. Same answer.
- 95% median input-token reduction on code queries
- Works with Anthropic, OpenAI, MiniMax, and any OpenAI-compatible endpoint
- Per-dev Pro tier from $20/mo
Two ways in
Pick the path that matches how you pay
Same memory layer underneath. Different transport.
Per-token API key
Anthropic or OpenAI workspace key. Drop-in proxy — change one env var. Bill drops directly with every compressed turn.
ANTHROPIC_BASE_URL=https://llm.api.pentatonic.com
Claude Code, Cursor, or Codex on a subscription
Hooks-based plugin runs locally. Same retrieve-and-inject as the proxy — but never touches your auth path. Your subscription terms unchanged.
/plugin install tes-memory@pentatonic-ai
Pricing
Per-token. Lower than upstream. Audit by comparing two invoices.
We charge a per-token rate lower than going direct to Anthropic or OpenAI, and your total token count drops because the preamble compresses every turn. The customer wins twice, and both wins are visible on the bill.
Free
Solo devs, weekend projects, and Claude Code / Cursor / Codex subscribers using the memory plugin.
- 1M proxied input tokens / month
- Anthropic Messages + OpenAI Chat Completions
- Claude Code plugin — unlimited memory + sessions
- Bring-your-own retrieval source (URLs, files)
- Token-savings dashboard + request log
- Soft-fail to upstream on TES error
- Discord support
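The soft-fail guarantee above can be sketched in a few lines. This is illustrative, not the proxy's actual source; `withSoftFail`, `Msg`, and `retrieveContext` are hypothetical names:

```typescript
// Illustrative sketch of "soft-fail to upstream": if the memory lookup
// throws, the original messages are forwarded untouched, so a TES-side
// error can cost you the savings but never the request.
type Msg = { role: "system" | "user" | "assistant"; content: string };

export async function withSoftFail(
  messages: Msg[],
  retrieveContext: () => Promise<string>,
): Promise<Msg[]> {
  try {
    const context = await retrieveContext();
    // Happy path: inject retrieved context as a system preamble.
    return [{ role: "system", content: context }, ...messages];
  } catch {
    // Soft-fail: forward the request exactly as received.
    return messages;
  }
}
```

The design choice is that the fallback arm does no work at all: the worst case through TES is the request you would have sent anyway.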
Pro
$20/mo minimum
Devs and small teams paying $50–$500/mo to LLM providers.
- Per-token meter from $0 — minimum covers light use
- Same routes + per-tenant memory layer
- Per-project breakdown, exportable CSV
- X-TES-Mode: passthrough for A/B comparisons
- Email support, 1-business-day
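One way to wire the A/B comparison mentioned in the Pro tier, mirroring the client setup shown earlier. The header value `passthrough` is the one named above; the helper name and exact header semantics are assumptions:

```typescript
// Build per-arm headers for an A/B run: one request with memory
// injection, one forced passthrough, then compare usage.input_tokens
// between the two responses. Pass the result as defaultHeaders to
// your existing SDK client.
export function tesHeaders(mode?: "passthrough"): Record<string, string> {
  const headers: Record<string, string> = {
    "X-TES-Token": process.env.TES_API_KEY ?? "tes_example_token",
  };
  if (mode !== undefined) {
    headers["X-TES-Mode"] = mode; // ask TES to skip retrieval and forward unchanged
  }
  return headers;
}
```

For example, `new Anthropic({ defaultHeaders: tesHeaders("passthrough") })` for the baseline arm and `new Anthropic({ defaultHeaders: tesHeaders() })` for the memory arm.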
Enterprise
$1k/mo minimum, annual commit
Companies with a six- or seven-figure annual AI bill.
- Volume per-token rate, predictable ceiling
- Custom upstreams (MiniMax, vLLM, llama.cpp, your own inference)
- Custom KGs / vector stores / private corpora
- SLA, dedicated regions
- Per-team breakdown, SSO, audit log
- Slack channel, dedicated engineer
Reference: Anthropic lists Sonnet input at $3 / 1M tokens direct. Our Pro per-token rate is $0.50 / 1M — about a sixth of that, before the compression saving. You can audit the gap by putting our invoice next to your direct upstream invoice.
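Using the rates quoted above, the two savings compound. A back-of-the-envelope sketch: the rates and the 27.2% overall median reduction come from this page, while the 100M-token monthly volume is a made-up example:

```typescript
// Direct: every input token billed at the upstream rate.
// Via TES: fewer tokens (27.2% overall median reduction) at a lower rate.
const DIRECT_RATE = 3.0;       // USD per 1M Sonnet input tokens, direct
const TES_RATE = 0.5;          // USD per 1M input tokens, TES Pro
const MEDIAN_REDUCTION = 0.272;

export function monthlyCosts(inputTokensM: number) {
  const direct = inputTokensM * DIRECT_RATE;
  const viaTes = inputTokensM * (1 - MEDIAN_REDUCTION) * TES_RATE;
  return { direct, viaTes };
}

// e.g. 100M input tokens/month: $300 direct vs ~$36.40 via TES
```

Your own split will differ by workload (code-lookup turns compress far more than the overall median), which is why the page points you at the two invoices rather than this arithmetic.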
Get started
Lower your AI bill without changing your code
Free tier covers 1M proxied input tokens per month. No credit card. Pro from $20/mo, enterprise with annual commit.