Live demo

Same prompt, two ways

Watch the same request run against the upstream LLM directly, then through TES. Token counts animate from 0 to the real benchmark numbers as the agent works. The only difference between the two runs is the base URL: https://llm.api.pentatonic.com.
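For a concrete picture of that swap, here is a minimal sketch using the Anthropic Python SDK. The model name and API-key handling are illustrative assumptions; the only line that differs between the two runs is the base_url argument.

import os

import anthropic

prompt = "Show me the current memory-search-router.py search flow."

# Run 1: direct to the upstream provider (the SDK's default base URL).
direct = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

# Run 2: the identical client, pointed at TES instead.
via_tes = anthropic.Anthropic(
    api_key=os.environ["ANTHROPIC_API_KEY"],
    base_url="https://llm.api.pentatonic.com",
)

for client in (direct, via_tes):
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    # usage.input_tokens is the number the demo counters animate toward.
    print(reply.usage.input_tokens)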

Scenario

User prompt — identical on both sides

Show me the current memory-search-router.py search flow.

Without TES — direct to Anthropic

Full context re-derived each turn

input tokens sent upstream (counter starts at 0 and animates when you press Run demo)

With TES — llm.api.pentatonic.com

Preamble injected from the memory layer

input tokens sent upstream (counter starts at 0 and animates when you press Run demo)

Point your existing Anthropic, OpenAI, or MiniMax client at llm.api.pentatonic.com. Same code, same model, same response shape; lower input-token cost on every coding and agentic workload.
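The same one-line change works from an OpenAI-style client. This is a sketch that assumes the gateway serves the standard chat-completions route at that base URL and passes your existing key through unchanged; the model name is a placeholder.

import os

from openai import OpenAI

# Only base_url changes; the rest is your existing call site.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url="https://llm.api.pentatonic.com",
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; keep whatever model you already use
    messages=[
        {"role": "user", "content": "Show me the current memory-search-router.py search flow."}
    ],
)

# prompt_tokens is the input-token figure the demo compares.
print(response.usage.prompt_tokens)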