Live demo
Same prompt, two ways
Watch the same request run against the upstream LLM directly, then through TES. Token counts animate from 0 to the real benchmark numbers as the agent works. The only difference between the two runs is the base URL: https://llm.api.pentatonic.com.
User prompt — identical on both sides
Show me the current memory-search-router.py search flow.
Without TES — direct to Anthropic
Full context re-derived each turn
input tokens sent upstream
0
// ready — press Run demo
With TES — llm.api.pentatonic.com
Preamble injected from the memory layer
input tokens sent upstream
0
// ready — press Run demo
Point your existing Anthropic, OpenAI, or MiniMax client at llm.api.pentatonic.com. Same code, same model, same response shape, with lower input-token cost on every coding and agentic workload.
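As a concrete sketch of what "same code, only the base URL changes" means: the snippet below builds the same Messages-style request twice, once against the upstream host and once against TES, and shows the payload is byte-identical. The endpoint path, model name, and header set are placeholders for illustration, not confirmed details of the TES API.

```python
import json
import urllib.request

# Hypothetical sketch: the only change TES requires is the base URL.
# The /v1/messages path and model name below are illustrative placeholders.
DIRECT_BASE = "https://api.anthropic.com"
TES_BASE = "https://llm.api.pentatonic.com"  # drop-in replacement

def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build a Messages-style request; only base_url differs between runs."""
    body = {
        "model": "example-model",          # placeholder
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/messages",
        data=json.dumps(body).encode(),
        headers={"content-type": "application/json"},
        method="POST",
    )

prompt = "Show me the current memory-search-router.py search flow."
direct = build_request(DIRECT_BASE, prompt)
via_tes = build_request(TES_BASE, prompt)

# Identical payload and headers; only the host differs.
assert direct.data == via_tes.data
assert direct.full_url != via_tes.full_url
```

In practice you would not build requests by hand: official SDKs such as the Anthropic and OpenAI Python clients accept a `base_url` constructor argument, so pointing them at TES is a one-line change.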