Stop Overpaying for AI. Relay routes every call to the cheapest model that works.
A drop-in proxy for OpenAI, Anthropic, and Groq. Relay classifies every request and ships it to the right model — Opus when you need it, Haiku when you don't. Zero code changes.
You don't need Opus for a one-line prompt. Relay figures out what you actually need and routes accordingly.
🎯
Smart Routing
Relay inspects every request — token count, tool usage, reasoning flags — and picks the cheapest model that can handle it. Fast tier for short prompts, premium tier only when justified.
📊
Real-time Analytics
Live dashboard shows every call: model requested vs. model used, tokens, savings per call. Know exactly where your AI budget goes — and what Relay saved you.
⚡
Zero Code Changes
OpenAI-compatible endpoint. Change your base URL, keep your SDK. Works with every agent framework, every language. Deploy it in a minute.
Under the hood
How Relay decides.
A deterministic heuristic engine. No magic, no surprises. Full routing logs for every call.
1
Request arrives
Your agent hits POST /v1/chat/completions — same format as OpenAI.
2
Classify
Token count, tool_use blocks, system prompt length, reasoning flags. Three tiers: Fast, Balanced, Premium.
3
Route
Fast → Groq Llama-3.3 or Haiku. Balanced → Sonnet or GPT-4o-mini. Premium → Opus or GPT-4.
4
Ship response
Provider response comes back untouched. Your agent never knows the difference — except on the invoice.
# Classification logic if tokens < 500andnot has_tools:
tier = "FAST"
model = "llama-3.3-70b"# $0.0006/1k
We were burning $4k/mo on Opus for what turned out to be short classification calls. Relay cut that to $900 without touching a line of code.
MR
Maya Ramirez
CTO · Fieldwire Agents
↓ 78% monthly spend
Installed Relay before standup. By end of the day it had rerouted 11k calls and saved us $312. That's a wrap.
DS
Devraj Shah
Founding Eng · Loom for Sales
$312 saved day one
The dashboard alone is worth the price. I can finally see which agents are blowing the budget and which ones were fine running on Haiku the whole time.
AK
Ana Kowalski
Platform Lead · Arbor AI
12 agents audited in week 1
FAQ
Questions we hear a lot.
Does Relay ever send my data somewhere new?
No. Relay routes your request to the same providers you already use (OpenAI, Anthropic, Groq). It doesn't log prompts or completions — only metadata (token counts, model used, latency). Self-hosted option available on Scale.
How does routing not break my agent's quality?
Relay only downgrades when it's safe — short prompts, no tools, no reasoning flags. Anything ambiguous stays on Premium. You can also pin specific agents or routes to a tier, or disable routing and just use Relay for observability.
What's the latency overhead?
Classification is pure JS — under 2ms. Network hop to Relay adds ~10-30ms depending on region. For streaming responses, Relay passes bytes through as they arrive, so time-to-first-token is basically unchanged.
Do I still use my own API keys?
Yes. You give Relay your provider keys once (encrypted at rest) and Relay uses them to call the providers on your behalf. Billing stays on your accounts. Relay charges a flat monthly fee — no per-token markup.
Can I write custom routing rules?
Yes, on Growth and up. Rules are JSON: match on agent id, endpoint, token count, or any header, and pin to a tier or a specific model. Everything else falls through to the default heuristic.
Is there a self-hosted version?
Yes — the MVP proxy is open-source and ships with the nexusrelay/relay ClawHub skill. The managed version adds dashboards, auth, retention, and SLAs.
Route your first call in 60 seconds.
Your agents are already sending requests somewhere. Send them through Relay instead — see the savings by lunch.