Failover & routing
Proxide automatically retries failed LLM requests against backup providers — transparently, without your application seeing an error. Configure fallback chains in the dashboard or override per-request with a header.
When failover triggers
Proxide triggers failover when it receives any of the following from the primary provider:
| Condition | Description |
|---|---|
| HTTP 429 | Rate limit — provider is throttling your requests |
| HTTP 500 / 502 / 503 | Server error or provider outage |
| Connection timeout | No response within 10 seconds |
| Stream stall | Streaming response stalls for > 5 seconds |
| Model unavailable | Specific model temporarily unavailable |
HTTP 4xx errors (except 429) are not retried — they indicate a client error (bad request, invalid model, etc.) that won't be resolved by switching providers.
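In pseudocode terms, the retry decision reduces to a small predicate. The sketch below mirrors the table above; it is illustrative only (the function name and arguments are made up for the example, and this is not Proxide's actual implementation):

```python
RETRYABLE_STATUSES = {429, 500, 502, 503}

def should_failover(status_code=None, timed_out=False, stream_stalled=False):
    """Return True if the next provider in the fallback chain should be tried."""
    if timed_out or stream_stalled:
        # Connection timeout (10 s) or stream stall (> 5 s)
        return True
    # Retry rate limits and server errors; other 4xx are client errors
    # that switching providers won't fix.
    return status_code in RETRYABLE_STATUSES
```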
Configuring your fallback chain
Set your default fallback chain in Settings → Routing in the Proxide dashboard. Providers are tried in order from first to last.
Example chain: OpenAI → Anthropic → Groq

```
Request
└─► openai/gpt-4o                   [429 rate limit]
    └─► anthropic/claude-3-5-sonnet [200 OK] ──► Response
```

Per-request fallback override
Override the default fallback chain for a specific request using the x-proxide-fallback header. The value is a comma-separated list of provider/model pairs.
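If you assemble the chain at runtime, the header value is just `provider/model` pairs joined by commas. A tiny helper along these lines (hypothetical, not part of any SDK) keeps the formatting in one place:

```python
def fallback_header(*pairs):
    """Join (provider, model) pairs into an x-proxide-fallback header value."""
    return ",".join(f"{provider}/{model}" for provider, model in pairs)

value = fallback_header(
    ("anthropic", "claude-3-5-sonnet"),
    ("groq", "llama-3.3-70b"),
)
# value == "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b"
```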
JavaScript:

```javascript
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  },
  {
    headers: {
      // Try OpenAI first, then Anthropic, then Groq
      "x-proxide-fallback":
        "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    },
  }
);
```

Python:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "x-proxide-fallback": "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    },
)
```

cURL:

```shell
curl https://gateway.proxide.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer prox-your-key-here" \
  -H "Content-Type: application/json" \
  -H "x-proxide-fallback: anthropic/claude-3-5-sonnet,groq/llama-3.3-70b" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Checking which provider was used
Every response includes headers telling you which provider served the request and whether failover occurred:
When no failover occurred:

```
x-proxide-provider: openai/gpt-4o
x-proxide-failover: false
```

When failover occurred:

```
x-proxide-provider: anthropic/claude-3-5-sonnet
x-proxide-failover: true
x-proxide-original-provider: openai/gpt-4o
x-proxide-original-error: 429
x-proxide-failover-latency-ms: 87
```

Load balancing
In addition to failover, you can configure percentage-based load balancing to distribute traffic across providers. This is useful for very high-throughput applications that regularly hit rate limits on a single provider.
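Percentage-based routing of this kind boils down to weighted random selection over the configured providers. A minimal sketch (illustrative only, not Proxide's internals; the weights match the example route below):

```python
import random

# Example split: 70/20/10 across three providers
WEIGHTS = {
    "openai/gpt-4o": 70,
    "anthropic/claude-3-5": 20,
    "groq/llama-3.3-70b": 10,
}

def pick_provider(weights, rng=random):
    """Pick a provider with probability proportional to its weight."""
    providers = list(weights)
    return rng.choices(providers, weights=[weights[p] for p in providers], k=1)[0]
```

Over many requests, roughly 70% land on the primary; failover still applies if the picked provider errors.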
```
Route: /chat
├── openai/gpt-4o          70% (primary)
├── anthropic/claude-3-5   20% (secondary)
└── groq/llama-3.3-70b     10% (tertiary)

Failover: enabled for all providers
```

Supported providers
All providers are normalised to the OpenAI API format. Your request structure stays the same regardless of which provider serves it.
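Concretely, switching providers only changes the model identifier; the request body keeps the same OpenAI-format shape. A sketch with no network call (model names taken from the table below):

```python
def chat_request(model, user_text):
    """Build an OpenAI-format chat request body; only `model` varies by provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

openai_req = chat_request("gpt-4o", "Hello!")
anthropic_req = chat_request("claude-sonnet-4-6", "Hello!")
# Identical structure, different model identifier
assert openai_req.keys() == anthropic_req.keys()
```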
| Provider | Supported models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o3-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 |
| Google Gemini | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |
| Groq | llama-3.3-70b, llama-3.1-8b, mixtral-8x7b |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| xAI | grok-3, grok-3-mini, grok-2 |
| Mistral | mistral-large, mistral-small, codestral |
| Together AI | llama-3.3-70b, deepseek-r1, qwen-2.5 |
| Fireworks AI | llama-v3p1-70b, mixtral-8x22b |
| Cohere | command-r-plus, command-r |
| Perplexity | sonar-pro, sonar |
| Qwen | qwen-max, qwen-plus, qwq-32b |
Plus Moonshot, Zhipu, HuggingFace, Replicate, and more. Check your dashboard for the full list.