Documentation

Failover & routing

Proxide automatically retries failed LLM requests against backup providers — transparently, without your application seeing an error. Configure fallback chains in the dashboard or override per-request with a header.

When failover triggers

Proxide triggers failover when it receives any of the following from the primary provider:

Condition              Description
HTTP 429               Rate limit — provider is throttling your requests
HTTP 500 / 502 / 503   Server error or provider outage
Connection timeout     No response within 10 seconds
Stream stall           Streaming response stalls for more than 5 seconds
Model unavailable      The requested model is temporarily unavailable

HTTP 4xx errors (except 429) are not retried — they indicate a client error (bad request, invalid model, etc.) that won't be resolved by switching providers.
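The retry decision above can be sketched in a few lines. The function name and constant here are illustrative, not part of the Proxide gateway source; connection timeouts and stream stalls also trigger failover, but they surface as transport errors rather than HTTP status codes.

```python
# Which status codes warrant trying the next provider in the chain.
RETRYABLE_STATUSES = {429, 500, 502, 503}

def should_failover(status_code: int) -> bool:
    """Return True when the next provider in the chain should be tried."""
    # 429 and 5xx outages are provider-side and may succeed elsewhere;
    # other 4xx codes are client errors and will fail on every provider.
    return status_code in RETRYABLE_STATUSES
```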

Configuring your fallback chain

Set your default fallback chain in Settings → Routing in the Proxide dashboard. Providers are tried in order from first to last.

Example chain: OpenAI → Anthropic → Groq

Visual: failover flow
Request
  └─► openai/gpt-4o          [429 rate limit]
        └─► anthropic/claude-3-5-sonnet  [200 OK] ──► Response
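The flow above amounts to a loop over the chain. In this illustrative sketch, `call_provider` is a hypothetical stand-in for the upstream HTTP call (injected so the routing logic stays self-contained); it returns a `(status_code, body)` pair.

```python
RETRYABLE = {429, 500, 502, 503}

def route(request, chain, call_provider):
    """Try each provider/model pair in order; return the first success."""
    last_status = None
    for provider in chain:
        status, body = call_provider(provider, request)
        if status == 200:
            return provider, body
        if status not in RETRYABLE:
            # Client error (bad request, invalid model, ...): switching
            # providers won't help, so surface it immediately.
            raise RuntimeError(f"{provider} returned {status}")
        last_status = status
    raise RuntimeError(f"all providers failed (last status: {last_status})")
```

With a chain of `["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]`, a 429 from the first entry sends the request to the second, matching the diagram above.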

Per-request fallback override

Override the default fallback chain for a specific request using the x-proxide-fallback header. The value is a comma-separated list of provider/model pairs.

TypeScript
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  },
  {
    headers: {
      // Primary stays gpt-4o (OpenAI); fall back to Anthropic, then Groq
      "x-proxide-fallback":
        "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    },
  }
);
Python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "x-proxide-fallback": "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    }
)
curl
curl https://gateway.proxide.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer prox-your-key-here" \
  -H "Content-Type: application/json" \
  -H "x-proxide-fallback: anthropic/claude-3-5-sonnet,groq/llama-3.3-70b" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
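If you build the header value programmatically, a small helper keeps the comma-separated format consistent. The tuple input format here is just a convenience, not anything the gateway requires.

```python
def fallback_header(pairs):
    """Join (provider, model) pairs into an x-proxide-fallback value."""
    return ",".join(f"{provider}/{model}" for provider, model in pairs)

headers = {
    "x-proxide-fallback": fallback_header([
        ("anthropic", "claude-3-5-sonnet"),
        ("groq", "llama-3.3-70b"),
    ])
}
```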

Checking which provider was used

Every response includes headers telling you which provider served the request and whether failover occurred:

Normal response (no failover)
x-proxide-provider: openai/gpt-4o
x-proxide-failover: false
Failover response (primary failed)
x-proxide-provider: anthropic/claude-3-5-sonnet
x-proxide-failover: true
x-proxide-original-provider: openai/gpt-4o
x-proxide-original-error: 429
x-proxide-failover-latency-ms: 87
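A sketch of reading these headers for logging, assuming `headers` is any mapping of response header names to values (the summary format is made up for illustration):

```python
def failover_summary(headers):
    """Describe which provider served the request, per Proxide headers."""
    provider = headers.get("x-proxide-provider", "unknown")
    if headers.get("x-proxide-failover") == "true":
        original = headers.get("x-proxide-original-provider")
        error = headers.get("x-proxide-original-error")
        return f"failover: {original} ({error}) -> {provider}"
    return f"served by {provider}"
```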

Load balancing

In addition to failover, you can configure percentage-based load balancing to distribute traffic across providers. This is useful for very high-throughput applications that regularly hit rate limits on a single provider.

Dashboard configuration example
Route: /chat
├── openai/gpt-4o          70%  (primary)
├── anthropic/claude-3-5   20%  (secondary)
└── groq/llama-3.3-70b     10%  (tertiary)

Failover: enabled for all providers
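Percentage-based routing of this kind can be sketched with `random.choices`. The weights below mirror the dashboard example above; they are not defaults, and the actual gateway may use a different selection algorithm.

```python
import random

# Provider routes and their traffic percentages (from the example above).
ROUTES = [
    ("openai/gpt-4o", 70),
    ("anthropic/claude-3-5", 20),
    ("groq/llama-3.3-70b", 10),
]

def pick_provider(routes=ROUTES):
    """Pick one provider at random, weighted by its traffic share."""
    providers = [p for p, _ in routes]
    weights = [w for _, w in routes]
    return random.choices(providers, weights=weights, k=1)[0]
```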

Supported providers

All providers are normalised to the OpenAI API format. Your request structure stays the same regardless of which provider serves it.

Provider        Supported models
OpenAI          gpt-4o, gpt-4o-mini, o3-mini
Anthropic       claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5
Google Gemini   gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash
Groq            llama-3.3-70b, llama-3.1-8b, mixtral-8x7b
DeepSeek        deepseek-chat, deepseek-reasoner
xAI             grok-3, grok-3-mini, grok-2
Mistral         mistral-large, mistral-small, codestral
Together AI     llama-3.3-70b, deepseek-r1, qwen-2.5
Fireworks AI    llama-v3p1-70b, mixtral-8x22b
Cohere          command-r-plus, command-r
Perplexity      sonar-pro, sonar
Qwen            qwen-max, qwen-plus, qwq-32b

Plus Moonshot, Zhipu, HuggingFace, Replicate, and more. Check your dashboard for the full list.