Failover & routing
Proxide automatically retries failed LLM requests against backup providers — transparently, without your application seeing an error. Configure fallback chains in the dashboard or override per-request with a header.
When failover triggers
Proxide triggers failover when it receives any of the following from the primary provider:
| Condition | Description |
|---|---|
| HTTP 429 | Rate limit — provider is throttling your requests |
| HTTP 500 / 502 / 503 | Server error or provider outage |
| Connection timeout | No response within 10 seconds |
| Stream stall | Streaming response stalls for > 5 seconds |
| Model unavailable | Specific model temporarily unavailable |
HTTP 4xx errors (except 429) are not retried — they indicate a client error (bad request, invalid model, etc.) that won't be resolved by switching providers.
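In pseudocode terms, the retry decision reduces to a small predicate. The sketch below mirrors the table above; it is illustrative only (the function name and arguments are made up for the example, and this is not Proxide's actual implementation):

```python
RETRYABLE_STATUSES = {429, 500, 502, 503}

def should_failover(status_code=None, timed_out=False, stream_stalled=False):
    """Return True if the next provider in the fallback chain should be tried."""
    if timed_out or stream_stalled:
        # Connection timeout (10 s) or stream stall (> 5 s)
        return True
    # Retry rate limits and server errors; other 4xx are client errors
    # that switching providers won't fix.
    return status_code in RETRYABLE_STATUSES
```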
Configuring your fallback chain
Set your default fallback chain in Settings → Routing in the Proxide dashboard. Providers are tried in order from first to last.
Example chain: OpenAI → Anthropic → Groq

```
Request
└─► openai/gpt-4o                   [429 rate limit]
    └─► anthropic/claude-3-5-sonnet [200 OK] ──► Response
```

Per-request fallback override
Override the default fallback chain for a specific request using the x-proxide-fallback header. The value is a comma-separated list of provider/model pairs.
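If you assemble the chain at runtime, the header value is just `provider/model` pairs joined by commas. A tiny helper along these lines (hypothetical, not part of any SDK) keeps the formatting in one place:

```python
def fallback_header(*pairs):
    """Join (provider, model) pairs into an x-proxide-fallback header value."""
    return ",".join(f"{provider}/{model}" for provider, model in pairs)

value = fallback_header(
    ("anthropic", "claude-3-5-sonnet"),
    ("groq", "llama-3.3-70b"),
)
# value == "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b"
```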
JavaScript:

```javascript
const response = await client.chat.completions.create(
  {
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  },
  {
    headers: {
      // Try OpenAI first, then Anthropic, then Groq
      "x-proxide-fallback":
        "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    },
  }
);
```

Python:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "x-proxide-fallback": "anthropic/claude-3-5-sonnet,groq/llama-3.3-70b",
    },
)
```

cURL:

```shell
curl https://gateway.proxide.ai/openai/v1/chat/completions \
  -H "Authorization: Bearer prox-your-key-here" \
  -H "Content-Type: application/json" \
  -H "x-proxide-fallback: anthropic/claude-3-5-sonnet,groq/llama-3.3-70b" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Checking which provider was used
Every response includes headers telling you which provider served the request and whether failover occurred:
When no failover occurred:

```
x-proxide-provider: openai/gpt-4o
x-proxide-failover: false
```

When failover occurred:

```
x-proxide-provider: anthropic/claude-3-5-sonnet
x-proxide-failover: true
x-proxide-original-provider: openai/gpt-4o
x-proxide-original-error: 429
x-proxide-failover-latency-ms: 87
```

Load balancing
In addition to failover, you can configure percentage-based load balancing to distribute traffic across providers. This is useful for very high-throughput applications that regularly hit rate limits on a single provider.
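Percentage-based routing of this kind boils down to weighted random selection over the configured providers. A minimal sketch (illustrative only, not Proxide's internals; the weights match the example route below):

```python
import random

# Example split: 70/20/10 across three providers
WEIGHTS = {
    "openai/gpt-4o": 70,
    "anthropic/claude-3-5": 20,
    "groq/llama-3.3-70b": 10,
}

def pick_provider(weights, rng=random):
    """Pick a provider with probability proportional to its weight."""
    providers = list(weights)
    return rng.choices(providers, weights=[weights[p] for p in providers], k=1)[0]
```

Over many requests, roughly 70% land on the primary; failover still applies if the picked provider errors.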
```
Route: /chat
├── openai/gpt-4o          70% (primary)
├── anthropic/claude-3-5   20% (secondary)
└── groq/llama-3.3-70b     10% (tertiary)

Failover: enabled for all providers
```

Supported providers
All providers are normalised to the OpenAI API format. Your request structure stays the same regardless of which provider serves it.
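Concretely, switching providers only changes the model identifier; the request body keeps the same OpenAI-format shape. A sketch with no network call (model names taken from the table below):

```python
def chat_request(model, user_text):
    """Build an OpenAI-format chat request body; only `model` varies by provider."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }

openai_req = chat_request("gpt-4o", "Hello!")
anthropic_req = chat_request("claude-sonnet-4-6", "Hello!")
# Identical structure, different model identifier
assert openai_req.keys() == anthropic_req.keys()
```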
| Provider | Supported models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, o3-mini |
| Anthropic | claude-opus-4-6, claude-sonnet-4-6, claude-haiku-4-5 |
| Google Gemini | gemini-1.5-pro, gemini-1.5-flash, gemini-2.0-flash |
| Groq | llama-3.3-70b, llama-3.1-8b, mixtral-8x7b |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| xAI | grok-3, grok-3-mini, grok-2 |
| Mistral | mistral-large, mistral-small, codestral |
| Together AI | llama-3.3-70b, deepseek-r1, qwen-2.5 |
| Fireworks AI | llama-v3p1-70b, mixtral-8x22b |
| Cohere | command-r-plus, command-r |
| Perplexity | sonar-pro, sonar |
| Qwen | qwen-max, qwen-plus, qwq-32b |
Plus Moonshot, Zhipu, HuggingFace, Replicate, and more. Check your dashboard for the full list.