Failover

Failover is the core value proposition of Quotio. It ensures that your development workflow is not interrupted when a specific API key hits a rate limit or runs out of credits.

How Failover Works

When an application (like a CLI tool or IDE plugin) requests an AI completion via Quotio, the request goes through the Routing Engine.

  1. Check Primary: The engine checks the status of the primary configured Agent.
  2. Evaluate Health: If the Agent is in an Exhausted or Error state, it is skipped.
  3. Select Secondary: The engine looks up the Failover Priority list.
  4. Route Request: The request is transparently proxied to the next healthy Agent.

The client application is unaware that a switch occurred. It simply receives the response.
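
The four steps above can be sketched roughly as follows. This is an illustrative model only, not Quotio's actual implementation: the `Agent` class, the `AgentState` enum (including the `HEALTHY` state), and the `select_agent` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AgentState(Enum):
    # HEALTHY is an assumed name; the text only names Exhausted and Error.
    HEALTHY = "healthy"
    EXHAUSTED = "exhausted"
    ERROR = "error"


@dataclass
class Agent:
    # Hypothetical stand-in for a configured Agent (provider account or key).
    name: str
    state: AgentState = AgentState.HEALTHY


def select_agent(primary: Agent, failover_priority: List[Agent]) -> Optional[Agent]:
    """Pick the Agent that should receive the request.

    Steps 1-2: use the primary unless it is Exhausted or in Error.
    Step 3: otherwise walk the Failover Priority list in order.
    Step 4 (proxying the request) is left to the caller.
    """
    if primary.state is AgentState.HEALTHY:
        return primary
    for agent in failover_priority:
        if agent.state is AgentState.HEALTHY:
            return agent
    return None  # no healthy Agent is available
```

In this sketch the caller proxies the request to whatever `select_agent` returns, so the client only ever sees the final response. What happens when no healthy Agent remains is not described here; the sketch returns `None` and leaves that decision to the caller.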

Failover Strategies

You can configure different strategies for how the next agent is selected:

Priority Chain (Default)

Agents are ordered strictly (1, 2, 3...). Traffic always goes to the highest-priority healthy agent.

Round Robin

Traffic is distributed evenly among all healthy agents. This is useful for load balancing across multiple free-tier accounts to avoid rate limits.

Cost Optimized

Traffic is routed to the cheapest healthy agent capable of handling the specific model request.
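
To make the difference between the three strategies concrete, here is a minimal sketch of each selection rule. All names (`Agent`, `priority_chain`, `RoundRobin`, `cost_optimized`) and fields (`priority`, `cost_per_1k_tokens`, `models`) are assumptions made for illustration; they do not reflect Quotio's configuration format or internals.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Agent:
    # Hypothetical fields, chosen only to illustrate the three strategies.
    name: str
    priority: int                  # 1 = highest priority
    cost_per_1k_tokens: float      # assumed cost metric
    models: Tuple[str, ...] = ()   # models this agent can serve
    healthy: bool = True


def priority_chain(agents: List[Agent]) -> Optional[Agent]:
    """Priority Chain: the healthy agent with the lowest priority number."""
    healthy = [a for a in agents if a.healthy]
    return min(healthy, key=lambda a: a.priority, default=None)


class RoundRobin:
    """Round Robin: cycle through the healthy agents in turn."""

    def __init__(self) -> None:
        self._counter = 0

    def pick(self, agents: List[Agent]) -> Optional[Agent]:
        healthy = [a for a in agents if a.healthy]
        if not healthy:
            return None
        agent = healthy[self._counter % len(healthy)]
        self._counter += 1
        return agent


def cost_optimized(agents: List[Agent], model: str) -> Optional[Agent]:
    """Cost Optimized: the cheapest healthy agent that supports the model."""
    candidates = [a for a in agents if a.healthy and model in a.models]
    return min(candidates, key=lambda a: a.cost_per_1k_tokens, default=None)
```

The round-robin counter here is deliberately simple: if the set of healthy agents changes between calls, the rotation can shift by one position. This does not matter for a sketch, but a real balancer might track position per agent.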

Trigger Conditions

Failover is triggered by:

  • HTTP 429: Too Many Requests (the provider's rate limit was exceeded).
  • HTTP 402: Payment Required / Quota Exceeded.
  • HTTP 5xx: Provider internal server error (optional, disabled by default).
  • Local Quota: The internal Quotio hard limit is reached.
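
The trigger conditions above amount to a small decision function. The sketch below is one possible way to express them; `should_failover` and its parameters (`local_quota_reached`, `failover_on_5xx`) are hypothetical names, not Quotio settings.

```python
from typing import Optional


def should_failover(
    status_code: Optional[int],
    *,
    local_quota_reached: bool = False,
    failover_on_5xx: bool = False,  # optional trigger, off by default
) -> bool:
    """Return True if the conditions listed above call for a failover."""
    if local_quota_reached:
        # The local hard limit applies regardless of the provider's response.
        return True
    if status_code in (429, 402):
        # Rate limit exceeded, or payment required / quota exceeded.
        return True
    if status_code is not None and 500 <= status_code <= 599:
        # Provider errors trigger failover only when explicitly enabled.
        return failover_on_5xx
    return False
```

For example, `should_failover(503)` returns `False` under the default settings, while `should_failover(503, failover_on_5xx=True)` returns `True`.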

Notifications

When a failover event occurs, Quotio sends a system notification, for example:

"Primary OpenAI key exhausted. Switched to Backup Key #2."

Next Steps

Learn how to manage your accounts and API keys effectively.

Managing Providers