Failover
Failover is the core value proposition of Quotio. It ensures that your development workflow is not interrupted when a specific API key hits a rate limit or runs out of credits.
How Failover Works
When an application (like a CLI tool or IDE plugin) requests an AI completion via Quotio, the request goes through the Routing Engine.
- Check Primary: The engine checks the status of the primary configured Agent.
- Evaluate Health: If the Agent is in the Exhausted or Error state, it is skipped.
- Select Secondary: The engine looks up the Failover Priority list.
- Route Request: The request is transparently proxied to the next healthy Agent.
The client application is unaware that a switch occurred. It simply receives the response.
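The four steps above, using the default Priority Chain strategy, can be sketched as follows. This is a minimal illustration under stated assumptions, not Quotio's implementation. The `Agent` class, the `AgentState` enum (including its hypothetical `HEALTHY` member), and `select_agent` are names invented for this example, and the actual proxying of the request is left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence


class AgentState(Enum):
    # Exhausted and Error come from the text; HEALTHY is a hypothetical name.
    HEALTHY = "healthy"
    EXHAUSTED = "exhausted"
    ERROR = "error"


@dataclass
class Agent:
    name: str
    state: AgentState = AgentState.HEALTHY


def select_agent(priority_list: Sequence[Agent]) -> Optional[Agent]:
    """Return the first healthy agent in priority order, or None.

    priority_list[0] is the primary Agent; later entries form the
    Failover Priority list (hypothetical representation).
    """
    for agent in priority_list:
        # Evaluate Health: skip agents in the Exhausted or Error state.
        if agent.state in (AgentState.EXHAUSTED, AgentState.ERROR):
            continue
        # Route Request: this agent would receive the proxied request.
        return agent
    return None  # no healthy agent left


agents = [
    Agent("Primary OpenAI key", AgentState.EXHAUSTED),
    Agent("Backup Key #1", AgentState.ERROR),
    Agent("Backup Key #2"),
]
chosen = select_agent(agents)
print(chosen.name)  # Backup Key #2
```

Because the selection happens inside the proxy, the caller only ever sees the response from whichever agent was chosen.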
Failover Strategies
You can configure different strategies for how the next agent is selected:
Priority Chain (Default)
Agents are ranked in strict order (1, 2, 3, ...). Traffic always goes to the highest-priority healthy agent.
Round Robin
Traffic is distributed evenly among all healthy agents. This is useful for load balancing across multiple free-tier accounts to avoid rate limits.
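A round-robin selector can be sketched as below. This is an illustrative assumption about how rotation could work, not Quotio's code: `RoundRobinSelector`, its `is_healthy` callback, and the account names are hypothetical. Each call resumes scanning just after the previously chosen agent, so unhealthy agents are skipped and the remaining ones take turns.

```python
from typing import Callable, List, Optional


class RoundRobinSelector:
    """Cycle through healthy agents in turn (hypothetical helper)."""

    def __init__(self, agents: List[str], is_healthy: Callable[[str], bool]) -> None:
        self.agents = agents
        self.is_healthy = is_healthy
        self._next = 0  # index in self.agents where the next scan starts

    def select(self) -> Optional[str]:
        n = len(self.agents)
        for offset in range(n):
            idx = (self._next + offset) % n
            agent = self.agents[idx]
            if self.is_healthy(agent):
                # Resume after this agent next time so traffic rotates.
                self._next = (idx + 1) % n
                return agent
        return None  # no healthy agent available


unhealthy = {"free-tier-b"}
rr = RoundRobinSelector(
    ["free-tier-a", "free-tier-b", "free-tier-c"],
    lambda a: a not in unhealthy,
)
picks = [rr.select() for _ in range(4)]
print(picks)  # ['free-tier-a', 'free-tier-c', 'free-tier-a', 'free-tier-c']
```

Health is checked on every call, so an account that recovers rejoins the rotation without any reconfiguration.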
Cost Optimized
Traffic is routed to the cheapest healthy agent capable of handling the specific model request.
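The cost-optimized choice reduces to filtering and taking a minimum, sketched below under assumptions. `PricedAgent`, its `cost_per_1k_tokens` and `models` fields, and `cheapest_agent` are hypothetical, and the prices and model names are purely illustrative, not real provider pricing. Only agents that are healthy and support the requested model are considered.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PricedAgent:
    name: str
    cost_per_1k_tokens: float  # hypothetical, illustrative price field
    models: FrozenSet[str]     # models this agent can serve
    healthy: bool = True


def cheapest_agent(agents: List[PricedAgent], model: str) -> Optional[PricedAgent]:
    # Keep only healthy agents capable of serving the requested model.
    candidates = [a for a in agents if a.healthy and model in a.models]
    if not candidates:
        return None
    # min() returns the first of any equally cheap agents, so list order breaks ties.
    return min(candidates, key=lambda a: a.cost_per_1k_tokens)


agents = [
    PricedAgent("main-key", 0.010, frozenset({"model-x"})),
    PricedAgent("backup-key", 0.005, frozenset({"model-x"}), healthy=False),
    PricedAgent("reseller-key", 0.008, frozenset({"model-x", "model-y"})),
]
print(cheapest_agent(agents, "model-x").name)  # reseller-key
```

Note that the cheapest agent overall ("backup-key") is passed over because it is unhealthy.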
Trigger Conditions
Failover is triggered by:
- HTTP 429: A rate-limit-exceeded (Too Many Requests) response from the provider.
- HTTP 402: Payment Required / Quota Exceeded.
- HTTP 5xx: Provider internal server error (optional, disabled by default).
- Local Quota: The internal Quotio hard limit is reached.
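The trigger rules above can be expressed as a single predicate. This is a sketch only: `should_fail_over` and its parameters are hypothetical names, and the `failover_on_5xx` flag merely mirrors the "optional, disabled by default" note rather than any documented Quotio setting.

```python
from typing import Optional


def should_fail_over(
    status_code: Optional[int],
    local_quota_reached: bool = False,
    failover_on_5xx: bool = False,  # off by default, like the 5xx trigger above
) -> bool:
    """Decide whether a provider response should trigger failover (sketch)."""
    # Local Quota: Quotio's internal hard limit overrides the response itself.
    if local_quota_reached:
        return True
    # HTTP 429 (rate limited) and HTTP 402 (payment required / quota exceeded).
    if status_code in (429, 402):
        return True
    # HTTP 5xx only counts when the optional trigger is enabled.
    if failover_on_5xx and status_code is not None and 500 <= status_code <= 599:
        return True
    return False
```

For example, `should_fail_over(503)` is `False` with the defaults, while `should_fail_over(503, failover_on_5xx=True)` is `True`.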
Notifications
When a failover event occurs, Quotio sends a system notification:
"Primary OpenAI key exhausted. Switched to Backup Key #2."
Next Steps
Learn how to manage your accounts and API keys effectively.
