Queue-Depth Scaling
Orpheus scales on actual demand: how many requests are waiting or in flight.
Every 5 seconds:
utilization = (queued + processing) / workers
If utilization > 3.0 → add worker
If utilization < 0.5 → remove worker
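The rules above can be sketched as a small decision function. This is an illustrative sketch, not Orpheus's actual implementation; the function and constant names are assumptions:

```python
# Thresholds from the scaling rules above (assumed names).
SCALE_UP_THRESHOLD = 3.0
SCALE_DOWN_THRESHOLD = 0.5

def scaling_decision(queued: int, processing: int, workers: int) -> int:
    """Evaluated every 5 seconds.

    Returns +1 to add a worker, -1 to remove one, 0 to hold steady.
    """
    utilization = (queued + processing) / workers
    if utilization > SCALE_UP_THRESHOLD:
        return 1
    if utilization < SCALE_DOWN_THRESHOLD:
        return -1
    return 0
```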
Why Not CPU-Based?
AI agents spend most of their time waiting (on LLM APIs, databases, etc.), so CPU usage stays low even when the system is busy.
Queue-depth scaling adds workers when work piles up, regardless of CPU.
Example
10 requests arrive
3 workers running, so 3 requests begin processing and 7 queue
utilization = (7 + 3) / 3 ≈ 3.3
3.3 > 3.0 → scale up
New worker spawns
4 workers now handle the load
Configuration
scaling:
  min_workers: 1   # Floor
  max_workers: 10  # Ceiling
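These two settings bound every scaling decision: the worker count never drops below the floor or rises above the ceiling. A minimal sketch of how such a clamp might work (function name and signature are assumptions, not Orpheus's API):

```python
def apply_scaling(workers: int, decision: int,
                  min_workers: int = 1, max_workers: int = 10) -> int:
    """Apply a +1/-1/0 scaling decision, clamped to the configured bounds."""
    return max(min_workers, min(max_workers, workers + decision))
```

For example, a scale-up decision at the ceiling is a no-op, and a scale-down decision at the floor leaves one worker running.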
Monitor Scaling
The monitoring view shows queue depth and worker count in real time.