Orpheus scales workers based on queue depth, not CPU.
Default Scaling
- Adds workers when queue backs up
- Removes idle workers after 5 minutes
- Always keeps at least `min_workers` warm
How Scaling Works
Every 5 seconds, Orpheus checks the queue depth against the number of workers. For example:
- 12 requests queued, 3 workers
- Utilization = 12/3 = 4.0
- 4.0 > 3.0 → scale up
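The check above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the example, not Orpheus's actual implementation: the function names are hypothetical, the 3.0 threshold is taken from the worked example, and the handling of zero workers is an assumption.

```python
# Threshold taken from the worked example above (4.0 > 3.0 → scale up).
# Whether Orpheus exposes this value as a setting is not stated here.
SCALE_UP_THRESHOLD = 3.0


def utilization(queued: int, workers: int) -> float:
    """Return queued requests per worker (hypothetical helper)."""
    if workers <= 0:
        # Assumption: with no workers, any queued request counts as overloaded.
        return float("inf") if queued > 0 else 0.0
    return queued / workers


def should_scale_up(queued: int, workers: int,
                    threshold: float = SCALE_UP_THRESHOLD) -> bool:
    """Return True when utilization is strictly above the threshold."""
    return utilization(queued, workers) > threshold


if __name__ == "__main__":
    print(utilization(12, 3))      # 4.0
    print(should_scale_up(12, 3))  # True
```

With 9 requests queued and 3 workers, utilization is exactly 3.0, so this sketch would not scale up because the comparison is strict.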
Monitor Scaling
Tuning for Your Workload
Fast Tasks (< 1 second)
Slow Tasks (> 30 seconds)
Bursty Traffic
Cost-Sensitive
`min_workers: 0` means cold starts. The first request after an idle period will be slower.
Zero Cold Starts
For instant responses, set `min_workers` to at least 1 so a warm worker is always available.
Test Scaling
Send concurrent requests:
Scaling Limits
| Setting | Effect |
|---|---|
| `min_workers` | Minimum warm workers (even when idle) |
| `max_workers` | Hard ceiling (won’t exceed) |
If `max_workers` is reached and the queue is still growing, new requests wait in the queue.
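As a rough illustration of how the two limits bound the worker count, the hypothetical helper below clamps a desired count between `min_workers` and `max_workers`. The function name and the notion of a single "desired" count are assumptions made for this sketch. How Orpheus actually computes its target is not described here.

```python
def clamp_workers(desired: int, min_workers: int, max_workers: int) -> int:
    """Bound a desired worker count by the configured limits (hypothetical)."""
    if min_workers > max_workers:
        raise ValueError("min_workers must not exceed max_workers")
    # Never drop below the warm floor, never rise above the hard ceiling.
    return max(min_workers, min(desired, max_workers))


if __name__ == "__main__":
    print(clamp_workers(0, 1, 10))   # 1: the floor keeps one worker warm
    print(clamp_workers(25, 1, 10))  # 10: the ceiling caps growth
    print(clamp_workers(4, 1, 10))   # 4: within limits, unchanged
```

Once the clamped value equals `max_workers`, adding load cannot add workers, which is why requests wait instead.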
Troubleshooting
Fix common issues →

