Orpheus scales workers based on queue depth, not CPU.
Default Scaling
scaling:
  min_workers: 1   # Always keep 1 ready
  max_workers: 10  # Scale up to 10
With defaults, Orpheus:
Adds workers when the queue backs up
Removes idle workers after 5 minutes
Always keeps at least min_workers warm
How Scaling Works
Every 5 seconds, Orpheus checks:
utilization = (queued + processing) / current_workers
If utilization > 3.0 → add a worker
If utilization < 0.5 → remove a worker
Example:
12 requests queued or processing, 3 workers
Utilization = 12 / 3 = 4.0
4.0 > 3.0 → scale up
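The check above can be sketched as a small decision function. This is a minimal illustration, not Orpheus's actual implementation: the function name `next_worker_count`, the clamping to `min_workers`/`max_workers`, and the handling of a zero-worker pool are assumptions. It also ignores the 5-minute idle timeout and models only a single 5-second check.

```python
SCALE_UP_THRESHOLD = 3.0    # utilization above this adds a worker
SCALE_DOWN_THRESHOLD = 0.5  # utilization below this removes a worker


def next_worker_count(queued, processing, current_workers,
                      min_workers=1, max_workers=10):
    """Return the worker count after one scaling check (hypothetical helper)."""
    pending = queued + processing

    if current_workers == 0:
        # Assumption: with no workers, any pending work starts one.
        target = 1 if pending > 0 else 0
    else:
        utilization = pending / current_workers
        if utilization > SCALE_UP_THRESHOLD:
            target = current_workers + 1
        elif utilization < SCALE_DOWN_THRESHOLD:
            target = current_workers - 1
        else:
            target = current_workers

    # Assumption: the result is always kept within [min_workers, max_workers].
    return max(min_workers, min(target, max_workers))


# The example above: 12 pending requests on 3 workers gives utilization 4.0.
print(next_worker_count(queued=12, processing=0, current_workers=3))  # 4
```

Calling it with `queued=0, processing=0, current_workers=1` returns 1, which mirrors the rule that at least `min_workers` stay warm.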
Monitor Scaling
orpheus stats my-agent
Output:
{
  "workers": {
    "current": 5,
    "healthy": 5,
    "min": 1,
    "max": 10
  },
  "queue": {
    "depth": 3,
    "processing": 5
  }
}
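To relate this output to the scaling rule, the utilization can be recomputed from the stats. The sketch below assumes that `queue.depth` is the number of queued requests and `queue.processing` the number in flight; the helper name `utilization_from_stats` is hypothetical.

```python
import json

# Sample stats output, copied from the example above.
STATS_JSON = """
{
  "workers": {"current": 5, "healthy": 5, "min": 1, "max": 10},
  "queue": {"depth": 3, "processing": 5}
}
"""


def utilization_from_stats(stats):
    """Compute (queued + processing) / current_workers from a stats dict."""
    workers = stats["workers"]["current"]
    # Assumption: "depth" counts queued requests, "processing" counts in-flight ones.
    pending = stats["queue"]["depth"] + stats["queue"]["processing"]
    if workers == 0:
        # No workers running: treat any pending work as unbounded utilization.
        return float("inf") if pending else 0.0
    return pending / workers


stats = json.loads(STATS_JSON)
print(utilization_from_stats(stats))  # 1.6
```

A value of 1.6 sits between 0.5 and 3.0, so under the rule described earlier the worker count would stay at 5.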
Tuning for Your Workload
Fast Tasks (< 1 second)
scaling:
  min_workers: 2   # More warm workers
  max_workers: 20  # Higher ceiling
Slow Tasks (> 30 seconds)
scaling:
  min_workers: 1
  max_workers: 5   # Fewer workers (each busy longer)
Bursty Traffic
scaling:
  min_workers: 3   # Handle burst immediately
  max_workers: 15
Cost-Sensitive
scaling:
  min_workers: 0   # Scale to zero when idle
  max_workers: 5
Setting min_workers: 0 means cold starts: the first request after an idle period will be slower.
Zero Cold Starts
For instant responses:
scaling:
  min_workers: 1   # Always one worker warm
This keeps one worker running even with no traffic.
Test Scaling
Send concurrent requests:
# Send 20 requests in parallel
for i in {1..20}; do
  orpheus run my-agent "{\"id\": $i}" &
done
wait
# Check how many workers scaled up
orpheus stats my-agent
Scaling Limits
Setting       Effect
min_workers   Minimum warm workers (even when idle)
max_workers   Hard ceiling (won’t exceed)
If max_workers is reached and the queue is still growing, requests wait.
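A toy model can illustrate what waiting at the ceiling looks like. It is only a sketch under stated assumptions: the function `simulate_ceiling` and its per-tick arrival and completion rates are hypothetical and do not come from Orpheus. The pool is pinned at `max_workers`, so any arrivals beyond its capacity accumulate in the queue.

```python
def simulate_ceiling(arrivals_per_tick, done_per_worker_per_tick,
                     max_workers, ticks):
    """Return queue depth after each tick with the pool fixed at max_workers."""
    capacity = max_workers * done_per_worker_per_tick  # requests finished per tick
    queued = 0
    history = []
    for _ in range(ticks):
        queued += arrivals_per_tick      # new requests arrive
        queued -= min(queued, capacity)  # workers drain what they can
        history.append(queued)
    return history


# 12 arrivals per tick against 10 workers finishing 1 request each per tick.
print(simulate_ceiling(arrivals_per_tick=12, done_per_worker_per_tick=1,
                       max_workers=10, ticks=5))  # [2, 4, 6, 8, 10]
```

In this model the backlog grows by 2 requests per tick until either traffic drops or `max_workers` is raised.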