How Orpheus scales based on queue depth
utilization = (queued + processing) / workers If utilization > 3.0 → add worker If utilization < 0.5 → remove worker
10 requests arrive 3 workers running Queue depth = 10 utilization = 10/3 = 3.3 3.3 > 3.0 → scale up New worker spawns 4 workers now handle the load
scaling: min_workers: 1 # Floor max_workers: 10 # Ceiling
orpheus stats my-agent