Quick Sizing Calculator
Step 1: Gather metrics
- Expected requests per second (RPS): _____
- Average task duration (seconds): _____
- Peak burst size (concurrent requests): _____
Sizing Formulas
Minimum Workers
Formula: min_workers = ceil(sustained_rps × avg_duration_sec)
Purpose: Keep enough workers warm to handle baseline load without cold starts
Example:
- Sustained RPS: 10 req/s
- Average duration: 3 seconds
- Calculation:
ceil(10 × 3) = 30 workers
Maximum Workers
Formula: max_workers = ceil(peak_rps × avg_duration_sec × 1.5)
The 1.5 multiplier provides buffer for:
- Variance in task duration
- Sudden traffic spikes
- Worker health issues
Example:
- Peak RPS: 50 req/s (during traffic spike)
- Average duration: 5 seconds
- Calculation:
ceil(50 × 5 × 1.5) = 375 workers
Cap at practical limits (100-200 workers per host). For more capacity, consider multi-host deployment.
Queue Size
Formula: queue_size = peak_concurrent_requests × 2
The 2x multiplier provides buffer for:
- Bursts while workers are spawning
- Requests arriving faster than processing
- Autoscaler reaction time
Example:
- Peak concurrent requests: 100
- Calculation:
100 × 2 = 200
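The three formulas above can be combined into a small sizing sketch. The function name and the default 200-worker cap (taken from the practical per-host limit mentioned earlier) are illustrative assumptions, not a prescribed API:

```python
import math

def size_pool(sustained_rps: float, peak_rps: float,
              avg_duration_sec: float, peak_concurrent: int,
              per_host_cap: int = 200) -> dict:
    """Apply the three sizing formulas from this page (hypothetical helper)."""
    return {
        # Enough warm workers for baseline load, no cold starts
        "min_workers": math.ceil(sustained_rps * avg_duration_sec),
        # Peak load plus the 1.5x buffer, capped at the per-host limit
        "max_workers": min(math.ceil(peak_rps * avg_duration_sec * 1.5),
                           per_host_cap),
        # 2x buffer for bursts while new workers are spawning
        "queue_size": peak_concurrent * 2,
    }

# 10 req/s sustained, 50 req/s peak, 3s tasks, 100 concurrent at peak:
print(size_pool(10, 50, 3, 100))
# {'min_workers': 30, 'max_workers': 200, 'queue_size': 200}
```

Note that max_workers here is the capped value: the raw formula gives ceil(50 × 3 × 1.5) = 225, which the 200-worker per-host limit then trims.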
Workload Type Recommendations
Chatbot / Low Latency
Characteristics:
- Fast responses required (under 2 seconds)
- Moderate request rate (1-10 req/s)
- Short task duration (1-3s)
RAG / Search Agents
Characteristics:
- Medium latency acceptable (5-10s)
- Bursty traffic patterns
- Medium task duration (5-10s)
Batch Processing
Characteristics:
- Latency not critical
- Sustained load over time
- Long task duration (30-120s)
Demo / Burst Traffic
Characteristics:
- Unpredictable traffic spikes
- Need to handle 100+ concurrent users
- Short demo tasks (1-5s)
Monitoring Your Capacity
Key Metrics
Queue Depth: tasks waiting for a free worker; sustained growth means the pool is undersized
Worker Utilization: utilization = busy_workers / total_workers
Timeout Rate: fraction of tasks that hit their timeout before completing; should stay near zero
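As a sketch, the utilization formula can be computed like this (hypothetical function name; the thresholds in the comment are rules of thumb, not values from this page):

```python
def worker_utilization(busy_workers: int, total_workers: int) -> float:
    """utilization = busy_workers / total_workers (0.0 for an empty pool)."""
    return busy_workers / total_workers if total_workers else 0.0

# Example: 30 of 40 workers busy. Sustained values near 1.0 suggest
# raising max_workers; values near 0.0 suggest lowering min_workers.
print(worker_utilization(30, 40))  # 0.75
```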
Resource Requirements
Per-Worker Memory
| Runtime | Overhead | Typical Agent | Total |
|---|---|---|---|
| Python 3 | 50-100MB | 200-500MB | 250-600MB |
| Node.js | 30-80MB | 150-400MB | 180-480MB |
total_memory = num_workers × (overhead + agent_memory)
Example: 10 workers × 600MB = 6GB host memory needed
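The memory formula and the worked example can be checked with a quick sketch, using the worst-case Python 3 figures from the table above (the helper name is illustrative):

```python
def total_memory_mb(num_workers: int, overhead_mb: int, agent_mb: int) -> int:
    # total_memory = num_workers x (overhead + agent_memory)
    return num_workers * (overhead_mb + agent_mb)

# 10 Python workers at the worst-case 100MB runtime overhead + 500MB agent:
print(total_memory_mb(10, 100, 500))  # 6000 MB, i.e. ~6GB of host memory
```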
Per-Worker CPU
Agents are I/O-bound (waiting for LLM APIs), so CPU usage is low:
- Average: 5-10% CPU per worker
- Peak: 50% CPU during processing
num_workers ≈ num_cpu_cores × 10
Example: an 8-core machine can handle ~80 workers comfortably
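The core-count heuristic above can be sketched as (hypothetical helper; the oversubscription factor of 10 is the rule of thumb from the text, not a hard limit):

```python
def workers_per_host(num_cpu_cores: int, oversubscribe: int = 10) -> int:
    # I/O-bound agents spend most of their time waiting on LLM APIs,
    # so roughly 10 workers per core is comfortable.
    return num_cpu_cores * oversubscribe

print(workers_per_host(8))  # 80, matching the example above
```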
Configurations by Scale
Small (Under 100 req/day)
Medium (100-10k req/day)
Large (10k-100k req/day)
Common Mistakes
❌ Setting min_workers = max_workers
Why wrong: Disables autoscaling and wastes resources during low traffic.
Better: Size min_workers from sustained load and max_workers from peak load using the formulas above.

❌ Queue size smaller than max_workers
Why wrong: The queue fills before workers can scale up.
Better: Set queue_size to 2× peak concurrent requests, and never below max_workers.

❌ Timeout smaller than avg task duration
Why wrong: Tasks time out before they can complete.
Better: Set the timeout comfortably above the average task duration.

Monitor with Prometheus
Set up metrics and alerts →
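The common mistakes listed above can be caught with a small config check before deploying (function and field names are illustrative, not a real API):

```python
def check_config(min_workers: int, max_workers: int, queue_size: int,
                 timeout_sec: float, avg_duration_sec: float) -> list:
    """Flag the common sizing mistakes described above."""
    problems = []
    if min_workers >= max_workers:
        problems.append("min_workers >= max_workers disables autoscaling")
    if queue_size < max_workers:
        problems.append("queue fills before workers can scale up")
    if timeout_sec <= avg_duration_sec:
        problems.append("tasks will time out before completing")
    return problems

# A config with an undersized queue and an aggressive timeout:
for problem in check_config(30, 200, 100, 2, 5):
    print(problem)
```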
Troubleshooting
Fix common issues →

