Overview
This guide shows you how to deploy Orpheus on your own infrastructure - whether that’s AWS, GCP, Azure, bare metal, or your laptop.
What you get:
- Production-ready Orpheus daemon
- Queue-depth autoscaling (not CPU-based like K8s)
- Workspace persistence and crash recovery
- Full control over your infrastructure
Prerequisites
System Requirements
Minimum (CPU-only):
- Ubuntu 22.04+ or Debian 11+
- 2GB RAM, 10GB disk
- Root access (sudo)
Recommended (GPU):
- Ubuntu 22.04 with NVIDIA GPU
- 8GB+ RAM, 50GB disk
- CUDA drivers (see GPU section below)
Supported platforms:
- ✅ AWS EC2
- ✅ GCP Compute Engine
- ✅ Azure VMs
- ✅ DigitalOcean Droplets
- ✅ Bare metal servers
- ✅ Your laptop (Linux)
Quick Start (Automated)
Step 1: Clone Repository
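The original clone command was lost; as a sketch, with a placeholder repository URL (substitute the actual Orpheus repo):

```shell
# Clone the Orpheus repository (URL is a placeholder -- use the real one)
git clone https://github.com/your-org/orpheus.git
cd orpheus
```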
Step 2: Run Setup Script
- Installs runc + podman (container runtime)
- Installs Go (if not present)
- Builds Orpheus daemon from source
- Installs daemon to /usr/local/bin/
- Creates systemd service
- Starts daemon automatically
- Installs Ollama (optional, for local models)
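A sketch of invoking the setup script described above; the script name is an assumption, so use whatever the repository actually ships:

```shell
# Run the automated setup as root (script name assumed)
sudo ./setup.sh
```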
Step 3: Verify Installation
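A verification sketch, assuming the systemd unit is named orpheusd and the API listens on port 8080 (both per this guide); the /health path is an assumption:

```shell
# Confirm the daemon is running
sudo systemctl status orpheusd

# Confirm the API answers locally (health endpoint path is a guess)
curl http://localhost:8080/health
```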
Connect from Your Machine
Install CLI (Locally)
Install the CLI on your development machine (not the server).
Connect to Your Server
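A sketch of pointing the CLI at the server; the subcommand name and argument form are guesses, not confirmed by this guide:

```shell
# Point the CLI at your server (replace SERVER_IP; "connect" verb is hypothetical)
orpheus connect http://SERVER_IP:8080
```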
Deploy Your First Agent
Option 1: Use Example
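If the repository ships example agents, deployment might look like the following; the example path and the CLI verb are both hypothetical:

```shell
# Deploy a bundled example agent (path and subcommand are guesses)
orpheus deploy examples/hello-agent
```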
Option 2: Create Your Own
Create two files: agent.yaml and your agent's entry script.
Testing with OpenAI/Anthropic
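The original manifest example was lost, so here is a hedged sketch of an agent.yaml; every field name below is an assumption about Orpheus's schema (only memory: and timeout: are mentioned elsewhere in this guide):

```yaml
# Hypothetical agent.yaml -- field names are assumptions
name: calculator
runtime: python        # or node
memory: 512Mi          # raise this if the agent gets OOM killed
timeout: 60s           # raise this for long-running tasks
env:
  OPENAI_API_KEY: "sk-..."   # only needed for agents that call cloud APIs
```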
For agents that call cloud APIs, add your API keys to agent.yaml.
GPU Setup (Optional)
For GPU-accelerated inference with vLLM:
Option A: Use Deep Learning AMI (Recommended)
AWS:
Option B: Install CUDA Manually
If using regular Ubuntu with a GPU:
Install vLLM
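vLLM installs from PyPI; a sketch follows (the virtual environment is optional but avoids touching system Python):

```shell
# Install vLLM into a virtual environment
python3 -m venv ~/vllm-env
source ~/vllm-env/bin/activate
pip install vllm
```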
Production Configuration
Firewall
Open port 8080 for the Orpheus API. On cloud providers, add an inbound rule: TCP port 8080 from 0.0.0.0/0 (or restrict it to your IP).
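On the server itself, ufw (Ubuntu's default firewall frontend) can open the port:

```shell
# Allow inbound traffic to the Orpheus API port
sudo ufw allow 8080/tcp
sudo ufw status
```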
Monitoring
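Assuming the setup script's systemd unit is named orpheusd, the systemd journal holds the daemon logs:

```shell
# Tail daemon logs live
sudo journalctl -u orpheusd -f

# Show the last 100 lines
sudo journalctl -u orpheusd -n 100
```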
Daemon logs are written to the systemd journal.
Resource Limits
Edit /etc/systemd/system/orpheusd.service:
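A sketch using standard systemd resource-control directives; the values are illustrative, not recommendations. Add them under the unit's [Service] section:

```ini
[Service]
MemoryMax=4G
CPUQuota=200%
```

After editing, apply the change with `sudo systemctl daemon-reload` followed by `sudo systemctl restart orpheusd`.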
Troubleshooting
Daemon Won’t Start
Check the daemon logs for the cause. Common failures:
- Port 8080 in use: another process is listening on the port; stop it or change the daemon's port.
- Runtimes not found: runc/podman are not on PATH; re-run the setup script.
- Podman not found: install podman with your package manager, then restart the daemon.
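Quick checks for the failure modes above, using standard Linux tooling (unit name orpheusd assumed):

```shell
# Recent daemon logs
sudo journalctl -u orpheusd -n 50

# Which process holds port 8080?
sudo ss -ltnp | grep 8080

# Are the container runtimes on PATH?
command -v runc podman
```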
Agent Deploy Fails
Check daemon logs during deploy:
Agent Execution Fails
Check execution logs:
- OOM killed: Increase memory: in agent.yaml
- Timeout: Increase timeout: in agent.yaml
- Missing dependencies: Check requirements.txt or package.json
Tested Environments
This guide has been validated on:
✅ AWS EC2 (us-west-2)
- Instance: g4dn.xlarge (Tesla T4 GPU)
- OS: Ubuntu 22.04 (Deep Learning AMI)
- Setup time: 5 minutes
- Test date: February 4, 2026
Validated:
- Python runtime (OpenAI calculator) - 7.86s execution
- Node.js runtime (OpenAI calculator) - 3.29s execution
- Queue-depth autoscaling (1 → 5 workers)
- ExecLog tracking (45 executions logged)
- Self-hosted + published CLI integration
What You Need
Required:
- Ubuntu/Debian server (cloud or bare metal)
- Root access
- Internet connection
You provide:
- Server infrastructure (AWS/GCP/your own)
- Domain/IP for access (optional)
- API keys for LLMs (if using cloud APIs)
Orpheus handles:
- Queue-depth autoscaling runtime
- Workspace persistence
- Crash recovery (ExecLog)
- Multi-runtime support (Python, Node.js)
- Model server management (Ollama, vLLM)
Next Steps
After self-hosting:
- Deploy agents - Move beyond examples
- Set up monitoring - Connect Prometheus metrics to Grafana
- Add TLS - Use nginx reverse proxy for HTTPS
- Scale horizontally - Deploy multiple instances (advanced)
Prefer not to manage servers?
- Sign up at orpheus.run
- Same agents, zero infrastructure management

