Overview

This guide shows you how to deploy Orpheus on your own infrastructure, whether that's AWS, GCP, Azure, bare metal, or your laptop.

Time to first agent: ~5 minutes on a fresh Ubuntu install.

What you'll get:
  • Production-ready Orpheus daemon
  • Queue-depth autoscaling (based on queue depth, not CPU utilization as in Kubernetes)
  • Workspace persistence and crash recovery
  • Full control over your infrastructure

Prerequisites

System Requirements

Minimum (CPU-only):
  • Ubuntu 22.04+ or Debian 11+
  • 2GB RAM, 10GB disk
  • Root access (sudo)
Recommended (with GPU for vLLM):
  • Ubuntu 22.04 with NVIDIA GPU
  • 8GB+ RAM, 50GB disk
  • CUDA drivers (see GPU section below)
For testing in this guide, we'll use AWS EC2 for demonstration, but these steps work on:
  • ✅ AWS EC2
  • ✅ GCP Compute Engine
  • ✅ Azure VMs
  • ✅ DigitalOcean Droplets
  • ✅ Bare metal servers
  • ✅ Your laptop (Linux)

Quick Start (Automated)

Step 1: Clone Repository

git clone https://github.com/arpitnath/orpheus.git
cd orpheus

Step 2: Run Setup Script

sudo ./scripts/orchestrators/setup-production.sh
What it does (automatically):
  1. Installs runc + podman (container runtime)
  2. Installs Go (if not present)
  3. Builds Orpheus daemon from source
  4. Installs daemon to /usr/local/bin/
  5. Creates systemd service
  6. Starts daemon automatically
  7. Installs Ollama (optional, for local models)
Time: ~3-5 minutes

Output:
✓ Phase 1/5: Installing runc and podman
✓ Phase 2/5: Installing Orpheus daemon
✓ Phase 3/5: Configuring systemd service
✓ Phase 4/5: Installing Ollama
✓ Phase 5/5: Installing CLI

✓ Orpheus setup complete!
  Daemon URL: http://localhost:8080

Step 3: Verify Installation

curl http://localhost:8080/v1/health
Expected response:
{
  "status": "healthy",
  "version": "aurora-0.1.2",
  "uptime_seconds": 12,
  "running_agents": 0
}
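If you want to script this verification step, a small Python check of the response shown above might look like the following. This is a sketch, not part of the Orpheus CLI; the field names come from the sample response, so adjust them if your daemon version differs:

```python
import json
import urllib.request

def is_healthy(payload: dict) -> bool:
    """Decide whether a /v1/health response body indicates a usable daemon."""
    return payload.get("status") == "healthy" and payload.get("running_agents", 0) >= 0

def check_daemon(url: str = "http://localhost:8080/v1/health") -> bool:
    """Fetch the health endpoint and evaluate the JSON body."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return is_healthy(json.load(resp))
```

Drop `check_daemon` into a cron job or readiness probe if you want a repeated check instead of a one-off curl.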

Connect from Your Machine

Install CLI (Locally)

On your development machine (not the server):
npm install -g @orpheusrun/cli@0.1.4

Connect to Your Server

# Replace with your server's IP
orpheus connect http://your-server-ip:8080

# Verify connection
orpheus status
You should see:
● healthy                          💻 your-server-ip
  Uptime 5m                   http://your-server-ip:8080

  Agents     0    Workers    0    Queue    0

Deploy Your First Agent

Option 1: Use Example

git clone https://github.com/arpitnath/orpheus.git
cd orpheus/examples/basic/hello-world

orpheus deploy .
orpheus run hello-world '{"message": "Hello!"}'
Expected:
{
  "input_received": {"message": "Hello!"},
  "operation": "greet",
  "results": ["Hello, World! (message #1)"]
}

Option 2: Create Your Own

Create two files.

agent.yaml:
name: my-agent
runtime: python3
module: agent
entrypoint: handler
memory: 256
timeout: 180
agent.py:
def handler(input_data: dict) -> dict:
    query = input_data.get('query', '')
    return {
        "response": f"Received: {query}",
        "status": "success"
    }
Deploy:
orpheus deploy .
orpheus run my-agent '{"query": "test"}'
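Because the entrypoint is just a Python function that takes and returns a dict, you can smoke-test it locally before deploying, with no Orpheus tooling involved:

```python
# agent.py from above
def handler(input_data: dict) -> dict:
    query = input_data.get('query', '')
    return {
        "response": f"Received: {query}",
        "status": "success"
    }

# Mimic the input that `orpheus run` passes to the entrypoint
result = handler({"query": "test"})
print(result)  # {'response': 'Received: test', 'status': 'success'}
```

If this prints the expected dict, the deployed agent should behave the same way under `orpheus run`.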

Testing with OpenAI/Anthropic

For agents that call cloud APIs, add API keys to agent.yaml (keep real keys out of version control):
name: calculator
runtime: python3
module: calculator
entrypoint: handler

env:
  - OPENAI_API_KEY=sk-proj-your-key-here
Deploy and test:
orpheus deploy .
orpheus run calculator '{"query": "what is 25 * 4?"}'
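If you want to exercise the handler contract without spending API credits, here is a hypothetical local stand-in for the calculator that evaluates simple arithmetic itself instead of calling OpenAI. The helper names (`_eval`, the regex extraction) are illustrative, not part of Orpheus or the published example:

```python
import ast
import operator
import re

# Binary operators permitted in the safe evaluator
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    """Recursively evaluate a parsed arithmetic expression node."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")

def handler(input_data: dict) -> dict:
    # Pull out something that looks like "25 * 4" from the natural-language query
    match = re.search(r"\d[\d\.\s\+\-\*/\(\)]*", input_data.get("query", ""))
    if not match:
        return {"status": "error", "response": "no expression found"}
    value = _eval(ast.parse(match.group().strip(), mode="eval").body)
    return {"status": "success", "response": str(value)}
```

Running it on the query above returns `{"status": "success", "response": "100"}`; swap the body for a real OpenAI call once the key is configured in agent.yaml.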

GPU Setup (Optional)

For GPU-accelerated inference with vLLM, there are two options.

Option A: Use a Deep Learning AMI (AWS)

# Launch with Deep Learning Base OSS (Ubuntu 22.04)
# AMI ID: ami-0f7dad950c97ace0f (us-west-2)
# Instance: g4dn.xlarge or larger

Then run the setup script; CUDA is already installed.

Option B: Install CUDA Manually

If using regular Ubuntu with GPU:
# 1. Install NVIDIA drivers
sudo apt-get install -y nvidia-driver-535

# 2. Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda

# 3. Verify
nvidia-smi
Then run the setup script.

Install vLLM

pip install vllm

# Start vLLM server
python3 -m vllm.entrypoints.openai.api_server \
  --model TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --port 8000
Configure your agent to call the vLLM endpoint; the server above exposes an OpenAI-compatible API at http://localhost:8000/v1.
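Since vLLM's api_server speaks the OpenAI-compatible chat API, an agent can call it with plain HTTP. A sketch of a request helper, using only the standard library (the URL and model name match the command above; the function names are illustrative):

```python
import json
import urllib.request

VLLM_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str,
                  model: str = "TinyLlama/TinyLlama-1.1B-Chat-v1.0") -> dict:
    """Assemble an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

def ask_vllm(prompt: str) -> str:
    """POST the payload to the local vLLM server and return the reply text."""
    req = urllib.request.Request(
        VLLM_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the API shape matches OpenAI's, an agent written against the cloud API usually only needs its base URL pointed at the vLLM server.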

Production Configuration

Firewall

Open port 8080 for Orpheus API:
# UFW (Ubuntu)
sudo ufw allow 8080/tcp

# Or restrict to specific IPs
sudo ufw allow from 10.0.0.0/8 to any port 8080
AWS Security Group:
  • Add inbound rule: TCP port 8080 from 0.0.0.0/0 (or your IP)

Monitoring

Daemon logs:
sudo journalctl -u orpheusd -f
Prometheus metrics:
curl http://localhost:8080/metrics
Integrate with Grafana, Datadog, or your monitoring stack.
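The /metrics endpoint serves the standard Prometheus text exposition format, so you can also scrape it ad hoc. A minimal parser that keeps only label-free samples (enough for simple gauges and counters; the metric names in the test are hypothetical, not documented Orpheus metrics):

```python
def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition into {metric_name: value},
    skipping comments, blanks, and labeled samples."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip HELP/TYPE comments and blank lines
        name, _, value = line.rpartition(" ")
        if name and "{" not in name:  # ignore labeled samples for simplicity
            try:
                metrics[name] = float(value)
            except ValueError:
                pass
    return metrics
```

For production use, prefer a real scraper (Prometheus itself, or the prometheus_client parser) over this sketch.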

Resource Limits

Edit /etc/systemd/system/orpheusd.service:
[Service]
MemoryMax=16G
LimitNOFILE=100000
LimitNPROC=10000
Then reload:
sudo systemctl daemon-reload
sudo systemctl restart orpheusd

Troubleshooting

Daemon Won’t Start

Check logs:
sudo journalctl -u orpheusd -n 50
Common issues:
  1. Port 8080 in use:
    sudo lsof -i :8080
    # Kill process or change port
    
  2. Runtimes not found:
    sudo ls /var/lib/orpheus/runtimes/
    # Should show: python/, nodejs/, runtimes.json
    
  3. Podman not found:
    sudo apt-get install -y podman
    sudo systemctl restart orpheusd
    
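For the port conflict in particular, a quick check from Python (an alternative to lsof, useful in scripts) can tell you whether something is already listening on 8080:

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1)
        return s.connect_ex((host, port)) == 0
```

`port_in_use(8080)` returning True with the daemon stopped means another process holds the port.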

Agent Deploy Fails

Check daemon logs during deploy:
sudo journalctl -u orpheusd -f
Verify runtimes:
ls -la /var/lib/orpheus/runtimes/

Agent Execution Fails

Check execution logs:
orpheus execlog list --agent <name> --limit 5
orpheus execlog crashed
Common issues:
  • OOM killed: Increase memory: in agent.yaml
  • Timeout: Increase timeout: in agent.yaml
  • Missing dependencies: Check requirements.txt or package.json

Tested Environments

This guide has been validated on:

AWS EC2 (us-west-2)
  • Instance: g4dn.xlarge (Tesla T4 GPU)
  • OS: Ubuntu 22.04 (Deep Learning AMI)
  • Setup time: 5 minutes
  • Test date: February 4, 2026
Key Features Tested:
  • Python runtime (OpenAI calculator) - 7.86s execution
  • Node.js runtime (OpenAI calculator) - 3.29s execution
  • Queue-depth autoscaling (1 → 5 workers)
  • ExecLog tracking (45 executions logged)
  • Self-hosted + published CLI integration

What You Need

Required:
  • Ubuntu/Debian server (cloud or bare metal)
  • Root access
  • Internet connection
You provide:
  • Server infrastructure (AWS/GCP/your own)
  • Domain/IP for access (optional)
  • API keys for LLMs (if using cloud APIs)
Orpheus provides:
  • Queue-depth autoscaling runtime
  • Workspace persistence
  • Crash recovery (ExecLog)
  • Multi-runtime support (Python, Node.js)
  • Model server management (Ollama, vLLM)

Next Steps

After self-hosting:
  1. Deploy agents - Move beyond examples
  2. Set up monitoring - Connect Prometheus metrics to Grafana
  3. Add TLS - Use nginx reverse proxy for HTTPS
  4. Scale horizontally - Deploy multiple instances (advanced)
Or migrate to managed cloud when you’re ready:
  • Sign up at orpheus.run
  • Same agents, zero infrastructure management