

Conversational Memory Agent

This agent demonstrates stateful interactions using both Session Affinity and Workspace Persistence. It remembers the chat history for a specific user.

How it Works

  1. Session Routing: Orpheus routes requests that share a --session ID to the same worker whenever possible.
  2. In-Memory Cache: The worker keeps recent history in a Python dictionary for instant access.
  3. Workspace Backup: Every message is also backed up to a JSON file in /workspace/sessions/.
    • If the worker restarts, it reloads history from the workspace.
    • This preserves conversation history even if the node crashes.

Key Features

  • Session ID: You pass a session_id (or use the CLI --session flag).
  • Dual-Layer State: Fast in-memory access + durable disk storage.
  • Context Window: The agent “remembers” what you said in previous turns.
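Session affinity can be pictured as a sticky lookup with a bounded wait, mirroring the `wait_timeout` knob in the manifest below. This sketch is purely illustrative (`pick_worker` and `_AFFINITY` are hypothetical names; it is not Orpheus's actual scheduler):

```python
import time

_AFFINITY: dict[str, str] = {}  # session key -> worker id

def pick_worker(session_id, workers, is_busy, wait_timeout=0.2):
    """Prefer the sticky worker; fall back to any free worker after wait_timeout."""
    preferred = _AFFINITY.get(session_id)
    if preferred in workers:
        deadline = time.monotonic() + wait_timeout
        while time.monotonic() < deadline:
            if not is_busy(preferred):
                return preferred
            time.sleep(0.01)
    # No affinity yet, or the sticky worker stayed busy: pin to a free worker.
    for worker in workers:
        if not is_busy(worker):
            _AFFINITY[session_id] = worker
            return worker
    return None
```

The fallback path is why routing is "if possible" rather than guaranteed, and it is exactly the case the workspace backup covers: a request landing on a fresh worker reloads its history from disk.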

Usage

# Start a conversation
orpheus run conversational-memory '{"message": "My name is Alice"}' --session user-123

# Follow up (it remembers!)
orpheus run conversational-memory '{"message": "What is my name?"}' --session user-123

Source Code

name: conversational-memory
runtime: python3
module: agent
entrypoint: handler

memory: 512
timeout: 120

# Model server via ServiceManager
model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
engine: vllm  # vLLM for GPU inference

env:
  - WORKSPACE_DIR=/workspace
  - MODEL_URL=${MODEL_URL:-http://localhost:8000}  # vLLM server endpoint

# Telemetry configuration - custom labels for Prometheus filtering
telemetry:
  enabled: true
  labels:
    team: ml_platform
    tier: standard
    use_case: conversational

# Session affinity is critical for this agent
session:
  enabled: true
  key: "X-Session-ID"
  ttl: "2h"
  wait_timeout: "200ms"

scaling:
  min_workers: 1
  max_workers: 5
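Since `engine: vllm` serves an OpenAI-compatible HTTP API, the worker can reach the model at `MODEL_URL` with the standard library alone. A minimal sketch, assuming hypothetical helper names (`build_request`, `chat`) since the agent module itself is not shown above:

```python
import json
import os
import urllib.request

MODEL_URL = os.environ.get("MODEL_URL", "http://localhost:8000")

def build_request(history: list[dict]) -> urllib.request.Request:
    """Build an OpenAI-compatible chat request for the vLLM server."""
    payload = {
        "model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
        "messages": history,  # prior turns plus the new user message
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{MODEL_URL}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(history: list[dict]) -> str:
    """Send the conversation to the model and return the reply text."""
    with urllib.request.urlopen(build_request(history)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Passing the full session history as `messages` is what gives the agent its context window: the model sees "My name is Alice" again on every follow-up turn.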