Railway Deployment Guide

Deploy Relay on Railway for managed or staging environments using this checklist.

Prerequisites

  • Railway project access (relay-api, relay-worker)
  • .env files exported from the control plane or generated locally
  • Clerk/Stripe credentials if enabling auth/billing

1. Install Dependencies

pip install -r requirements.txt
make install

Run the migration runner locally in dry-run mode to confirm the schema is up to date:

python3 scripts/deployment/railway_migration_runner.py --dry-run

2. Provision Services

Service        Purpose
relay-api      FastAPI control plane
relay-worker   Execution engine workers
relay-queue    Redis/Upstash queue + cache
relay-db       Postgres (Neon) metadata store

Populate environment variables for each service (Railway dashboard → Variables). Copy the values from the .env bundles, and include Datadog credentials if available.
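
Before pasting a bundle into the dashboard, it can help to confirm that it defines the variables a service needs. The following is a minimal sketch, not part of the Relay tooling: the helper names (parse_env, missing_keys) and the choice of required keys are assumptions for illustration. Only the Datadog variable names come from this guide.

```python
from pathlib import Path

# Hypothetical requirement list for illustration; adjust per service.
REQUIRED_KEYS = ["DATADOG_API_KEY", "DATADOG_APP_KEY", "DATADOG_SITE"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: dict[str, str], required: list[str]) -> list[str]:
    """Return required keys that are absent or set to an empty value."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed location of the exported bundle
    if env_path.exists():
        gaps = missing_keys(parse_env(env_path.read_text()), REQUIRED_KEYS)
        print("missing:", gaps if gaps else "none")
```

The parser only handles plain KEY=VALUE lines; bundles that use multi-line values or shell expansions would need a fuller parser.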

3. Deploy API Service

railway up --service relay-api

After the first deploy, apply migrations against the live database:

railway run --service relay-api python scripts/deployment/railway_migration_runner.py

4. Deploy Workers

railway up --service relay-worker --detach

Scale worker replicas based on queue depth and latency targets.
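
One way to turn queue depth and a latency target into a replica count is a simple throughput estimate. The sketch below is an assumption-laden illustration rather than Relay's scaling policy: the function name, the per-replica throughput figure, and the replica bounds are all hypothetical and should be replaced with measured values.

```python
import math


def target_replicas(
    queue_depth: int,
    jobs_per_replica_per_sec: float,
    drain_target_sec: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Estimate replicas needed to drain the queue within the target time."""
    if jobs_per_replica_per_sec <= 0 or drain_target_sec <= 0:
        raise ValueError("throughput and drain target must be positive")
    # Jobs one replica can clear within the drain window.
    capacity_per_replica = jobs_per_replica_per_sec * drain_target_sec
    needed = math.ceil(queue_depth / capacity_per_replica)
    # Clamp to the allowed replica range.
    return max(min_replicas, min(max_replicas, needed))
```

For example, with 600 queued jobs, 2 jobs per second per replica, and a 60-second drain target, target_replicas(600, 2.0, 60) returns 5.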

5. Configure Custom Domains

  1. Railway project → Settings → Domains
  2. Add api.deployrelay.com (or equivalent)
  3. Update DNS (CNAME or Cloudflare proxied record)
  4. Wait until the SSL certificate status shows Issued
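
To check whether the DNS change from the steps above has propagated, a short resolution check can be run from any machine. This is a sketch using only the standard library; the function name and the injectable resolver parameter are assumptions made so the logic can be exercised without network access.

```python
import socket


def resolved_addresses(hostname: str, resolver=socket.getaddrinfo) -> list[str]:
    """Return the sorted, de-duplicated addresses a hostname resolves to.

    An empty list means the name does not resolve yet.
    """
    try:
        infos = resolver(hostname, 443)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Domain taken from this guide; substitute your own.
    print(resolved_addresses("api.deployrelay.com"))
```

An empty result only shows that the local resolver cannot see the record yet; it does not say anything about certificate issuance, which still has to be confirmed in the Railway dashboard.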

6. Observability

  • Set DATADOG_API_KEY, DATADOG_APP_KEY, DATADOG_SITE
  • Run scripts/monitoring/api_telemetry_smoke.sh
  • Provision dashboards via scripts/monitoring/setup_production_monitoring.py --component datadog
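
Before running the smoke script, the API key can be checked directly against Datadog's key validation endpoint. The sketch below assumes that DATADOG_SITE holds a bare site domain such as datadoghq.com, so that the API host is api.<site>; the helper names are hypothetical and this is not the contents of the scripts listed above.

```python
import json
import os
import urllib.error
import urllib.request


def build_validate_request(api_key: str, site: str = "datadoghq.com") -> urllib.request.Request:
    """Build a GET request for Datadog's API key validation endpoint."""
    url = f"https://api.{site}/api/v1/validate"
    return urllib.request.Request(url, headers={"DD-API-KEY": api_key})


def api_key_is_valid(api_key: str, site: str = "datadoghq.com", timeout: float = 10.0) -> bool:
    """Return True if Datadog accepts the key, False if it rejects it.

    Network failures (urllib.error.URLError) are left to propagate.
    """
    request = build_validate_request(api_key, site)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return bool(json.load(response).get("valid"))
    except urllib.error.HTTPError:
        # Datadog answers with an HTTP error status for a rejected key.
        return False


if __name__ == "__main__":
    key = os.environ.get("DATADOG_API_KEY", "")
    site = os.environ.get("DATADOG_SITE", "datadoghq.com")
    if key:
        print("API key valid:", api_key_is_valid(key, site))
```

This only validates the API key; DATADOG_APP_KEY is not exercised by this endpoint.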

7. Billing & Marketplace (Optional)

  • Configure Stripe secrets (STRIPE_SECRET_KEY, STRIPE_WEBHOOK_SECRET)
  • Enable marketplace background jobs for entitlements/payouts
  • Test billing webhooks following the Stripe webhook flow

8. Troubleshooting

Issue                              Resolution
ModuleNotFoundError during build   Ensure pip install -r requirements.txt ran and lockfiles are committed
Worker restarts repeatedly         Inspect logs for sandbox dependency failures
Queue backlog rising               Increase worker concurrency or investigate slow functions
Missing telemetry                  Confirm Datadog env vars and agent connectivity