Analytics Dataset — api_function_daily

Nightly snapshot summarising execution telemetry for every tenant/function pair.

Purpose

  • Feed product telemetry dashboards (latency, error rate, throughput) with pre-aggregated facts.
  • Enable Finance & Growth to model cost vs. revenue per function.
  • Provide customer-facing scorecards (later phases) without hammering Datadog APIs.

Grain

  • One row per combination of snapshot_date, tenant_id, rufid, endpoint, environment.
  • Data is sourced from the Datadog metrics API and joined with control-plane metadata.
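
Because the grain defines what makes a row unique, a duplicate check on those five columns is a natural guard before loading. The following is a minimal sketch, not the pipeline's actual code; the helper names grain_key and duplicate_keys and the sample values are hypothetical.

```python
from collections import Counter

# Grain columns, as listed in the Grain section.
GRAIN = ("snapshot_date", "tenant_id", "rufid", "endpoint", "environment")


def grain_key(row):
    """Return the tuple that should uniquely identify a row."""
    return tuple(row[col] for col in GRAIN)


def duplicate_keys(rows):
    """Return grain keys that occur more than once (hypothetical helper)."""
    counts = Counter(grain_key(r) for r in rows)
    return [key for key, n in counts.items() if n > 1]


# Example with made-up values: same function, two environments.
rows = [
    {"snapshot_date": "2024-01-01", "tenant_id": "acme", "rufid": "abc123def456",
     "endpoint": "/execution/execute", "environment": "production"},
    {"snapshot_date": "2024-01-01", "tenant_id": "acme", "rufid": "abc123def456",
     "endpoint": "/execution/execute", "environment": "staging"},
]
print(duplicate_keys(rows))  # prints []
```

The two sample rows differ only in environment, so they are distinct under this grain.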

Schema

Column              Type       Description
snapshot_date       DATE       UTC date of aggregation.
tenant_id           TEXT       Tenant slug or UUID.
rufid               TEXT       Shortened RUFID (first 12 chars).
endpoint            TEXT       API route pattern (e.g. /execution/execute).
environment         TEXT       Deployment environment (production, staging, etc.).
latency_p50_ms      FLOAT      Median execution latency.
latency_p95_ms      FLOAT      95th percentile latency.
latency_p99_ms      FLOAT      99th percentile latency.
error_rate          FLOAT      Error percentage (4xx/5xx).
throughput          FLOAT      Requests per minute averaged over the day.
queue_delay_p95_ms  FLOAT      95th percentile queue delay.
cost_estimated_usd  FLOAT      Estimated execution cost (USD).
cache_hit_pct       FLOAT      Cache hit percentage (optional until FN-059).
data_source         TEXT       Provenance (e.g. datadog, warehouse_backfill).
_ingested_at        TIMESTAMP  Warehouse ingestion timestamp.
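
The column descriptions imply a few row-level sanity rules. The sketch below shows how they might be expressed in plain Python. It is illustrative only and is not the Great Expectations suite. It assumes that the percentage columns range from 0 to 100 and that missing values arrive as None; the helper names shorten_rufid and row_problems are hypothetical.

```python
# Columns described as percentages (assumed range 0-100, not 0-1).
PERCENT_COLUMNS = ("error_rate", "cache_hit_pct")
# Columns that should never be negative.
NON_NEGATIVE_COLUMNS = (
    "latency_p50_ms", "latency_p95_ms", "latency_p99_ms",
    "throughput", "queue_delay_p95_ms", "cost_estimated_usd",
)


def shorten_rufid(full_rufid):
    """Keep the first 12 characters, per the rufid column description."""
    return full_rufid[:12]


def row_problems(row):
    """Return human-readable problems found in one row (hypothetical helper)."""
    problems = []
    for col in NON_NEGATIVE_COLUMNS:
        value = row.get(col)
        if value is not None and value < 0:
            problems.append(f"{col} is negative")
    for col in PERCENT_COLUMNS:
        value = row.get(col)  # cache_hit_pct may be absent until FN-059
        if value is not None and not 0 <= value <= 100:
            problems.append(f"{col} outside 0-100")
    # Percentiles should be ordered p50 <= p95 <= p99 when all are present.
    p50, p95, p99 = (row.get(f"latency_p{p}_ms") for p in (50, 95, 99))
    if None not in (p50, p95, p99) and not p50 <= p95 <= p99:
        problems.append("latency percentiles out of order")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue for a row in a single pass.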

Pipeline Overview

  1. airflow/dags/api_telemetry_snapshot.py runs at 02:15 UTC.
  2. DAG pulls 24h of metrics via Datadog /api/v1/query for:
    • relay.execution.duration
    • relay.execution.queue_delay
    • relay.execution.cost.estimated
    • relay.api.requests.error
  3. Join with tenant/function lookup (function_catalog table) for metadata.
  4. Load into Snowflake analytics.api_function_daily using MERGE on primary keys.
  5. Validate with Great Expectations suite (data_quality/api_telemetry.expectations.json).
  6. Publish freshness + quality status to #relay-observability Slack channel.
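
Steps 2 and 4 above can be sketched roughly as follows. This is not the DAG's actual code. It assumes the run covers one full UTC day, that the Datadog metrics carry tags named after the grain columns, and that rows are staged in a table before the merge; the aggregator, the tag names, the staging table name, and all function names are hypothetical. Only the from, to, and query parameters of /api/v1/query and the general shape of a Snowflake MERGE statement are taken as given.

```python
from datetime import date, datetime, time, timedelta, timezone

# Primary-key columns used for the MERGE (the table's grain).
GRAIN = ["snapshot_date", "tenant_id", "rufid", "endpoint", "environment"]


def snapshot_window(snapshot_date):
    """Return (from, to) epoch seconds covering the UTC day of snapshot_date."""
    start = datetime.combine(snapshot_date, time.min, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def datadog_query_params(metric, snapshot_date, aggregator="avg"):
    """Build query-string parameters for Datadog /api/v1/query.

    The aggregator and the group-by tag names are assumptions.
    """
    start, end = snapshot_window(snapshot_date)
    tags = ",".join(GRAIN[1:])  # every grain column except snapshot_date
    return {"from": start, "to": end,
            "query": f"{aggregator}:{metric}{{*}} by {{{tags}}}"}


def build_merge_sql(target, staging, columns):
    """Build an upsert keyed on GRAIN from a (hypothetical) staging table."""
    value_cols = [c for c in columns if c not in GRAIN]
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in GRAIN)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in value_cols)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON {on_clause} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

Keying the MERGE on the grain columns makes a rerun for the same snapshot_date overwrite existing rows instead of duplicating them, so backfills and retries stay idempotent.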

Consumption

  • Looker explore: Function Telemetry Daily (Phase 1 deliverable).
  • CSV export for Growth & PM weekly digests.
  • Playground scoreboard API (future phases) reads from this table to render SLOs.

Owner & SLA

  • Owner: Data Engineering (data-platform@deployrelay.com).
  • SLA: dataset available by 04:00 UTC daily; freshness alert fires if ingestion exceeds +2h.
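
One way to read the SLA is as a status check against the 04:00 UTC deadline. The sketch below is only an illustration: it assumes "+2h" means two hours past the SLA time, which the text does not state outright, and the function name and status labels are hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone

SLA_TIME_UTC = time(4, 0)         # "available by 04:00 UTC"
ALERT_GRACE = timedelta(hours=2)  # assumed meaning of "+2h"


def freshness_status(run_date, ingested_at, now):
    """Classify the day's load as pending, on_time, late, or alert.

    ingested_at is the timezone-aware load completion time, or None if the
    load has not finished yet; now is the current timezone-aware time.
    """
    deadline = datetime.combine(run_date, SLA_TIME_UTC, tzinfo=timezone.utc)
    # Judge a finished load by when it landed, an unfinished one by the clock.
    reference = ingested_at if ingested_at is not None else now
    if reference <= deadline:
        return "on_time" if ingested_at is not None else "pending"
    if reference <= deadline + ALERT_GRACE:
        return "late"
    return "alert"
```

Under these assumptions, a load that has not landed by 06:00 UTC on the run date would return "alert".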

Change Control

  • Schema changes require a PR that updates this doc and the expectations JSON, plus a DAG version bump.
  • Ops/Data Eng sign-off required before deploy.