Analytics Dataset — api_function_daily
Nightly snapshot summarising execution telemetry for every tenant/function pair.
Purpose
- Feed product telemetry dashboards (latency, error rate, throughput) with pre-aggregated facts.
- Enable Finance & Growth to model cost vs. revenue per function.
- Provide customer-facing scorecards (later phases) without hammering Datadog APIs.
Grain
- One row per combination of snapshot_date, tenant_id, rufid, endpoint, environment.
- Data sourced from Datadog metrics API combined with control-plane metadata joins.
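The grain can be verified with a simple duplicate-key check. The sketch below assumes rows arrive as plain dictionaries keyed by column name. The helper name `duplicate_grain_keys` and the sample values are illustrative only and are not part of the pipeline.

```python
from collections import Counter

# Grain columns as listed in this section.
GRAIN = ("snapshot_date", "tenant_id", "rufid", "endpoint", "environment")


def duplicate_grain_keys(rows):
    """Return every grain key that appears more than once in `rows`."""
    counts = Counter(tuple(row[col] for col in GRAIN) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Made-up sample rows: the third repeats the first, so it violates the grain.
base = {
    "snapshot_date": "2024-01-01",
    "tenant_id": "acme",
    "rufid": "abc123def456",
    "endpoint": "/execution/execute",
    "environment": "production",
}
rows = [base, {**base, "environment": "staging"}, dict(base)]

duplicates = duplicate_grain_keys(rows)
```

An empty result means the batch respects the one-row-per-key rule. Any returned tuple identifies a key that would collide during the load.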
Schema
| Column | Type | Description |
|---|---|---|
| snapshot_date | DATE | UTC date of aggregation. |
| tenant_id | TEXT | Tenant slug or UUID. |
| rufid | TEXT | Shortened RUFID (first 12 chars). |
| endpoint | TEXT | API route pattern (e.g. /execution/execute). |
| environment | TEXT | Deployment environment (production, staging, etc.). |
| latency_p50_ms | FLOAT | Median execution latency. |
| latency_p95_ms | FLOAT | 95th percentile latency. |
| latency_p99_ms | FLOAT | 99th percentile latency. |
| error_rate | FLOAT | Error percentage (4xx/5xx). |
| throughput | FLOAT | Requests per minute averaged over the day. |
| queue_delay_p95_ms | FLOAT | 95th percentile queue delay. |
| cost_estimated_usd | FLOAT | Estimated execution cost (USD). |
| cache_hit_pct | FLOAT | Cache hit percentage (optional until FN-059). |
| data_source | TEXT | Provenance (e.g. datadog, warehouse_backfill). |
| _ingested_at | TIMESTAMP | Warehouse ingestion timestamp. |
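For consumers reading the table from Python, the schema could be mirrored as a typed record. This is a minimal sketch: the class name `ApiFunctionDailyRow` and the sample values are hypothetical, and the warehouse types are mapped to the nearest Python types. `cache_hit_pct` is placed last only because dataclass fields with defaults must follow those without.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ApiFunctionDailyRow:
    # Grain columns
    snapshot_date: date
    tenant_id: str
    rufid: str            # shortened RUFID, first 12 chars
    endpoint: str
    environment: str
    # Measures
    latency_p50_ms: float
    latency_p95_ms: float
    latency_p99_ms: float
    error_rate: float
    throughput: float
    queue_delay_p95_ms: float
    cost_estimated_usd: float
    # Provenance
    data_source: str
    _ingested_at: datetime
    # Optional until FN-059, so it may be absent
    cache_hit_pct: Optional[float] = None


# Made-up values, for illustration only.
row = ApiFunctionDailyRow(
    snapshot_date=date(2024, 1, 1),
    tenant_id="acme",
    rufid="abc123def456",
    endpoint="/execution/execute",
    environment="production",
    latency_p50_ms=12.0,
    latency_p95_ms=40.0,
    latency_p99_ms=95.0,
    error_rate=0.4,
    throughput=120.0,
    queue_delay_p95_ms=8.0,
    cost_estimated_usd=1.25,
    data_source="datadog",
    _ingested_at=datetime(2024, 1, 2, 2, 40, tzinfo=timezone.utc),
)
```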
Pipeline Overview
- airflow/dags/api_telemetry_snapshot.py runs at 02:15 UTC.
- DAG pulls 24h of metrics via Datadog /api/v1/query for:
  - relay.execution.duration
  - relay.execution.queue_delay
  - relay.execution.cost.estimated
  - relay.api.requests.error
- Join with tenant/function lookup (function_catalog table) for metadata.
- Load into Snowflake analytics.api_function_daily using MERGE on primary keys.
- Validate with Great Expectations suite (data_quality/api_telemetry.expectations.json).
- Publish freshness + quality status to #relay-observability Slack channel.
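The load step can be illustrated by generating the MERGE statement from the column lists. This is a sketch under two assumptions that the DAG itself does not confirm here: the primary keys are taken to be the five grain columns, and the staging table name `analytics.api_function_daily_stage` is hypothetical.

```python
# Assumption: the primary keys are the grain columns.
KEY_COLUMNS = ["snapshot_date", "tenant_id", "rufid", "endpoint", "environment"]

VALUE_COLUMNS = [
    "latency_p50_ms", "latency_p95_ms", "latency_p99_ms", "error_rate",
    "throughput", "queue_delay_p95_ms", "cost_estimated_usd",
    "cache_hit_pct", "data_source", "_ingested_at",
]


def build_merge_sql(target, staging):
    """Build a MERGE that upserts rows from `staging` into `target`."""
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in KEY_COLUMNS)
    update_clause = ", ".join(f"t.{c} = s.{c}" for c in VALUE_COLUMNS)
    all_columns = KEY_COLUMNS + VALUE_COLUMNS
    insert_columns = ", ".join(all_columns)
    insert_values = ", ".join(f"s.{c}" for c in all_columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {update_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_columns}) "
        f"VALUES ({insert_values})"
    )


merge_sql = build_merge_sql(
    "analytics.api_function_daily",
    "analytics.api_function_daily_stage",  # hypothetical staging table
)
```

Matching on the full key makes a rerun for the same snapshot_date overwrite existing rows rather than duplicate them.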
Consumption
- Looker explore: Function Telemetry Daily (Phase 1 deliverable).
- CSV export for Growth & PM weekly digests.
- Playground scoreboard API (future phases) reads from this table to render SLOs.
Owner & SLA
- Owner: Data Engineering (data-platform@deployrelay.com).
- SLA: dataset available by 04:00 UTC daily; freshness alert fires if ingestion exceeds +2h.
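One possible reading of the SLA, sketched as a status function. It assumes "+2h" means two hours past the 04:00 UTC deadline, so the alert threshold is 06:00 UTC on the run date. If the threshold is instead measured from the DAG start, the constants would need to change. The function name and status labels are illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone

SLA_DEADLINE_UTC = time(4, 0)      # dataset due by 04:00 UTC
ALERT_GRACE = timedelta(hours=2)   # assumed: alert fires 2h past the deadline


def freshness_status(run_date, now, ingested_at=None):
    """Classify a run as 'pending', 'on_time', 'late', or 'alert'."""
    deadline = datetime.combine(run_date, SLA_DEADLINE_UTC, tzinfo=timezone.utc)
    alert_at = deadline + ALERT_GRACE
    if ingested_at is None:
        # Nothing has landed yet: alert only once the grace window has passed.
        return "alert" if now > alert_at else "pending"
    if ingested_at <= deadline:
        return "on_time"
    if ingested_at <= alert_at:
        return "late"
    return "alert"


run = date(2024, 1, 2)
utc = timezone.utc
status_on_time = freshness_status(
    run, now=datetime(2024, 1, 2, 3, 45, tzinfo=utc),
    ingested_at=datetime(2024, 1, 2, 3, 30, tzinfo=utc),
)
status_missing = freshness_status(run, now=datetime(2024, 1, 2, 6, 1, tzinfo=utc))
```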
Change Control
- Schema updates require PR updates to this doc, expectations JSON, and DAG version bump.
- Ops/Data Eng sign-off required before deploy.