Monitoring Guidelines

CLI tools and libraries require no monitoring. Apply the points below to services.

Health Endpoints

Expose two endpoints for the orchestration platform:

  • /healthz/live — liveness: is the process running?
  • /healthz/ready — readiness: is the service ready to accept traffic?

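Report not-ready during startup and graceful shutdown. Report not-live only when the process cannot recover without a restart, since a failed liveness check typically causes the platform to restart the process.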
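The two endpoints can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed implementation: `HealthState`, `probe_status`, and `make_handler` are hypothetical names, and the choice to return 503 for a failing probe is an assumption about what the orchestration platform expects.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthState:
    """Thread-safe flags backing the two probes (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._live = True    # live unless the process is unrecoverable
        self._ready = False  # not ready until startup has finished

    def set_ready(self, ready):
        with self._lock:
            self._ready = ready

    def mark_broken(self):
        with self._lock:
            self._live = False

    def snapshot(self):
        with self._lock:
            return self._live, self._ready


def probe_status(path, live, ready):
    """Map a probe path and the current flags to an HTTP status code."""
    if path == "/healthz/live":
        return 200 if live else 503
    if path == "/healthz/ready":
        return 200 if live and ready else 503
    return 404


def make_handler(state):
    """Build a request handler class bound to the given HealthState."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            live, ready = state.snapshot()
            code = probe_status(self.path, live, ready)
            body = json.dumps({"status": code}).encode()
            self.send_response(code)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep probe traffic out of the access log

    return HealthHandler
```

To serve the probes, something like `ThreadingHTTPServer(("0.0.0.0", 8080), make_handler(state)).serve_forever()` would work (the port is an assumption). The service would call `state.set_ready(True)` once startup completes and `state.set_ready(False)` when graceful shutdown begins.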

Metrics

  • Emit Prometheus-format metrics.
  • Define all metric names and label keys in a global constants file.
  • Minimum metrics to expose:
      Metric                            Type
      Request count (by status code)    Counter
      Request duration                  Histogram
      In-flight requests                Gauge
      Dependency health (DB, broker)    Gauge
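As a rough sketch of the constants-file rule, the block below keeps every metric name and label key in one place and renders two of the minimum metrics in the Prometheus text exposition format. A real service would normally use a Prometheus client library instead of hand-rendering. The metric names, label keys, and the `render_samples` helper are illustrative assumptions, and the histogram is omitted for brevity.

```python
from collections import Counter

# --- contents of a shared constants module (names are hypothetical) ---
REQUESTS_TOTAL = "http_requests_total"
REQUEST_DURATION_SECONDS = "http_request_duration_seconds"
REQUESTS_IN_FLIGHT = "http_requests_in_flight"
DEPENDENCY_UP = "dependency_up"

LABEL_STATUS_CODE = "status_code"
LABEL_DEPENDENCY = "dependency"


def render_samples(name, metric_type, label_key, samples):
    """Render one single-label metric family as Prometheus exposition text."""
    lines = [f"# TYPE {name} {metric_type}"]
    for label_value, value in sorted(samples.items()):
        lines.append(f'{name}{{{label_key}="{label_value}"}} {value}')
    return "\n".join(lines) + "\n"


# Request count by status code (counter: only ever incremented).
requests_by_status = Counter()
requests_by_status["200"] += 2
requests_by_status["500"] += 1

# Dependency health (gauge: 1 = healthy, 0 = unhealthy).
dependency_up = {"db": 1, "broker": 0}

exposition = (
    render_samples(REQUESTS_TOTAL, "counter", LABEL_STATUS_CODE, requests_by_status)
    + render_samples(DEPENDENCY_UP, "gauge", LABEL_DEPENDENCY, dependency_up)
)
print(exposition, end="")
```

Because call sites import the constants rather than repeating string literals, a renamed metric or label changes in exactly one place.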

Alerting

Alert on: error rate, latency (p50/p95/p99), saturation (CPU, memory, queue depth).
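The three signal types can be illustrated with a small evaluation function. In practice these conditions would usually be written as rules in the monitoring system rather than in application code. The thresholds, alert names, and the 5xx-only definition of an error below are hypothetical and would need tuning per service.

```python
import math

# Hypothetical thresholds; real values depend on the service's objectives.
MAX_ERROR_RATE = 0.01      # fraction of requests answered with 5xx
MAX_P99_SECONDS = 0.5      # 99th-percentile request duration
MAX_QUEUE_DEPTH = 1000     # saturation signal


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def firing_alerts(status_codes, durations, queue_depth):
    """Return the names of the alert conditions that currently hold."""
    alerts = []
    errors = sum(1 for code in status_codes if code >= 500)
    if status_codes and errors / len(status_codes) > MAX_ERROR_RATE:
        alerts.append("error_rate")
    if durations and percentile(durations, 99) > MAX_P99_SECONDS:
        alerts.append("latency_p99")
    if queue_depth > MAX_QUEUE_DEPTH:
        alerts.append("queue_saturation")
    return alerts
```

For example, `firing_alerts([200] * 98 + [500] * 2, [0.1] * 100, 10)` reports only the error-rate condition, because 2% of requests failed while latency and queue depth stay under their thresholds.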

Distributed Tracing

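Instrument all inter-service calls with OpenTelemetry. Include the traceId in every log line for correlation. In metrics, attach it as an exemplar where the backend supports that, not as a label, because a per-request label value creates unbounded label cardinality.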
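Log correlation can be sketched without the OpenTelemetry SDK. The standard-library stand-in below reads the trace ID from a W3C `traceparent` header and stamps it on every log record. In a real service the OpenTelemetry SDK would handle extraction and propagation, so `current_trace_id`, `trace_id_from_traceparent`, `TraceIdFilter`, and `build_logger` are hypothetical names used only for illustration.

```python
import contextvars
import io
import logging
import re

# Trace ID of the request currently being handled ("-" outside a request).
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")

# W3C Trace Context header: version-traceid-spanid-flags, lowercase hex.
TRACEPARENT_RE = re.compile(
    r"^[0-9a-f]{2}-([0-9a-f]{32})-[0-9a-f]{16}-[0-9a-f]{2}$"
)


def trace_id_from_traceparent(header):
    """Return the trace ID from a traceparent header, or None if malformed."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None or match.group(1) == "0" * 32:
        return None  # an all-zero trace ID is invalid
    return match.group(1)


class TraceIdFilter(logging.Filter):
    """Copy the current trace ID onto every log record as `traceId`."""

    def filter(self, record):
        record.traceId = current_trace_id.get()
        return True


def build_logger(stream):
    """Create a logger whose lines carry the trace ID of the current request."""
    logger = logging.getLogger("service")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s traceId=%(traceId)s %(message)s")
    )
    handler.addFilter(TraceIdFilter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


buffer = io.StringIO()
log = build_logger(buffer)

# Simulate handling one request that arrived with a traceparent header.
incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
current_trace_id.set(trace_id_from_traceparent(incoming) or "-")
log.info("charging card")
print(buffer.getvalue(), end="")
```

Using a context variable rather than a global means that concurrent requests, whether handled in threads or async tasks, each see their own trace ID.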