Monitoring and Error Tracking
All applications deployed at DIT must include robust, standardized monitoring and error tracking to ensure reliability, transparency, and rapid incident resolution. Monitoring and observability are essential components of operating cloud-native systems at scale, and every application must integrate with DIT’s approved tools and platforms.
DIT uses:
- Sentry for error tracking and exception monitoring
- OpenTelemetry (OTel) for metrics, logs, and distributed tracing
Both integrations are mandatory for all production applications unless explicitly exempted.
Error Tracking (Sentry)
Sentry provides real-time visibility into application errors, exceptions, and crashes. All applications must integrate with Sentry to ensure that operational issues are detected early and diagnosed effectively.
Requirements
- Sentry must be enabled in all production and staging environments.
- Errors must be reported with sufficient context, including request identifiers, environment, version, and relevant metadata.
- Applications must never swallow exceptions: a caught exception must be handled, reported to Sentry, or re-raised.
- Sensitive data (for example, credentials, tokens, or personal data) must never be sent to Sentry.
- Each deployed version must include accurate release information so that Sentry can group issues by version.
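The context and data-scrubbing requirements above can be sketched in Python. This is a minimal illustration, not DIT's actual configuration: the deny-list of keys, the environment variable names (SENTRY_DSN, APP_ENV, APP_RELEASE), and the helper names are assumptions. The only Sentry-specific detail relied on is that the Python SDK's `before_send` option receives `(event, hint)` and returns the event to send.

```python
import os
from typing import Any, Optional

# Hypothetical deny-list; the real set of sensitive keys is a team decision.
SENSITIVE_KEYS = {"password", "authorization", "cookie", "token", "secret"}
REDACTED = "[redacted]"


def scrub(value: Any) -> Any:
    """Return a copy of value with sensitive-looking keys redacted, recursively."""
    if isinstance(value, dict):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


def before_send(event: dict, hint: dict) -> Optional[dict]:
    """Hook with the (event, hint) shape that Sentry's Python SDK calls before sending."""
    return scrub(event)


def sentry_options() -> dict:
    """Collect init options from the environment (variable names are assumptions)."""
    return {
        "dsn": os.environ.get("SENTRY_DSN", ""),
        "environment": os.environ.get("APP_ENV", "staging"),
        "release": os.environ.get("APP_RELEASE", "unknown"),
        "before_send": before_send,
    }

# In a real service: import sentry_sdk; sentry_sdk.init(**sentry_options())
```

Setting `release` from the deployment pipeline is what lets Sentry group issues by version; the scrubbing hook is a last line of defence and does not replace keeping sensitive values out of error messages in the first place.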
Expected Outcomes
With Sentry properly integrated, developers must be able to:
- identify exceptions as they occur
- group recurring issues
- track regressions across releases
- quickly understand the root cause of errors
- reduce mean time to resolution (MTTR)
Monitoring, Metrics, and Tracing (OpenTelemetry)
All applications must integrate OpenTelemetry (OTel) for unified monitoring across metrics, logs, and distributed traces.
OTel enables end-to-end visibility across services, allowing teams to understand performance, latency, failures, and complex request flows.
Requirements
- Applications must emit structured logs compatible with the OpenTelemetry log data model.
- Applications must expose OTel metrics, including request count, error count, latency, and resource usage where applicable.
- Applications must support distributed tracing, propagating trace identifiers across service boundaries.
- Trace context must be passed to downstream services using the W3C Trace Context headers (traceparent and tracestate).
- Services must be instrumented to include spans for major operations, database queries, message broker interactions, and external API calls.
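To illustrate the trace propagation and structured logging requirements above, the following standard-library sketch parses an incoming W3C traceparent header, derives a child context for a downstream call, and emits a JSON log record carrying the trace identifiers. In practice a service would normally rely on the OpenTelemetry SDK and its propagators rather than hand-rolling this; the helper names are hypothetical, and the log field names are borrowed from the OpenTelemetry log data model for illustration only.

```python
import json
import re
import secrets
import time
from typing import NamedTuple, Optional

# W3C Trace Context "traceparent": version-traceid-parentid-flags, lowercase hex.
# Only version "00" is handled in this sketch.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str
    span_id: str
    flags: str

    def header(self) -> str:
        """Render the value to send downstream in the traceparent header."""
        return f"00-{self.trace_id}-{self.span_id}-{self.flags}"


def parse_traceparent(value: Optional[str]) -> Optional[TraceContext]:
    """Return the parsed context, or None if the header is missing or invalid."""
    match = _TRACEPARENT.match(value.strip()) if value else None
    if match is None:
        return None
    ctx = TraceContext(*match.groups())
    # All-zero trace or span identifiers are invalid.
    if set(ctx.trace_id) == {"0"} or set(ctx.span_id) == {"0"}:
        return None
    return ctx


def child_context(incoming: Optional[str]) -> TraceContext:
    """Keep the caller's trace id but mint a new span id for this service."""
    parent = parse_traceparent(incoming)
    span_id = secrets.token_hex(8)  # 16 hex characters
    if parent is None:
        # No valid parent: start a new trace. Marking it sampled ("01") is an assumption.
        return TraceContext(secrets.token_hex(16), span_id, "01")
    return TraceContext(parent.trace_id, span_id, parent.flags)


def log_line(body: str, ctx: TraceContext, **attributes: object) -> str:
    """Serialize one structured log record that carries the trace identifiers."""
    return json.dumps({
        "Timestamp": time.time_ns(),
        "SeverityText": "INFO",
        "Body": body,
        "TraceId": ctx.trace_id,
        "SpanId": ctx.span_id,
        "Attributes": attributes,
    })
```

A request handler would call `child_context` with the incoming traceparent header value and send `ctx.header()` as the traceparent header on every outbound call, so that logs and spans from all services share one trace identifier.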
Benefits
OpenTelemetry provides:
- full visibility of request flows across microservices
- insight into performance bottlenecks
- support for debugging concurrency and race-condition issues
- centralized monitoring dashboards
- consistent data across all languages and frameworks
Alerts and Thresholds
Based on data collected from Sentry and OpenTelemetry:
- alerts must be configured for critical errors, latency spikes, resource saturation, and abnormal request patterns
- alerts must be sent through DIT’s standard alerting channels (e.g., OpsGenie or Slack)
- each alert must be assigned a defined severity level, and the response must follow that level
- noisy or low-value alerts must be avoided through proper tuning
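One way to picture severity levels and noise tuning together is a rule that fires only when a threshold is breached for several consecutive samples. The sketch below is illustrative only: the `Rule` type, the threshold values, and the default of three consecutive samples are assumptions, not DIT-defined settings, and real alert rules would normally live in the monitoring platform rather than in application code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    metric: str
    warning: float   # value at or above which a warning is raised
    critical: float  # value at or above which a critical alert is raised


def severity(rule: Rule, samples: Sequence[float], sustain: int = 3) -> Optional[str]:
    """Return "critical", "warning", or None for the most recent samples.

    An alert fires only if each of the last `sustain` samples breaches the
    threshold, which suppresses one-off spikes that would otherwise be noise.
    """
    recent = list(samples)[-sustain:]
    if len(recent) < sustain:
        return None  # not enough data to judge yet
    if all(value >= rule.critical for value in recent):
        return "critical"
    if all(value >= rule.warning for value in recent):
        return "warning"
    return None


# Hypothetical rule: p95 latency in milliseconds.
P95_LATENCY = Rule("p95_latency_ms", warning=500, critical=2000)
```

For example, `severity(P95_LATENCY, [100, 2500, 2600, 2700])` yields a critical alert, while a single spike followed by a normal reading yields none.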
Dashboards and Reporting
All production services must have:
- dashboards for real-time performance metrics
- error overview panels
- latency and throughput charts
- application health indicators
- release/version visibility
These dashboards must be made available to the Digital Development and DevOps teams.
Summary
DIT applications must include:
- Error Tracking → via Sentry
- Observability → via OpenTelemetry (metrics, logs, traces)
- Dashboards & Alerts → integrated with the operational ecosystem
These capabilities are essential for:
- diagnosing issues
- maintaining reliability
- ensuring high service quality
- supporting fast deployments
- enabling rapid incident response
Every application must adopt these standards as part of its core deployment and operational lifecycle.
