Observability

DIT uses a comprehensive observability stack to monitor, trace, and debug applications across all environments. This document covers the tools, integrations, and best practices for application observability.

Observability Stack

Component      | Tool                              | Purpose
---------------|-----------------------------------|-----------------------------------
Platform       | Groundcover                       | eBPF-based observability platform
Tracing        | OpenTelemetry (OTel)              | Distributed tracing
Logs           | stdout (collected by Groundcover) | Application logging
Metrics        | Groundcover + Prometheus          | System and application metrics
Error Tracking | Sentry                            | Exception monitoring

Groundcover

DIT uses Groundcover, an eBPF-based observability platform that automatically collects:

  • Logs – Captured from stdout/stderr
  • Traces – Collected via OpenTelemetry or auto-instrumentation
  • Metrics – System metrics via eBPF, application metrics via Prometheus

Why eBPF?

eBPF (extended Berkeley Packet Filter) runs in the Linux kernel, providing:

  • Zero-instrumentation metrics for network, CPU, memory
  • Automatic service discovery and dependency mapping
  • Low overhead compared to sidecar-based solutions
  • Language-agnostic collection without SDK changes

Groundcover Sensor Endpoint

Applications send traces to the Groundcover sensor:

yaml
OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317

Metrics Collection

Groundcover automatically collects system-level metrics via eBPF. For application-level metrics (Prometheus format), you have two options:

Option 1: Auto-Discovery

Add these annotations to your pod template metadata (e.g. under spec.template.metadata in your Deployment) for automatic scraping:

yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

Option 2: PodMonitor / ServiceMonitor

For more control, create a PodMonitor or ServiceMonitor CRD:

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api
  namespace: my-team
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

When to Use Each

  • Auto-discovery annotations – Simple setup, suitable for most applications
  • ServiceMonitor/PodMonitor – Advanced configuration (custom intervals, relabeling, multiple endpoints)
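
Whichever option you choose, the application itself has to expose the metrics endpoint. A minimal Go sketch using the github.com/prometheus/client_golang library (the metric name and port are illustrative):

go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is an illustrative application-level counter.
var requestsTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "myapp_http_requests_total",
        Help: "Total number of HTTP requests handled.",
    },
    []string{"path", "status"},
)

func main() {
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        requestsTotal.WithLabelValues("/healthz", "200").Inc()
        w.WriteHeader(http.StatusOK)
    })

    // Expose the Prometheus registry on the port the annotations or
    // ServiceMonitor point at.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}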

Network Policy for Metrics

If you enable metrics collection, you must allow ingress from the Groundcover scraper in your CiliumNetworkPolicy:

yaml
ciliumNetworkPolicy:
  ingress:
    groundcover-metrics:
      fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: groundcover-custom-metrics
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "8080"  # Your metrics port

OpenTelemetry Integration

OpenTelemetry (OTel) integration is highly recommended for all applications. OTel provides distributed tracing, allowing you to follow requests across service boundaries.

Required: Traces Only

At DIT, OTel is used primarily for tracing. Metrics and logs are collected automatically by Groundcover.

.NET Implementation

csharp
// Program.cs
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter());

Environment Variables

Configure OTel via environment variables in your HelmRelease:

yaml
configs:
  OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317
  OTEL_SERVICE_NAME: my-api
  OTEL_SERVICE_VERSION: "1.0.0"
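
The OTLP exporters in most OTel SDKs read these variables automatically. For a Go service, a minimal sketch assuming a recent go.opentelemetry.io/otel SDK and its OTLP gRPC trace exporter (the package and function names here are illustrative):

go
package tracing // illustrative package name

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// InitTracer wires up an OTLP gRPC exporter and registers it globally.
// Call it from main() and defer tp.Shutdown(ctx) before exit.
func InitTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment,
    // so nothing is hard-coded here; the SDK's default resource detection
    // also picks up OTEL_SERVICE_NAME.
    exporter, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, err
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)
    return tp, nil
}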

Logging Best Practices

Log to stdout Only

All application logs must be written to stdout. Never log to files.

go
// ✅ Good - Log to stdout
logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

// ❌ Bad - Log to file
file, _ := os.OpenFile("app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
logger := zerolog.New(file)

Groundcover automatically collects all stdout/stderr output from containers.

Structured Logging

Use structured (JSON) logging for machine-parseable logs:

go
// Go with zerolog
log := zerolog.New(os.Stdout).With().Timestamp().Logger()

log.Info().
    Str("user_id", userID).
    Str("action", "login").
    Int("duration_ms", 45).
    Msg("User logged in")

Output:

json
{"level":"info","user_id":"123","action":"login","duration_ms":45,"time":"2024-01-15T10:30:00Z","message":"User logged in"}

Include Trace Context

Correlate logs with traces by including trace and span IDs:

go
// Logger wraps zerolog.Logger via embedding, so its methods stay available.
type Logger struct {
    zerolog.Logger
}

// WithTrace returns a copy of the logger that stamps every entry with the
// current trace and span IDs, so log lines can be matched to traces.
func (l *Logger) WithTrace(span trace.Span) *Logger {
    spanCtx := span.SpanContext()
    return &Logger{
        l.With().
            Str("trace_id", spanCtx.TraceID().String()).
            Str("span_id", spanCtx.SpanID().String()).
            Logger(),
    }
}
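
Usage inside a request handler might look like this sketch; it assumes an active span on the request context via go.opentelemetry.io/otel/trace, and baseLogger is an illustrative *Logger built as shown above:

go
func handleLogin(w http.ResponseWriter, r *http.Request) {
    // The OTel instrumentation places the active span on the request context.
    span := trace.SpanFromContext(r.Context())
    log := baseLogger.WithTrace(span)

    log.Info().Str("action", "login").Msg("User logged in")
}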

Log Levels

Use appropriate log levels:

Level | Use Case
------|-------------------------------------------
debug | Detailed debugging information
info  | Normal operations, request handling
warn  | Unexpected but recoverable situations
error | Errors that need attention
fatal | Unrecoverable errors, application shutdown
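
The active level is usually driven by configuration rather than hard-coded. A small zerolog sketch (the LOG_LEVEL variable name is illustrative):

go
// Default to info; allow per-environment overrides, e.g. LOG_LEVEL=debug in dev.
level := zerolog.InfoLevel
if parsed, err := zerolog.ParseLevel(os.Getenv("LOG_LEVEL")); err == nil && parsed != zerolog.NoLevel {
    level = parsed
}
zerolog.SetGlobalLevel(level)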

What to Log

Do log:

  • Request/response summaries (method, path, status, duration)
  • Business events (user actions, state changes)
  • External service calls (duration, success/failure)
  • Errors with context

Don't log:

  • Sensitive data (passwords, tokens, PII)
  • High-frequency internal operations
  • Entire request/response bodies (unless debugging)
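
As an example of a request/response summary, a small Go middleware sketch with zerolog (the statusRecorder wrapper is illustrative, needed only to capture the status code):

go
package httpmw // illustrative package name

import (
    "net/http"
    "time"

    "github.com/rs/zerolog"
)

// statusRecorder captures the response status code so it can be logged.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

// LogRequests wraps a handler and emits one summary line per request.
func LogRequests(logger zerolog.Logger, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)
        logger.Info().
            Str("method", r.Method).
            Str("path", r.URL.Path).
            Int("status", rec.status).
            Dur("duration", time.Since(start)).
            Msg("Request handled")
    })
}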

Network Policies for Observability

Applications sending traces to Groundcover need egress network policies:

yaml
ciliumNetworkPolicy:
  egress:
    groundcover-sensor:
      toEndpoints:
        - matchLabels:
            app.kubernetes.io/name: sensor
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "4317"  # OTLP gRPC
            - port: "4318"  # OTLP HTTP (alternative)

Error Tracking with Sentry

In addition to the Groundcover stack, use Sentry for error tracking:

yaml
configs:
  Sentry__Dsn: https://xxx@sentry.url.here/25
  Sentry__Environment: prod
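
The keys above use .NET-style configuration binding. For a Go service, initialization might look like this sketch using the github.com/getsentry/sentry-go SDK (the env var names are illustrative):

go
// Initialize Sentry early in main(); the SDK then captures unhandled panics
// via sentry.Recover() or its HTTP middleware.
if err := sentry.Init(sentry.ClientOptions{
    Dsn:         os.Getenv("SENTRY_DSN"),         // illustrative env var names
    Environment: os.Getenv("SENTRY_ENVIRONMENT"),
}); err != nil {
    log.Fatalf("sentry.Init: %v", err)
}
defer sentry.Flush(2 * time.Second)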

Sentry captures:

  • Unhandled exceptions
  • Error stack traces
  • Release tracking
  • Performance transactions

Summary

Practice                       | Requirement
-------------------------------|--------------------
OpenTelemetry tracing          | Highly recommended
Log to stdout                  | Mandatory
Structured JSON logs           | Mandatory
Include trace_id in logs       | Recommended
Sentry error tracking          | Mandatory
Network policy for Groundcover | Required if tracing

By following these observability practices, DIT ensures comprehensive visibility into application behavior, enabling rapid debugging, performance optimization, and proactive incident response.