Observability

DIT uses a comprehensive observability stack to monitor, trace, and debug applications across all environments. This document covers the tools, integrations, and best practices for application observability.

Observability Stack

Component      | Tool                              | Purpose
---------------|-----------------------------------|-----------------------------------
Platform       | Groundcover                       | eBPF-based observability platform
Tracing        | OpenTelemetry (OTel)              | Distributed tracing
Logs           | stdout (collected by Groundcover) | Application logging
Metrics        | Groundcover + Prometheus          | System and application metrics
Error Tracking | Sentry                            | Exception monitoring

Groundcover

DIT uses Groundcover, an eBPF-based observability platform that automatically collects:

  • Logs – Captured from stdout/stderr
  • Traces – Collected via OpenTelemetry or auto-instrumentation
  • Metrics – System metrics via eBPF, application metrics via Prometheus

Why eBPF?

eBPF (extended Berkeley Packet Filter) runs in the Linux kernel, providing:

  • Zero-instrumentation metrics for network, CPU, memory
  • Automatic service discovery and dependency mapping
  • Low overhead compared to sidecar-based solutions
  • Language-agnostic collection without SDK changes

Groundcover Sensor Endpoint

Applications send traces to the Groundcover sensor:

yaml
OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317

Metrics Collection

Groundcover automatically collects system-level metrics via eBPF. For application-level metrics (Prometheus format), you have two options:

Option 1: Auto-Discovery

Add these annotations to your pod template metadata (e.g. under spec.template.metadata in your Deployment) for automatic scraping:

yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"

Option 2: PodMonitor / ServiceMonitor

For more control, create a PodMonitor or ServiceMonitor CRD:

yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api
  namespace: my-team
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s

When to Use Each

  • Auto-discovery annotations – Simple setup, suitable for most applications
  • ServiceMonitor/PodMonitor – Advanced configuration (custom intervals, relabeling, multiple endpoints)
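
Whichever option you choose, the application itself has to expose the metrics endpoint. A minimal Go sketch using the github.com/prometheus/client_golang library (the metric name and port are illustrative):

go
package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is an illustrative application-level counter.
var requestsTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "myapp_http_requests_total",
        Help: "Total number of HTTP requests handled.",
    },
    []string{"path", "status"},
)

func main() {
    http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
        requestsTotal.WithLabelValues("/healthz", "200").Inc()
        w.WriteHeader(http.StatusOK)
    })

    // Expose the Prometheus registry on the port the annotations or
    // ServiceMonitor point at.
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}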

Network Policy for Metrics

If you enable metrics collection, you must allow ingress from the Groundcover scraper in your CiliumNetworkPolicy:

yaml
ciliumNetworkPolicy:
  ingress:
    groundcover-metrics:
      fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: groundcover-custom-metrics
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "8080"  # Your metrics port

OpenTelemetry Integration

OpenTelemetry (OTel) integration is highly recommended for all applications. OTel provides distributed tracing, allowing you to follow requests across service boundaries.

Required: Traces Only

At DIT, OTel is used primarily for tracing. Metrics and logs are collected automatically by Groundcover.

.NET Implementation

csharp
// Program.cs
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter());

Environment Variables

Configure OTel via environment variables in your HelmRelease:

yaml
configs:
  OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317
  OTEL_SERVICE_NAME: my-api
  OTEL_SERVICE_VERSION: "1.0.0"
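
The OTLP exporters in most OTel SDKs read these variables automatically. For a Go service, a minimal sketch assuming a recent go.opentelemetry.io/otel SDK and its OTLP gRPC trace exporter (the package and function names here are illustrative):

go
package tracing // illustrative package name

import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// InitTracer wires up an OTLP gRPC exporter and registers it globally.
// Call it from main() and defer tp.Shutdown(ctx) before exit.
func InitTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
    // The exporter reads OTEL_EXPORTER_OTLP_ENDPOINT from the environment,
    // so nothing is hard-coded here; the SDK's default resource detection
    // also picks up OTEL_SERVICE_NAME.
    exporter, err := otlptracegrpc.New(ctx)
    if err != nil {
        return nil, err
    }
    tp := sdktrace.NewTracerProvider(sdktrace.WithBatcher(exporter))
    otel.SetTracerProvider(tp)
    return tp, nil
}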

Logging Best Practices

Log to stdout Only

All application logs must be written to stdout. Never log to files.

go
// ✅ Good - Log to stdout
logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

// ❌ Bad - Log to file
file, _ := os.OpenFile("app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
logger := zerolog.New(file)

Groundcover automatically collects all stdout/stderr output from containers.

Structured Logging

Use structured (JSON) logging for machine-parseable logs:

go
// Go with zerolog
log := zerolog.New(os.Stdout).With().Timestamp().Logger()

log.Info().
    Str("user_id", userID).
    Str("action", "login").
    Int("duration_ms", 45).
    Msg("User logged in")

Output:

json
{"level":"info","user_id":"123","action":"login","duration_ms":45,"time":"2024-01-15T10:30:00Z","message":"User logged in"}

Include Trace Context

Correlate logs with traces by including trace and span IDs:

go
// Logger wraps zerolog.Logger via embedding, so its methods stay available.
type Logger struct {
    zerolog.Logger
}

// WithTrace returns a copy of the logger that stamps every entry with the
// current trace and span IDs, so log lines can be matched to traces.
func (l *Logger) WithTrace(span trace.Span) *Logger {
    spanCtx := span.SpanContext()
    return &Logger{
        l.With().
            Str("trace_id", spanCtx.TraceID().String()).
            Str("span_id", spanCtx.SpanID().String()).
            Logger(),
    }
}
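
Usage inside a request handler might look like this sketch; it assumes an active span on the request context via go.opentelemetry.io/otel/trace, and baseLogger is an illustrative *Logger built as shown above:

go
func handleLogin(w http.ResponseWriter, r *http.Request) {
    // The OTel instrumentation places the active span on the request context.
    span := trace.SpanFromContext(r.Context())
    log := baseLogger.WithTrace(span)

    log.Info().Str("action", "login").Msg("User logged in")
}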

Log Levels

Use appropriate log levels:

Level | Use Case
------|-------------------------------------------
debug | Detailed debugging information
info  | Normal operations, request handling
warn  | Unexpected but recoverable situations
error | Errors that need attention
fatal | Unrecoverable errors, application shutdown
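
The active level is usually driven by configuration rather than hard-coded. A small zerolog sketch (the LOG_LEVEL variable name is illustrative):

go
// Default to info; allow per-environment overrides, e.g. LOG_LEVEL=debug in dev.
level := zerolog.InfoLevel
if parsed, err := zerolog.ParseLevel(os.Getenv("LOG_LEVEL")); err == nil && parsed != zerolog.NoLevel {
    level = parsed
}
zerolog.SetGlobalLevel(level)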

What to Log

Do log:

  • Request/response summaries (method, path, status, duration)
  • Business events (user actions, state changes)
  • External service calls (duration, success/failure)
  • Errors with context

Don't log:

  • Sensitive data (passwords, tokens, PII)
  • High-frequency internal operations
  • Entire request/response bodies (unless debugging)
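
As an example of a request/response summary, a small Go middleware sketch with zerolog (the statusRecorder wrapper is illustrative, needed only to capture the status code):

go
package httpmw // illustrative package name

import (
    "net/http"
    "time"

    "github.com/rs/zerolog"
)

// statusRecorder captures the response status code so it can be logged.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

// LogRequests wraps a handler and emits one summary line per request.
func LogRequests(logger zerolog.Logger, next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)
        logger.Info().
            Str("method", r.Method).
            Str("path", r.URL.Path).
            Int("status", rec.status).
            Dur("duration", time.Since(start)).
            Msg("Request handled")
    })
}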

Network Policies for Observability

Applications sending traces to Groundcover need egress network policies:

yaml
ciliumNetworkPolicy:
  egress:
    groundcover-sensor:
      toEndpoints:
        - matchLabels:
            app.kubernetes.io/name: sensor
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "4317"  # OTLP gRPC
            - port: "4318"  # OTLP HTTP (alternative)

Error Tracking with Sentry

In addition to the Groundcover stack, use Sentry for error tracking:

yaml
configs:
  Sentry__Dsn: https://xxx@sentry.url.here/25
  Sentry__Environment: prod
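
The keys above use .NET-style configuration binding. For a Go service, initialization might look like this sketch using the github.com/getsentry/sentry-go SDK (the env var names are illustrative):

go
// Initialize Sentry early in main(); the SDK then captures unhandled panics
// via sentry.Recover() or its HTTP middleware.
if err := sentry.Init(sentry.ClientOptions{
    Dsn:         os.Getenv("SENTRY_DSN"),         // illustrative env var names
    Environment: os.Getenv("SENTRY_ENVIRONMENT"),
}); err != nil {
    log.Fatalf("sentry.Init: %v", err)
}
defer sentry.Flush(2 * time.Second)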

Sentry captures:

  • Unhandled exceptions
  • Error stack traces
  • Release tracking
  • Performance transactions

Summary

Practice                       | Requirement
-------------------------------|--------------------
OpenTelemetry tracing          | Highly recommended
Log to stdout                  | Mandatory
Structured JSON logs           | Mandatory
Include trace_id in logs       | Recommended
Sentry error tracking          | Mandatory
Network policy for Groundcover | Required if tracing

By following these observability practices, DIT ensures comprehensive visibility into application behavior, enabling rapid debugging, performance optimization, and proactive incident response.