Observability
DIT uses a comprehensive observability stack to monitor, trace, and debug applications across all environments. This document covers the tools, integrations, and best practices for application observability.
Observability Stack
| Component | Tool | Purpose |
|---|---|---|
| Platform | Groundcover | eBPF-based observability platform |
| Tracing | OpenTelemetry (OTel) | Distributed tracing |
| Logs | stdout (collected by Groundcover) | Application logging |
| Metrics | Groundcover + Prometheus | System and application metrics |
| Error Tracking | Sentry | Exception monitoring |
Groundcover
DIT uses Groundcover, an eBPF-based observability platform that automatically collects:
- Logs – Captured from stdout/stderr
- Traces – Collected via OpenTelemetry or auto-instrumentation
- Metrics – System metrics via eBPF, application metrics via Prometheus
Why eBPF?
eBPF (extended Berkeley Packet Filter) runs in the Linux kernel, providing:
- Zero-instrumentation metrics for network, CPU, memory
- Automatic service discovery and dependency mapping
- Low overhead compared to sidecar-based solutions
- Language-agnostic collection without SDK changes
Groundcover Sensor Endpoint
Applications send traces to the Groundcover sensor:
```yaml
OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317
```
Metrics Collection
Groundcover automatically collects system-level metrics via eBPF. For application-level metrics (Prometheus format), you have two options:
Option 1: Auto-Discovery
Add these annotations to your pod for automatic scraping:
```yaml
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"
  prometheus.io/path: "/metrics"
```
Option 2: PodMonitor / ServiceMonitor
For more control, create a PodMonitor or ServiceMonitor CRD:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api
  namespace: my-team
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: my-api
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
```
When to Use Each
- Auto-discovery annotations – Simple setup, suitable for most applications
- ServiceMonitor/PodMonitor – Advanced configuration (custom intervals, relabeling, multiple endpoints)
Network Policy for Metrics
If you enable metrics collection, you must allow ingress from the Groundcover scraper in your CiliumNetworkPolicy:
```yaml
ciliumNetworkPolicy:
  ingress:
    groundcover-metrics:
      fromEndpoints:
        - matchLabels:
            app.kubernetes.io/name: groundcover-custom-metrics
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "8080" # Your metrics port
```
OpenTelemetry Integration
OpenTelemetry (OTel) integration is highly recommended for all applications. OTel provides distributed tracing, allowing you to follow requests across service boundaries.
Required: Traces Only
At DIT, OTel is used primarily for tracing. Metrics and logs are collected automatically by Groundcover.
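Across service boundaries, OTel propagates trace context in the W3C `traceparent` HTTP header (`version-traceid-spanid-flags`). A stdlib-only sketch of what that header carries, assuming a hypothetical `parseTraceparent` helper (real services should rely on the OTel SDK's propagators rather than parsing headers themselves):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// traceContext holds the four fields of a W3C traceparent header,
// e.g. "00-<32 hex trace id>-<16 hex span id>-01".
type traceContext struct {
	Version string
	TraceID string
	SpanID  string
	Flags   string
}

// parseTraceparent is an illustrative parser for the traceparent format;
// it is not part of the OTel SDK.
func parseTraceparent(h string) (traceContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || len(parts[1]) != 32 || len(parts[2]) != 16 {
		return traceContext{}, errors.New("malformed traceparent header")
	}
	return traceContext{parts[0], parts[1], parts[2], parts[3]}, nil
}

func main() {
	tc, err := parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	// The trace ID is what ties a log line to its distributed trace.
	fmt.Println("trace_id:", tc.TraceID)
}
```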
.NET Implementation
```csharp
// Program.cs
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter());
```
Environment Variables
Configure OTel via environment variables in your HelmRelease:
```yaml
configs:
  OTEL_EXPORTER_OTLP_ENDPOINT: groundcover-sensor.groundcover.svc.cluster.local:4317
  OTEL_SERVICE_NAME: my-api
  OTEL_SERVICE_VERSION: "1.0.0"
```
Logging Best Practices
Log to stdout Only
All application logs must be written to stdout. Never log to files.
```go
// ✅ Good - Log to stdout
logger := zerolog.New(os.Stdout).With().Timestamp().Logger()

// ❌ Bad - Log to file
file, _ := os.OpenFile("app.log", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
logger := zerolog.New(file)
```
Groundcover automatically collects all stdout/stderr output from containers.
Structured Logging
Use structured (JSON) logging for machine-parseable logs:
```go
// Go with zerolog
log := zerolog.New(os.Stdout).With().Timestamp().Logger()
log.Info().
	Str("user_id", userID).
	Str("action", "login").
	Int("duration_ms", 45).
	Msg("User logged in")
```
Output:
```json
{"level":"info","user_id":"123","action":"login","duration_ms":45,"time":"2024-01-15T10:30:00Z","message":"User logged in"}
```
Include Trace Context
Correlate logs with traces by including trace and span IDs:
```go
// WithTrace returns a copy of the logger annotated with the span's
// trace_id and span_id, so log lines can be joined to their traces.
func (l *Logger) WithTrace(span trace.Span) *Logger {
	spanCtx := span.SpanContext()
	return &Logger{
		l.With().
			Str("trace_id", spanCtx.TraceID().String()).
			Str("span_id", spanCtx.SpanID().String()).
			Logger(),
	}
}
```
Log Levels
Use appropriate log levels:
| Level | Use Case |
|---|---|
| debug | Detailed debugging information |
| info | Normal operations, request handling |
| warn | Unexpected but recoverable situations |
| error | Errors that need attention |
| fatal | Unrecoverable errors, application shutdown |
What to Log
Do log:
- Request/response summaries (method, path, status, duration)
- Business events (user actions, state changes)
- External service calls (duration, success/failure)
- Errors with context
Don't log:
- Sensitive data (passwords, tokens, PII)
- High-frequency internal operations
- Entire request/response bodies (unless debugging)
Network Policies for Observability
Applications sending traces to Groundcover need egress network policies:
```yaml
ciliumNetworkPolicy:
  egress:
    groundcover-sensor:
      toEndpoints:
        - matchLabels:
            app.kubernetes.io/name: sensor
            io.kubernetes.pod.namespace: groundcover
      toPorts:
        - ports:
            - port: "4317" # OTLP gRPC
            - port: "4318" # OTLP HTTP (alternative)
```
Error Tracking with Sentry
In addition to the Groundcover stack, use Sentry for error tracking:
```yaml
configs:
  Sentry__Dsn: https://xxx@sentry.url.here/25
  Sentry__Environment: prod
```
Sentry captures:
- Unhandled exceptions
- Error stack traces
- Release tracking
- Performance transactions
Summary
| Practice | Requirement |
|---|---|
| OpenTelemetry tracing | Highly recommended |
| Log to stdout | Mandatory |
| Structured JSON logs | Mandatory |
| Include trace_id in logs | Recommended |
| Sentry error tracking | Mandatory |
| Network policy for Groundcover | Required if tracing |
By following these observability practices, DIT ensures comprehensive visibility into application behavior, enabling rapid debugging, performance optimization, and proactive incident response.
