Monitoring Applications in Kubernetes
Introduction
Monitoring applications in Kubernetes is crucial for ensuring the health, performance, and reliability of your containerized workloads. Kubernetes provides a powerful platform for deploying and scaling applications, but without proper monitoring, you might face unexpected failures, performance bottlenecks, or resource constraints.
In this guide, we'll explore how to set up comprehensive monitoring for your applications running in Kubernetes using Prometheus and related tools. By the end of this tutorial, you'll understand how to collect metrics from your applications, visualize them, and set up alerts to be notified of potential issues before they affect your users.
Why Monitor Kubernetes Applications?
Before diving into the technical implementation, let's understand why application monitoring in Kubernetes is essential:
- Detect issues early: Identify problems before they impact your users
- Optimize resource usage: Ensure your applications use resources efficiently
- Understand application behavior: Gain insights into how your applications perform under various conditions
- Make data-driven decisions: Use metrics to guide scaling and improvement decisions
- Ensure reliability: Maintain high availability and performance for your services
Monitoring Architecture in Kubernetes
Let's first understand how monitoring typically works in a Kubernetes environment. In a typical setup, Prometheus uses the Kubernetes API to discover pods, services, and endpoints, scrapes their metrics endpoints at a regular interval, and stores the samples in its local time-series database. Grafana queries Prometheus to visualize those metrics in dashboards, while Alertmanager receives alerts fired by Prometheus rules and routes them to channels such as Slack or email.
Prerequisites
Before we begin, ensure you have:
- A running Kubernetes cluster
- kubectl configured to communicate with your cluster
- Basic understanding of Kubernetes concepts (Pods, Services, Deployments)
- Helm installed (optional, but recommended)
Setting Up Prometheus in Kubernetes
Method 1: Using Prometheus Operator with kube-prometheus-stack
The easiest way to set up Prometheus in Kubernetes is using the kube-prometheus-stack Helm chart, which includes Prometheus, Grafana, Alertmanager, and exporters such as node-exporter and kube-state-metrics.
# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install kube-prometheus-stack
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
This command installs the complete monitoring stack in the "monitoring" namespace.
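Before moving on, it's worth verifying the installation. Assuming the release name prometheus used above, commands like the following should show the Prometheus, Grafana, Alertmanager, node-exporter, and kube-state-metrics pods coming up (exact names vary slightly by chart version):
# List the pods created by the chart and wait until they are all Running/Ready
kubectl get pods -n monitoring
# Check the Services the chart exposes
kubectl get svc -n monitoring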
Method 2: Manual Installation
If you prefer more control over your installation, you can deploy Prometheus components manually:
- Create a namespace for monitoring:
kubectl create namespace monitoring
- Deploy Prometheus:
# prometheus-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
          - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
          - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
            action: keep
            regex: default;kubernetes;https
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
          - role: pod
        relabel_configs:
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
            action: keep
            regex: true
          - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
            action: replace
            target_label: __metrics_path__
            regex: (.+)
          - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
            action: replace
            regex: ([^:]+)(?::\d+)?;(\d+)
            replacement: $1:$2
            target_label: __address__
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
          - source_labels: [__meta_kubernetes_namespace]
            action: replace
            target_label: kubernetes_namespace
          - source_labels: [__meta_kubernetes_pod_name]
            action: replace
            target_label: kubernetes_pod_name
Apply the ConfigMap and deploy Prometheus:
kubectl apply -f prometheus-configmap.yaml
# Create RBAC roles for Prometheus
kubectl create clusterrolebinding prometheus --clusterrole=view --serviceaccount=monitoring:default
# Deploy Prometheus
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus:v2.42.0
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus
            - name: prometheus-storage
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: prometheus-storage
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    app: prometheus
  ports:
    - port: 9090
      targetPort: 9090
  type: ClusterIP
EOF
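To confirm the manual deployment is working, port-forward the Service created above and open the Prometheus UI; the targets page should list the kubernetes-apiservers and kubernetes-pods jobs defined in the ConfigMap:
# Forward the Prometheus Service to your local machine
kubectl port-forward -n monitoring svc/prometheus 9090:9090
# Then open http://localhost:9090/targets in your browser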
Instrumenting Your Applications
To monitor your applications in Kubernetes, you need to expose metrics that Prometheus can scrape. Let's look at how to do this for different types of applications:
1. Native Prometheus Integration
Many applications and frameworks support Prometheus metrics out of the box:
- Spring Boot with Micrometer
- Go applications with the Prometheus client library
- Node.js applications with prom-client
Here's an example of a Go application that exposes Prometheus metrics:
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	requestsProcessed = promauto.NewCounter(prometheus.CounterOpts{
		Name: "myapp_processed_requests_total",
		Help: "The total number of processed requests",
	})
	requestDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "myapp_request_duration_seconds",
		Help:    "The duration of requests in seconds",
		Buckets: prometheus.DefBuckets,
	})
)

// recordMetrics simulates background traffic by incrementing the counter every two seconds.
func recordMetrics() {
	go func() {
		for {
			requestsProcessed.Inc()
			time.Sleep(2 * time.Second)
		}
	}()
}

func handleRequest(w http.ResponseWriter, r *http.Request) {
	start := time.Now()

	// Simulate work
	time.Sleep(100 * time.Millisecond)
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("Hello, world!"))

	// Record request duration
	duration := time.Since(start).Seconds()
	requestDuration.Observe(duration)
}

func main() {
	recordMetrics()
	http.HandleFunc("/", handleRequest)
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
Deploy this application to Kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: sample-app
  template:
    metadata:
      labels:
        app: sample-app
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      containers:
        - name: sample-app
          image: your-sample-app:latest
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: sample-app
spec:
  selector:
    app: sample-app
  ports:
    - port: 80
      targetPort: 8080
The key parts here are the annotations on the pod template:
- prometheus.io/scrape: "true" tells Prometheus to scrape this pod
- prometheus.io/port: "8080" specifies which port to scrape
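Once the pods are being scraped, the relabeling rules from the Prometheus configuration shown earlier attach kubernetes_namespace and kubernetes_pod_name labels to every sample, so you can break metrics down per pod. As a sketch, a query like this against the counter from the Go application above gives the per-pod request rate:
# Per-pod request rate over the last 5 minutes
sum by (kubernetes_pod_name) (rate(myapp_processed_requests_total[5m]))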
2. Using the Prometheus Operator
If you're using the Prometheus Operator, you can define a ServiceMonitor to scrape your application:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: sample-app-monitor
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default   # adjust to the namespace where sample-app actually runs
  selector:
    matchLabels:
      app: sample-app
  endpoints:
    - port: web # This should match the name of the port in your Service
      interval: 15s
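For this ServiceMonitor to pick anything up, the Service it selects needs the app: sample-app label and a named port called web. A minimal sketch of such a Service for the sample app might look like the following; note that with kube-prometheus-stack, the Prometheus instance may additionally require ServiceMonitors to carry the Helm release label (for example release: prometheus) before it selects them, depending on your chart values:
apiVersion: v1
kind: Service
metadata:
  name: sample-app
  labels:
    app: sample-app        # matched by the ServiceMonitor's selector
spec:
  selector:
    app: sample-app
  ports:
    - name: web            # matched by the ServiceMonitor's endpoint port
      port: 80
      targetPort: 8080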
3. Using Exporters for Applications Without Native Support
For applications that don't expose Prometheus metrics natively, you can use exporters. For example, for a Redis deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-exporter
  template:
    metadata:
      labels:
        app: redis-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
    spec:
      containers:
        - name: redis-exporter
          image: oliver006/redis_exporter:latest
          ports:
            - containerPort: 9121
          env:
            - name: REDIS_ADDR
              value: "redis:6379"
Important Metrics to Monitor
When monitoring applications in Kubernetes, focus on these key metrics:
1. Application Specific Metrics
- Request count/rate
- Error rate
- Response time (latency)
- Business-specific metrics (e.g., orders processed, user registrations)
2. Container Metrics
- CPU usage
- Memory usage
- File system usage
- Network I/O
3. Kubernetes Resource Metrics
- Pod status (ready, not ready, etc.)
- Deployment/StatefulSet status
- Resource utilization vs. requests/limits
- HPA (Horizontal Pod Autoscaler) metrics
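Many of these metrics come from components that kube-prometheus-stack already deploys: container metrics from the kubelet's cAdvisor endpoint and Kubernetes object state from kube-state-metrics. A few illustrative queries, using metric names as exposed by those components:
# Container CPU usage per pod (in cores), from cAdvisor
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default"}[5m]))
# Container working-set memory per pod, from cAdvisor
sum by (pod) (container_memory_working_set_bytes{namespace="default"})
# Pods that are currently not Ready, from kube-state-metrics
kube_pod_status_ready{condition="false"} == 1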
Visualizing Metrics with Grafana
Prometheus is excellent at collecting and storing metrics, but Grafana provides powerful visualization capabilities:
- Port-forward to access Grafana (if you installed kube-prometheus-stack):
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
- Access Grafana at http://localhost:3000 (the default credentials for kube-prometheus-stack are admin/prom-operator)
- Create dashboards for your applications.
Here's a sample Grafana dashboard for monitoring our sample application:
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 1,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"hiddenSeries": false,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.7",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(myapp_processed_requests_total[5m])",
"interval": "",
"legendFormat": "{{kubernetes_pod_name}}",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Request Rate",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": "Requests/sec",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": null,
"fieldConfig": {
"defaults": {},
"overrides": []
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"hiddenSeries": false,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"nullPointMode": "null",
"options": {
"alertThreshold": true
},
"percentage": false,
"pluginVersion": "7.5.7",
"pointradius": 2,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(myapp_request_duration_seconds_bucket[5m])) by (le))",
"interval": "",
"legendFormat": "95th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(myapp_request_duration_seconds_bucket[5m])) by (le))",
"interval": "",
"legendFormat": "50th Percentile",
"refId": "B"
}
],
"thresholds": [],
"timeFrom": null,
"timeRegions": [],
"timeShift": null,
"title": "Request Latency",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": "Duration",
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"refresh": "10s",
"schemaVersion": 27,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Sample Application Dashboard",
"uid": "sample-app",
"version": 1
}
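You can import this JSON through the Grafana UI (Dashboards, then Import). If you run kube-prometheus-stack, its Grafana sidecar can also load dashboards from ConfigMaps carrying the grafana_dashboard label, which lets you keep dashboards in version control; a sketch of that approach, assuming the JSON above is saved as sample-app.json (the exact label and value are configurable in the chart's sidecar settings):
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-app-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"   # picked up by the kube-prometheus-stack Grafana sidecar
data:
  sample-app.json: |
    { ... paste the dashboard JSON from above here ... }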
Setting Up Alerts
Monitoring is not complete without alerting. Let's set up alerts for our sample application:
Using Prometheus AlertManager
- Create alert rules:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: sample-app-alerts
  namespace: monitoring
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
    - name: sample-app
      rules:
        - alert: HighRequestLatency
          expr: histogram_quantile(0.95, sum(rate(myapp_request_duration_seconds_bucket[5m])) by (le)) > 0.5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High request latency on {{ $labels.instance }}"
            description: "95th percentile of request latency is above 500ms (current value: {{ $value }}s)"
        - alert: HighErrorRate
          expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High error rate on {{ $labels.instance }}"
            description: "Error rate is above 5% (current value: {{ $value }})"
- Configure Alertmanager by updating its configuration Secret (with kube-prometheus the Secret is named alertmanager-main; Helm-based installs may use a different name, so check your monitoring namespace):
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/YOUR_SLACK_WEBHOOK_URL'
    route:
      receiver: 'slack-notifications'
      group_by: ['alertname', 'job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#alerts'
            send_resolved: true
            title: |-
              [{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .CommonLabels.alertname }}
            text: >-
              {{ range .Alerts -}}
              *Alert:* {{ .Annotations.summary }}
              *Description:* {{ .Annotations.description }}
              *Severity:* {{ .Labels.severity }}
              {{ end }}
Best Practices for Kubernetes Application Monitoring
1. Use Labels Effectively: Design a consistent labeling strategy for your applications to make querying and filtering easier.
2. Monitor the Four Golden Signals:
- Latency: How long does it take to service a request?
- Traffic: How much demand is being placed on your system?
- Errors: What is the rate of failed requests?
- Saturation: How "full" is your service?
3. Implement the RED Method for Microservices:
- Rate: Requests per second
- Errors: Number of failed requests
- Duration: Distribution of request latencies
4. Set Resource Requests and Limits: This helps Kubernetes schedule your pods effectively and prevents resource starvation.
5. Use Horizontal Pod Autoscaling: Automatically scale your applications based on metrics (see the note on the Prometheus Adapter after this list):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
    - type: Pods
      pods:
        metric:
          name: myapp_processed_requests_total
        target:
          type: AverageValue
          averageValue: 100
6. Set Up Dashboards for Different Audiences: Create different dashboards for developers, operations, and business stakeholders.
7. Implement Health Checks: Use liveness and readiness probes:
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
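A note on the HPA example in item 5: the Pods metric type only works if something exposes myapp_processed_requests_total through the Kubernetes custom metrics API, which Kubernetes itself does not do. A common choice is the Prometheus Adapter. A rough sketch of an adapter rule for this counter, assuming the label names produced by the scrape configuration shown earlier and converting the raw counter into a per-second rate (which is usually what you actually want to scale on), might look like this:
# Prometheus Adapter rule (sketch) exposing the request counter as a per-second rate
rules:
  - seriesQuery: 'myapp_processed_requests_total{kubernetes_namespace!="",kubernetes_pod_name!=""}'
    resources:
      overrides:
        kubernetes_namespace: {resource: "namespace"}
        kubernetes_pod_name: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
With a rule like this, the HPA would reference myapp_processed_requests_per_second rather than the raw _total counter.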
Real-World Example: E-commerce Application Monitoring
Let's look at a practical example of monitoring an e-commerce application in Kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ecommerce-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ecommerce-api
  template:
    metadata:
      labels:
        app: ecommerce-api
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      containers:
        - name: ecommerce-api
          image: example/ecommerce-api:latest
          ports:
            - containerPort: 8080
          env:
            - name: SPRING_PROFILES_ACTIVE
              value: "production"
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
Key metrics to monitor for this e-commerce application:
1. Business Metrics:
- Order rate
- Checkout success/failure rate
- Average order value
2. Technical Metrics:
- API response times
- Database query times
- Payment processing times
- Error rates by endpoint
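With Spring Boot and Micrometer, most of the technical metrics above are available out of the box once the actuator/prometheus endpoint is enabled; HTTP server metrics are exposed as http_server_requests_seconds_*. A couple of example queries, assuming Micrometer's default metric and label names and that percentile histograms are enabled for http.server.requests:
# 95th percentile API latency per endpoint
histogram_quantile(0.95, sum by (le, uri) (rate(http_server_requests_seconds_bucket[5m])))
# Error rate (5xx responses) per endpoint
sum by (uri) (rate(http_server_requests_seconds_count{status=~"5.."}[5m]))
  / sum by (uri) (rate(http_server_requests_seconds_count[5m]))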
Troubleshooting Common Issues
Issue 1: Prometheus Is Not Scraping Your Application
Possible causes and solutions:
- Check if the annotations are correct:
kubectl get pods -l app=your-app -o jsonpath='{.items[0].metadata.annotations}'
- Check if the metrics endpoint is accessible:
kubectl port-forward pod/your-app-pod-name 8080:8080
curl http://localhost:8080/metrics
- Check the Prometheus targets page in the UI:
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
Then open http://localhost:9090/targets
Issue 2: High Memory Usage in Prometheus
- Adjust the retention period and resource limits:
prometheus:
  retention: 7d
  resources:
    requests:
      memory: 2Gi
    limits:
      memory: 4Gi
- Optimize scrape intervals and sample retention.
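High memory usage is usually driven by series cardinality rather than raw sample volume. Prometheus exposes metrics about itself, so you can check how many series the head block holds and which metric names contribute the most, for example:
# Total number of active series in the head block
prometheus_tsdb_head_series
# Top 10 metric names by number of series (can be expensive on large servers)
topk(10, count by (__name__) ({__name__=~".+"}))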
Issue 3: Missing Metrics
- Check if the application is exposing the metrics correctly.
- Verify the metric names in your Prometheus queries.
- Check for label changes that might affect your queries.
Summary
Monitoring applications in Kubernetes using Prometheus provides deep insights into the health and performance of your workloads. By following the steps outlined in this guide, you've learned:
- How to set up Prometheus in your Kubernetes cluster
- How to instrument your applications to expose metrics
- How to visualize metrics using Grafana dashboards
- How to set up alerts to proactively respond to issues
- Best practices for effective Kubernetes monitoring
Remember, monitoring is an ongoing process that should evolve with your applications. Regularly review your metrics, dashboards, and alerts to ensure they continue to provide valuable insights as your applications grow and change.
Additional Resources
- Prometheus Documentation
- Kubernetes Monitoring with Prometheus
- Grafana Documentation
- Prometheus Operator
- RED Method
Exercises
- Set up Prometheus and Grafana in your Kubernetes cluster and monitor a simple application.
- Create custom metrics for your application and visualize them in Grafana.
- Configure alerts for high latency and error rates.
- Implement a custom exporter for a third-party application that doesn't natively support Prometheus.
- Use ServiceMonitors (if using Prometheus Operator) to configure scraping for different applications.