Kubernetes Logging Stack

Introduction

When running applications in Kubernetes, one of the biggest challenges is gaining visibility into what's happening inside your containers and across your cluster. This is where a proper logging stack becomes essential.

A Kubernetes logging stack is a collection of tools and components that work together to collect, process, store, and visualize logs from your Kubernetes applications and infrastructure. An effective logging solution helps you:

  • Troubleshoot application issues
  • Monitor system health
  • Detect security incidents
  • Track user behavior
  • Meet compliance requirements
  • Perform long-term analysis of system performance

In this guide, we'll explore how to build a comprehensive logging stack for Kubernetes, focusing on beginner-friendly approaches while covering essential concepts.

Logging Challenges in Kubernetes

Before diving into solutions, let's understand why logging in Kubernetes is particularly challenging:

  1. Ephemeral containers: Containers are short-lived; when they're gone, their logs disappear with them.
  2. Distributed systems: Applications run across multiple nodes and pods.
  3. Volume of data: Large clusters generate enormous amounts of logs.
  4. Different log formats: Various applications produce logs in different formats.
  5. Resource constraints: Logging itself consumes resources that must be managed.

Components of a Kubernetes Logging Stack

A complete Kubernetes logging stack typically consists of these components:

  • Log collection - agents that gather logs from containers and nodes
  • Log processing - parsing, transforming, and enriching the raw log data
  • Log storage - a backend that indexes logs for efficient querying
  • Log visualization - dashboards and search interfaces for exploring the data

Let's explore each component in detail.

1. Log Collection

The first step is gathering logs from various sources in your Kubernetes cluster.

Native Kubernetes Logging

Kubernetes provides some basic logging capabilities out of the box. Each container's stdout and stderr streams are captured by the container runtime and written to a file on the node.

You can access these logs using the kubectl logs command:

bash
# View logs from a specific pod
kubectl logs my-pod-name

# Follow logs in real-time
kubectl logs -f my-pod-name

# View logs from a specific container in a multi-container pod
kubectl logs my-pod-name -c my-container-name

However, these logs are limited in scope and persistence. When a pod is deleted or a node fails, these logs are lost.
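
A few more standard kubectl logs options are handy while you are still relying on this built-in mechanism:

bash
# View logs from the previous (crashed) instance of a container
kubectl logs my-pod-name --previous

# View only the last hour of logs
kubectl logs my-pod-name --since=1h

# View logs from all pods matching a label selector
kubectl logs -l app=my-app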

Log Collection Agents

For a production-grade solution, you need dedicated log collection agents. Popular options include:

Fluentd

Fluentd is an open-source data collector that unifies log collection and consumption. It's part of the CNCF (Cloud Native Computing Foundation) and works well in Kubernetes environments.

To deploy Fluentd as a DaemonSet (ensuring it runs on every node):

yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:v1.14-debian-elasticsearch7-1
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        volumeMounts:
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
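
After applying this manifest (for example, saved as fluentd-daemonset.yaml), you can confirm that a Fluentd pod was scheduled on each node:

bash
# DESIRED and READY should match the number of schedulable nodes
kubectl get daemonset fluentd -n logging
kubectl get nodes --no-headers | wc -l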

Filebeat

Filebeat is a lightweight log shipper from Elastic that's easy to configure:

yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat
  namespace: logging
spec:
  selector:
    matchLabels:
      app: filebeat
  template:
    metadata:
      labels:
        app: filebeat
    spec:
      containers:
      - name: filebeat
        image: docker.elastic.co/beats/filebeat:7.17.0
        args: ["-c", "/etc/filebeat.yml", "-e"]
        volumeMounts:
        - name: config
          mountPath: /etc/filebeat.yml
          subPath: filebeat.yml
        - name: varlog
          mountPath: /var/log
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: config
        configMap:
          name: filebeat-config
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
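
This DaemonSet mounts a ConfigMap named filebeat-config that is not shown above. A minimal sketch of what it could contain, assuming Filebeat 7.x and the Elasticsearch service described later in this guide (adjust inputs and outputs for your environment):

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: logging
data:
  filebeat.yml: |
    # Read container logs written by the runtime on each node
    filebeat.inputs:
    - type: container
      paths:
        - /var/log/containers/*.log
    # Enrich events with pod, namespace, and label metadata
    processors:
      - add_kubernetes_metadata:
          matchers:
          - logs_path:
              logs_path: "/var/log/containers/"
    # Ship everything to the Elasticsearch service in the logging namespace
    output.elasticsearch:
      hosts: ["elasticsearch:9200"]

In a real cluster, the add_kubernetes_metadata processor also needs a ServiceAccount with permission to read pod metadata from the API server.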

2. Log Processing

Once logs are collected, they often need to be processed, transformed, and enriched before storage.

Fluentd Processing Example

Fluentd can process logs using its rich plugin ecosystem. Here's a ConfigMap with a simple configuration:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: logging
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type json
        time_format %Y-%m-%dT%H:%M:%S.%NZ
      </parse>
    </source>

    <filter kubernetes.**>
      @type kubernetes_metadata
    </filter>

    <filter kubernetes.**>
      @type record_transformer
      <record>
        environment ${ENV_NAME}
        hostname ${HOSTNAME}
      </record>
    </filter>

    <match **>
      @type elasticsearch
      host elasticsearch
      port 9200
      logstash_format true
      logstash_prefix k8s-logs
      <buffer>
        @type file
        path /var/log/fluentd-buffers-k8s
        flush_thread_count 2
        flush_interval 5s
      </buffer>
    </match>

This configuration:

  1. Collects container logs
  2. Adds Kubernetes metadata (pod, namespace, etc.)
  3. Enriches logs with environment and hostname information
  4. Sends processed logs to Elasticsearch
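
To use this configuration, apply the ConfigMap and make sure it is actually mounted into the Fluentd pods; the fluentd-kubernetes-daemonset image ships with a default configuration (typically under /fluentd/etc), so a custom fluent.conf has to be mounted over it, which the DaemonSet shown earlier does not yet do. The kubernetes_metadata filter comes from the fluent-plugin-kubernetes_metadata_filter plugin, which these images already bundle. Once the mount is in place, roll the agents so they pick up the change:

bash
# Apply the custom configuration
kubectl apply -f fluentd-configmap.yaml

# Restart the DaemonSet so the agents reload their config
kubectl rollout restart daemonset/fluentd -n logging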

3. Log Storage

After processing, logs need to be stored in a system that allows for efficient querying and analysis.

Elasticsearch

Elasticsearch is one of the most widely used log stores in Kubernetes environments. Here's a simple three-node StatefulSet:

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      containers:
      - name: elasticsearch
        image: docker.elastic.co/elasticsearch/elasticsearch:7.17.0
        env:
        - name: cluster.name
          value: k8s-logs
        - name: node.name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: discovery.seed_hosts
          value: "elasticsearch-0.elasticsearch,elasticsearch-1.elasticsearch,elasticsearch-2.elasticsearch"
        - name: cluster.initial_master_nodes
          value: "elasticsearch-0,elasticsearch-1,elasticsearch-2"
        - name: ES_JAVA_OPTS
          value: "-Xms512m -Xmx512m"
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - name: data
          mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
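
For the discovery.seed_hosts entries above (elasticsearch-0.elasticsearch and so on) to resolve, the StatefulSet also needs a matching headless Service named elasticsearch, which is not shown above. A minimal sketch:

yaml
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
  namespace: logging
spec:
  clusterIP: None   # headless: gives each pod a stable DNS name
  selector:
    app: elasticsearch
  ports:
  - port: 9200
    name: http
  - port: 9300
    name: transport

The elasticsearch hostname that Fluentd and Kibana use elsewhere in this guide resolves through this same Service.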

Loki

Loki is a lightweight alternative developed by Grafana Labs, designed to be cost-effective and easy to operate:

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: loki
  namespace: logging
spec:
  serviceName: loki
  replicas: 1
  selector:
    matchLabels:
      app: loki
  template:
    metadata:
      labels:
        app: loki
    spec:
      containers:
      - name: loki
        image: grafana/loki:2.6.1
        ports:
        - containerPort: 3100
          name: http-metrics
        volumeMounts:
        - name: config
          mountPath: /etc/loki
        - name: data
          mountPath: /data
      volumes:
      - name: config
        configMap:
          name: loki-config
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
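
Note that this StatefulSet references a loki-config ConfigMap that you need to create separately; Loki's own configuration is outside the scope of this snippet. Once the pod is running, a quick health check is to port-forward to it and hit its readiness endpoint:

bash
# Forward Loki's HTTP port (the pod is named loki-0 by the StatefulSet)
kubectl port-forward -n logging pod/loki-0 3100:3100

# In another terminal: "ready" means Loki is accepting writes and queries
curl http://localhost:3100/ready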

4. Log Visualization

With logs stored, you need interfaces to query and visualize them.

Kibana

Kibana works seamlessly with Elasticsearch:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kibana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kibana
  template:
    metadata:
      labels:
        app: kibana
    spec:
      containers:
      - name: kibana
        image: docker.elastic.co/kibana/kibana:7.17.0
        env:
        - name: ELASTICSEARCH_HOSTS
          value: http://elasticsearch:9200
        ports:
        - containerPort: 5601
          name: http
---
apiVersion: v1
kind: Service
metadata:
  name: kibana
  namespace: logging
spec:
  ports:
  - port: 5601
    targetPort: 5601
  selector:
    app: kibana

Grafana

If using Loki, Grafana is the natural choice for visualization:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: logging
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:8.4.3
        env:
        - name: GF_SECURITY_ADMIN_USER
          value: admin
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: admin
        ports:
        - containerPort: 3000
          name: http
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: logging
spec:
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
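
Grafana does not know about Loki out of the box. You can add it as a data source through the UI (Configuration → Data Sources) or provision it declaratively. Below is a minimal sketch of a provisioning ConfigMap; it assumes a Service named loki exposing port 3100 exists in the logging namespace (not defined in this guide), and the file would need to be mounted into the Grafana container under /etc/grafana/provisioning/datasources:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources
  namespace: logging
data:
  datasources.yaml: |
    # Grafana data source provisioning file
    apiVersion: 1
    datasources:
    - name: Loki
      type: loki
      access: proxy
      url: http://loki:3100   # assumes a "loki" Service in this namespace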

Popular Logging Stack Combinations

Several well-established combinations of logging tools have emerged:

EFK Stack (Elasticsearch, Fluentd, Kibana)

This is one of the most popular combinations: Fluentd runs as a DaemonSet to collect and process logs, Elasticsearch stores and indexes them, and Kibana provides search and dashboards.

PLG Stack (Promtail, Loki, Grafana)

A more lightweight alternative: Promtail ships logs from each node, Loki stores them while indexing only labels rather than full log content, and Grafana is used to query and visualize them.

Practical Example: Setting Up the EFK Stack

Let's walk through setting up the EFK stack in a Kubernetes cluster:

Step 1: Create a Namespace

bash
kubectl create namespace logging

Step 2: Deploy Elasticsearch

bash
# Apply the Elasticsearch configuration
kubectl apply -f elasticsearch.yaml
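
Before moving on, confirm that all three Elasticsearch pods come up and form a cluster:

bash
# Wait for the StatefulSet rollout to complete
kubectl rollout status statefulset/elasticsearch -n logging

# Cluster health should eventually report "green"
kubectl exec -it elasticsearch-0 -n logging -- curl -s 'localhost:9200/_cluster/health?pretty'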

Step 3: Deploy Fluentd

bash
# Create the Fluentd ConfigMap
kubectl apply -f fluentd-configmap.yaml

# Deploy Fluentd DaemonSet
kubectl apply -f fluentd-daemonset.yaml
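
If Fluentd is working, there should be one Running pod per node and its logs should show a successful connection to Elasticsearch rather than repeated errors:

bash
# One Fluentd pod per node
kubectl get pods -n logging -l app=fluentd -o wide

# Check the most recent agent output for connection errors
kubectl logs -n logging -l app=fluentd --tail=20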

Step 4: Deploy Kibana

bash
kubectl apply -f kibana.yaml

Step 5: Access Kibana Dashboard

bash
# Port-forward the Kibana service
kubectl port-forward -n logging svc/kibana 5601:5601

Now you can access Kibana at http://localhost:5601 and start exploring your logs.
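
In Kibana, create an index pattern (under Stack Management → Index Patterns in 7.x) matching the indices Fluentd writes; with the configuration above they are prefixed with k8s-logs. You can confirm the indices exist first:

bash
# Daily indices created by Fluentd (logstash_format with prefix "k8s-logs")
kubectl exec -it elasticsearch-0 -n logging -- curl -s 'localhost:9200/_cat/indices/k8s-logs-*?v'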

Best Practices for Kubernetes Logging

  1. Use structured logging: JSON logs are easier to parse and query:
javascript
// Instead of this:
console.log("User logged in: " + username);

// Do this:
console.log(JSON.stringify({
  message: "User login",
  username: username,
  timestamp: new Date().toISOString(),
  level: "info"
}));
  2. Add context to logs: Include request IDs, user IDs, and other contextual information.

  3. Configure log rotation: Prevent logs from consuming all available disk space.

  4. Set appropriate resource limits: Ensure logging components don't starve your applications.

  5. Implement log levels: Differentiate between DEBUG, INFO, WARN, and ERROR logs.

  6. Consider log retention policies: Determine how long to keep logs based on compliance and practical needs (see the retention sketch after this list).

  7. Secure your logs: Logs often contain sensitive information; ensure proper access controls.
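
As an example of a retention policy, Elasticsearch can delete old indices automatically through index lifecycle management (ILM). A minimal sketch that removes indices 30 days after creation; the policy name k8s-logs-retention is just an example, and the policy still has to be attached to your indices via an index template:

bash
# Create an ILM policy that deletes indices 30 days after creation
kubectl exec -it elasticsearch-0 -n logging -- curl -s -X PUT \
  'localhost:9200/_ilm/policy/k8s-logs-retention' \
  -H 'Content-Type: application/json' \
  -d '{"policy":{"phases":{"delete":{"min_age":"30d","actions":{"delete":{}}}}}}'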

Troubleshooting Common Issues

Log Agent Not Collecting Logs

Check if the DaemonSet is running on all nodes:

bash
kubectl get pods -n logging -o wide

Verify the log paths in your configuration.
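
You can also check what the agent actually sees on the node by listing the container log directory from inside one of the collector pods (replace the placeholder with a pod name from the command above):

bash
# List the container log files visible inside a Fluentd pod
kubectl exec -n logging <fluentd-pod-name> -- ls /var/log/containers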

Elasticsearch Performance Issues

Monitor Elasticsearch metrics:

bash
kubectl exec -it elasticsearch-0 -n logging -- curl localhost:9200/_cat/indices

Consider adjusting JVM heap size or adding more nodes.
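
Per-node heap pressure is usually the first thing to check:

bash
# High heap.percent values suggest the JVM heap is too small for the load
kubectl exec -it elasticsearch-0 -n logging -- curl -s 'localhost:9200/_cat/nodes?v&h=name,heap.percent,cpu'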

High Log Volume

Implement log sampling or filtering. For example, a Fluentd grep filter in fluent.conf can drop noisy health-check entries before they reach storage:

conf
<filter kubernetes.**>
  @type grep
  <exclude>
    key log
    pattern /health|readiness|liveness/
  </exclude>
</filter>

Summary

A well-configured Kubernetes logging stack provides essential visibility into your applications and infrastructure. By understanding the components—collection, processing, storage, and visualization—you can build a logging solution that meets your specific needs.

Remember that logging is not a set-it-and-forget-it task. As your applications evolve, your logging needs will change. Regularly review and optimize your logging stack to ensure it continues to support your operations effectively.

Exercises

  1. Set up a minimal EFK stack on a local Kubernetes cluster (minikube or kind).
  2. Configure Fluentd to exclude certain logs (like health checks).
  3. Create a Kibana dashboard that displays application errors.
  4. Implement structured logging in a simple application and deploy it to your cluster.
  5. Compare log storage requirements between Elasticsearch and Loki for the same volume of logs.

