Kubernetes Probes
Introduction
Imagine you're responsible for a fleet of delivery trucks. How do you know if each truck is functioning properly? You might check that a new truck is road-ready before its first route (startup), that it's ready to take on packages (readiness), and that the engine keeps running throughout the trip (liveness). Kubernetes probes work in a similar way for your containers.
Kubernetes probes are health-checking mechanisms that allow Kubernetes to monitor the health and availability of your containerized applications. They help ensure that your applications are running correctly and can handle traffic, making your system more reliable and resilient to failures.
Why Probes Matter
Without probes, Kubernetes has limited insight into the actual health of your applications. A container might be running but completely unable to serve requests due to:
- An application deadlock
- Running out of resources
- Configuration errors
- Database connection failures
- And more...
Probes give Kubernetes the ability to detect these issues and take corrective action automatically.
Types of Kubernetes Probes
Kubernetes offers three types of probes:
- Liveness Probe: Determines if a container is running properly
- Readiness Probe: Determines if a container is ready to receive traffic
- Startup Probe: Determines if an application within a container has started successfully
Let's explore each one in detail.
Liveness Probes
A liveness probe answers a simple question: "Is this container alive and healthy?"
If a liveness probe fails, Kubernetes will restart the container, assuming it has entered a broken state that can only be fixed by restarting.
When to Use Liveness Probes
- For applications that might crash but won't exit
- When recovering from a broken state requires a restart
- To handle applications with memory leaks or deadlocks
Example Liveness Probe Configuration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-application
spec:
  containers:
  - name: app-container
    image: my-app:1.0
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      timeoutSeconds: 5
      failureThreshold: 3
```
In this example:
- The probe makes an HTTP GET request to /healthz on port 8080
- It waits 15 seconds before the first check
- It checks every 10 seconds thereafter
- Each check must complete within 5 seconds
- It allows 3 consecutive failures before considering the container unhealthy and restarting it
Readiness Probes
A readiness probe answers the question: "Is this container ready to receive traffic?"
If a readiness probe fails, the Pod is removed from the endpoints of the Services that route traffic to it, so it stops receiving new requests, but the container is not restarted.
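The "load balancer" here is typically a Kubernetes Service: only Pods whose containers pass their readiness probes appear in the Service's endpoints. As a minimal sketch, assuming the application's Pods carry a hypothetical app: my-application label and listen on port 8080 (the Pod examples in this article don't declare labels, so this selector is an assumption):

```yaml
# Hypothetical Service; only Ready Pods (readiness probe passing) receive traffic through it.
apiVersion: v1
kind: Service
metadata:
  name: my-application
spec:
  selector:
    app: my-application   # assumed label, not declared in the Pod examples above
  ports:
  - port: 80
    targetPort: 8080
```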
When to Use Readiness Probes
- When your application takes time to load large data at startup
- When dependent services need to be available first
- When you need to implement graceful shutdown
- To prevent traffic during maintenance or config reloads
Example Readiness Probe Configuration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-application
spec:
  containers:
  - name: app-container
    image: my-app:1.0
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
```
In this example:
- The probe checks if TCP port 8080 is accepting connections
- It waits 5 seconds before the first check
- It checks every 10 seconds thereafter
Startup Probes
A startup probe answers the question: "Has the application inside the container finished starting up?"
This probe is particularly useful for applications with slow startup times or for legacy applications. Until the startup probe succeeds, liveness and readiness probes are disabled; if it never succeeds within its failure threshold, the container is restarted.
When to Use Startup Probes
- For applications with unpredictable or long startup times
- For legacy applications that require additional startup time
- To provide a longer time for initialization without affecting runtime liveness checks
Example Startup Probe Configuration
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-application
spec:
  containers:
  - name: app-container
    image: my-app:1.0
    startupProbe:
      exec:
        command:
        - cat
        - /tmp/startup-complete
      failureThreshold: 30
      periodSeconds: 10
```
In this example:
- The probe checks if the file /tmp/startup-complete exists
- It will retry up to 30 times with 10 seconds between each try
- This gives the application up to 5 minutes (30 × 10 = 300 seconds) to complete its startup
Probe Mechanisms
Kubernetes supports several ways to implement probes:
- HTTP GET: Performs an HTTP GET request to a specified path and port. Success is indicated by a response code between 200 and 399.
- TCP Socket: Attempts to establish a TCP connection to a specified port. Success is indicated if the connection can be established.
- Exec: Executes a command inside the container. Success is indicated by an exit code of 0.
HTTP GET Example
```yaml
livenessProbe:
  httpGet:
    path: /api/health
    port: 8080
    httpHeaders:
    - name: Custom-Header
      value: check
```
TCP Socket Example
```yaml
readinessProbe:
  tcpSocket:
    port: 3306
```
Exec Command Example
```yaml
livenessProbe:
  exec:
    command:
    - sh
    - -c
    - "ps aux | grep my-process | grep -v grep"
```
Probe Configuration Parameters
You can fine-tune how probes work using these common parameters:
| Parameter | Description | Default |
|---|---|---|
| initialDelaySeconds | Time to wait before the first probe after the container starts | 0 |
| periodSeconds | How often to perform the probe | 10 |
| timeoutSeconds | Time after which the probe times out | 1 |
| successThreshold | Minimum consecutive successes to be considered successful | 1 |
| failureThreshold | Number of consecutive failures before giving up | 3 |
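To see how these parameters fit together, here is an illustrative liveness probe. The endpoint and values are assumptions for the sketch, not taken from a specific application:

```yaml
livenessProbe:
  httpGet:
    path: /healthz         # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 10  # wait 10 seconds after container start before the first probe
  periodSeconds: 5         # probe every 5 seconds
  timeoutSeconds: 2        # each probe must answer within 2 seconds
  successThreshold: 1      # one success marks the probe healthy again (must be 1 for liveness)
  failureThreshold: 3      # three consecutive failures trigger a restart
```

Note that Kubernetes requires successThreshold to be 1 for liveness and startup probes; values above 1 are only meaningful for readiness probes.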
Visualizing Probe Workflow
At a high level, the flow looks like this: when a container starts, the startup probe (if defined) runs first and gates the other probes. Once it succeeds, the liveness and readiness probes run for the rest of the container's life. A failing readiness probe removes the Pod from Service endpoints until it recovers; a failing liveness probe (or a startup probe that never succeeds) causes the kubelet to restart the container.
Best Practices for Kubernetes Probes
- Use Specific Health Endpoints: Create dedicated health check endpoints that verify critical components.

```javascript
// Example Express.js health endpoint
app.get('/healthz', (req, res) => {
  // Check database connection
  if (dbConnected) {
    res.status(200).send('OK');
  } else {
    res.status(500).send('Database connection failed');
  }
});
```

- Set Appropriate Thresholds: Configure timeouts and thresholds based on your application's behavior (see the sketch after this list).
- Include Dependencies in Health Checks: Check critical dependencies like databases or caches, typically in readiness rather than liveness checks so a dependency outage doesn't trigger restarts.
- Keep Probes Lightweight: Health checks should be fast and use minimal resources.
- Use All Three Probe Types: Each serves a different purpose in the application lifecycle.
- Set Proper Initial Delays: Allow enough time for your application to initialize.
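To make "appropriate thresholds" concrete, it helps to work out how long a failure can go undetected: roughly periodSeconds × failureThreshold. Below is a sketch with illustrative values for an endpoint that can be slow under load; the numbers are assumptions, not recommendations for every workload:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15     # check every 15 seconds
  timeoutSeconds: 5     # tolerate responses that take up to 5 seconds
  failureThreshold: 4   # 4 consecutive failures ≈ 60 seconds of sustained failure before a restart
```

A shorter window reacts faster but risks restarting a container that is merely busy; a longer window is more forgiving but leaves a genuinely broken container running longer.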
Real-World Example: Web Application with Database
Let's look at a complete example of a web application that depends on a database:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: web-app:1.0
        ports:
        - containerPort: 8080
        startupProbe:
          httpGet:
            path: /startup
            port: 8080
          failureThreshold: 30
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 15
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```
In this example:
- The startup probe allows the application up to 5 minutes to initialize.
- The liveness probe checks if the application is still responsive.
- The readiness probe verifies that the application is ready to serve requests.
Your application would implement these endpoints to perform appropriate checks:
- /startup: Basic check that the application has loaded
- /healthz: Deep health check including memory usage and critical functions
- /ready: Checks database connectivity and cache availability
Implementing Health Check Endpoints
Here's a simple example of how you might implement these endpoints in a Node.js application:
```javascript
const express = require('express');
const app = express();
const db = require('./database');

// Startup probe - simple check that the server is running
app.get('/startup', (req, res) => {
  res.status(200).send('Application started');
});

// Liveness probe - check if the application is working correctly
app.get('/healthz', (req, res) => {
  // Check if the application is deadlocked or has errors
  const memoryUsage = process.memoryUsage().heapUsed / 1024 / 1024;
  if (memoryUsage > 1500) {
    // Memory usage too high - possible leak
    res.status(500).send('Memory usage too high');
  } else {
    res.status(200).send('OK');
  }
});

// Readiness probe - check if the application can serve requests
app.get('/ready', async (req, res) => {
  try {
    // Check database connection
    await db.ping();

    // Check other dependencies (checkCache() is assumed to be defined elsewhere)
    const cacheAvailable = await checkCache();
    if (cacheAvailable) {
      res.status(200).send('Ready to serve traffic');
    } else {
      res.status(503).send('Cache not available');
    }
  } catch (error) {
    res.status(503).send('Database connection failed');
  }
});

app.listen(8080);
```
Troubleshooting Probes
If your probes aren't working as expected, check these common issues:
- Incorrect Path or Port: Verify the path and port match your application's configuration.
- Timeout Too Short: Some operations might take longer than expected.
- Dependency Failures: If your probe depends on external services, it might fail when they're unavailable.
- Resource Constraints: Under high load, probes might time out if the container doesn't have enough resources.
You can debug probe issues by checking the pod events:
```bash
kubectl describe pod <pod-name>
```
Look for events like:
```
Warning  Unhealthy  10s  kubelet  Readiness probe failed: Get "http://10.1.2.3:8080/ready": context deadline exceeded
```
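If the failure is a timeout like the event above, a common first step is to give the probe more room by raising timeoutSeconds and failureThreshold. The values below are illustrative, not a recommendation for every workload:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  timeoutSeconds: 5     # up from the default of 1 second
  failureThreshold: 5   # tolerate a few slow responses before marking the Pod unready
```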
Summary
Kubernetes probes are essential tools for building reliable, self-healing applications in Kubernetes:
- Liveness probes detect and fix broken containers by restarting them
- Readiness probes ensure traffic only goes to containers that can handle it
- Startup probes give applications the time they need to initialize
By implementing appropriate probes with sensible configuration, you can dramatically improve the reliability and user experience of your Kubernetes applications.
Exercises
- Create a simple web application with a /health endpoint that checks memory usage and returns appropriate status codes.
- Deploy an application to Kubernetes with all three types of probes configured.
- Simulate a failure in your application and observe how Kubernetes responds based on your probe configuration.
- Create a readiness probe that checks for the availability of an external service before allowing traffic.
- Experiment with different probe parameters to find the optimal configuration for a slow-starting application.