Kong Health Checks

Introduction

Health checks are a critical component of any robust microservice architecture. In Kong API Gateway, health checks provide automated monitoring of your upstream services, allowing Kong to detect when services become unavailable and redirect traffic accordingly. This mechanism helps maintain high availability and improves the overall reliability of your API ecosystem.

Kong's health check functionality periodically sends probe requests to your upstream targets. Based on the responses received, Kong determines whether these targets are healthy and should receive traffic, or unhealthy and should be temporarily removed from the load balancing pool.

Understanding Kong Health Checks

Kong offers two complementary types of health checks:

Active health checks: Kong proactively sends periodic requests to verify service availability
Passive health checks (circuit breakers): Kong monitors live traffic for failures

Let's explore each type in detail.

Active Health Checks

Active health checks involve Kong periodically sending HTTP/HTTPS requests to each target in your upstream service. These dedicated probe requests occur in the background and are separate from regular user traffic.

Key Configuration Parameters

Active health checks can be configured with several parameters:

Parameter	Description
`active.healthy.interval`	Interval between health checks for healthy targets (in seconds)
`active.unhealthy.interval`	Interval between health checks for unhealthy targets (in seconds)
`active.timeout`	Socket timeout for health check requests (in seconds)
`active.concurrency`	Number of concurrent health checks
`active.http_path`	Path to use for HTTP health checks
`active.healthy.threshold`	Number of successes to consider a target healthy
`active.unhealthy.threshold`	Number of failures to consider a target unhealthy

Passive Health Checks

Passive health checks (also known as circuit breakers) monitor the ongoing traffic between Kong and your upstream services. Instead of sending dedicated probe requests, Kong analyzes actual traffic patterns and responses to detect failures.

Key Configuration Parameters

Passive health checks can be customized with these parameters:

Parameter	Description
`passive.healthy.successes`	Number of successes to consider a target healthy
`passive.unhealthy.timeouts`	Number of timeouts to consider a target unhealthy
`passive.unhealthy.http_failures`	Number of HTTP failures to consider a target unhealthy
`passive.unhealthy.tcp_failures`	Number of TCP failures to consider a target unhealthy

Implementing Health Checks in Kong

Let's walk through the process of setting up health checks for your Kong services.

Prerequisites

Before configuring health checks, you need:

Kong Gateway installed and running
An upstream service configured in Kong
Admin access to Kong's configuration

Step 1: Configure an Upstream with Targets

First, we'll create an upstream service with multiple targets:

# Create an upstream
curl -X POST http://localhost:8001/upstreams \
  --data "name=my-api-service"

# Add targets to the upstream
curl -X POST http://localhost:8001/upstreams/my-api-service/targets \
  --data "target=service1:8000" \
  --data "weight=100"

curl -X POST http://localhost:8001/upstreams/my-api-service/targets \
  --data "target=service2:8000" \
  --data "weight=100"

Step 2: Configure Active Health Checks

Now, let's enable active health checks for our upstream:

curl -X PATCH http://localhost:8001/upstreams/my-api-service \
  --data "healthchecks.active.healthy.interval=5" \
  --data "healthchecks.active.healthy.successes=2" \
  --data "healthchecks.active.healthy.http_statuses=200,201,202" \
  --data "healthchecks.active.unhealthy.interval=5" \
  --data "healthchecks.active.unhealthy.http_failures=2" \
  --data "healthchecks.active.unhealthy.http_statuses=429,404,500,501,502,503,504,505" \
  --data "healthchecks.active.unhealthy.timeouts=3" \
  --data "healthchecks.active.http_path=/health"

This configuration:

Checks healthy targets every 5 seconds
Checks unhealthy targets every 5 seconds
Requires 2 successful responses to mark a target as healthy
Requires 2 failed responses to mark a target as unhealthy
Considers HTTP status codes 200, 201, 202 as successful
Considers HTTP status codes 429, 404, 500-505 as failures
Sets 3 seconds as the timeout threshold
Sends health check requests to the /health endpoint

Step 3: Configure Passive Health Checks

Let's also enable passive health checks:

curl -X PATCH http://localhost:8001/upstreams/my-api-service \
  --data "healthchecks.passive.healthy.successes=5" \
  --data "healthchecks.passive.unhealthy.http_failures=5" \
  --data "healthchecks.passive.unhealthy.http_statuses=429,500,503" \
  --data "healthchecks.passive.unhealthy.tcp_failures=2" \
  --data "healthchecks.passive.unhealthy.timeouts=7"

This configuration:

Requires 5 successful responses to mark a previously unhealthy target as healthy again
Requires 5 HTTP failures to mark a target as unhealthy
Considers HTTP status codes 429, 500, 503 as failures
Requires 2 TCP failures to mark a target as unhealthy
Sets 7 seconds as the timeout threshold

Step 4: Configure a Service and Route

Finally, let's connect everything by creating a service and route that uses our upstream:

# Create a service using the upstream
curl -X POST http://localhost:8001/services \
  --data "name=my-api" \
  --data "host=my-api-service"

# Create a route for the service
curl -X POST http://localhost:8001/services/my-api/routes \
  --data "name=my-api-route" \
  --data "paths[]=/api"

Declarative Configuration with Kong YAML

For those using declarative configuration, here's how to define health checks in your Kong YAML file:

_format_version: "2.1"
_transform: true

upstreams:
  - name: my-api-service
    healthchecks:
      active:
        healthy:
          interval: 5
          http_statuses:
            - 200
            - 201
            - 202
          successes: 2
        unhealthy:
          interval: 5
          http_failures: 2
          http_statuses:
            - 429
            - 404
            - 500
            - 501
            - 502
            - 503
            - 504
            - 505
          timeouts: 3
        http_path: /health
        timeout: 1
        concurrency: 10
      passive:
        healthy:
          http_statuses:
            - 200
            - 201
            - 202
          successes: 5
        unhealthy:
          http_failures: 5
          http_statuses:
            - 429
            - 500
            - 503
          tcp_failures: 2
          timeouts: 7
    slots: 100
    targets:
      - target: service1:8000
        weight: 100
      - target: service2:8000
        weight: 100

services:
  - name: my-api
    host: my-api-service
    routes:
      - name: my-api-route
        paths:
          - /api

Creating a Health Check Endpoint

For active health checks to work effectively, your services should expose a dedicated health check endpoint. Here's a simple example using Node.js and Express:

const express = require('express');
const app = express();
const port = 8000;

// Regular API endpoints
app.get('/api/users', (req, res) => {
  res.json({ users: ['Alice', 'Bob', 'Charlie'] });
});

// Health check endpoint
app.get('/health', (req, res) => {
  // Check critical dependencies
  const databaseHealthy = checkDatabaseConnection();
  const cacheHealthy = checkCacheConnection();
  
  if (databaseHealthy && cacheHealthy) {
    // All systems operational
    res.status(200).json({
      status: 'healthy',
      timestamp: new Date().toISOString(),
      services: {
        database: 'connected',
        cache: 'connected'
      }
    });
  } else {
    // Report unhealthy status with details
    res.status(503).json({
      status: 'unhealthy',
      timestamp: new Date().toISOString(),
      services: {
        database: databaseHealthy ? 'connected' : 'disconnected',
        cache: cacheHealthy ? 'connected' : 'disconnected'
      }
    });
  }
});

// Mock dependency check functions
function checkDatabaseConnection() {
  // In a real app, check actual database connection
  return true;
}

function checkCacheConnection() {
  // In a real app, check actual cache connection
  return true;
}

app.listen(port, () => {
  console.log(`Service running on port ${port}`);
});

Advanced Health Check Strategies

Threshold Tuning

Finding the right balance for your health check thresholds is crucial:

Too sensitive: Services might be marked unhealthy due to occasional network glitches
Too lenient: Unhealthy services might continue receiving traffic, causing user-facing errors

Start with moderate thresholds and adjust based on observed behavior in your environment.

HTTP vs. HTTPS Health Checks

Kong supports both HTTP and HTTPS health checks. For production environments, HTTPS health checks provide additional security:

curl -X PATCH http://localhost:8001/upstreams/my-api-service \
  --data "healthchecks.active.type=https" \
  --data "healthchecks.active.https_verify_certificate=true"

Custom Headers

You can add custom headers to health check requests, which is useful for authorization or identifying health check traffic:

curl -X PATCH http://localhost:8001/upstreams/my-api-service \
  --data "healthchecks.active.headers.X-Health-Check=true" \
  --data "healthchecks.active.headers.Authorization=Bearer your-token"

Monitoring Health Check Status

Kong provides several ways to monitor the health status of your targets:

Using Admin API

# Get health status for all targets in an upstream
curl http://localhost:8001/upstreams/my-api-service/health

# Sample response
{
  "total": 2,
  "data": [
    {
      "created_at": 1614695642,
      "health": "HEALTHY",
      "id": "a9f03889-b6cb-4d9d-8f6d-93f59acf5319",
      "target": "service1:8000",
      "upstream": {"id": "7f8598f3-4d95-4f14-9a0f-b841eca8cff1"}
    },
    {
      "created_at": 1614695647,
      "health": "UNHEALTHY",
      "id": "b2d4b79b-cfb7-4ef5-89c1-b3e6d4bcc202",
      "target": "service2:8000",
      "upstream": {"id": "7f8598f3-4d95-4f14-9a0f-b841eca8cff1"}
    }
  ]
}

Using Kong Manager

If you're using Kong Manager (Kong's web UI), you can view the health status of all targets in the "Upstreams" section.

Real-World Example: Multiple Datacenters

Let's implement a more complex example with Kong health checks for services across multiple datacenters:

_format_version: "2.1"
_transform: true

upstreams:
  - name: user-service
    healthchecks:
      active:
        healthy:
          interval: 5
          http_statuses:
            - 200
          successes: 2
        unhealthy:
          interval: 3
          http_failures: 3
          http_statuses:
            - 429
            - 500
            - 503
          timeouts: 2
        http_path: /health/status
        timeout: 1
        concurrency: 5
      passive:
        healthy:
          successes: 3
        unhealthy:
          http_failures: 5
          http_statuses:
            - 429
            - 500
            - 503
          timeouts: 3
    targets:
      # Primary datacenter
      - target: user-service-dc1.example.com:443
        weight: 100
      - target: user-service-dc1-backup.example.com:443
        weight: 50
      # Secondary datacenter
      - target: user-service-dc2.example.com:443
        weight: 75
      - target: user-service-dc2-backup.example.com:443
        weight: 25

services:
  - name: user-api
    host: user-service
    port: 443
    protocol: https
    routes:
      - name: user-endpoints
        paths:
          - /users

This configuration:

Sets up a user service with targets across two datacenters (DC1 and DC2)
Assigns different weights to primary and backup instances
Uses HTTPS for health checks
Implements both active and passive health checks
Routes /users requests to the upstream service

Troubleshooting Common Issues

Target Incorrectly Marked as Unhealthy

If targets are incorrectly marked as unhealthy:

Check if the health endpoint is responding properly:
```
curl -i https://service1:8000/health
```
Verify network connectivity between Kong and your targets
Check timeout settings (target might be responding too slowly)
Review your health check thresholds (they might be too strict)

All Targets Unhealthy

If all targets for an upstream become unhealthy, Kong will:

Return an error for new requests to that upstream
Continue health checks to detect when targets become healthy again

To avoid service disruption, consider:

Using a higher unhealthy threshold to prevent false positives
Implementing backup services
Setting up alerts for when targets become unhealthy

Best Practices

Use both active and passive health checks for comprehensive monitoring
Implement dedicated health endpoints that check critical dependencies
Set appropriate thresholds based on your service reliability and importance
Include multiple targets in each upstream for redundancy
Monitor health status through Kong's Admin API or Kong Manager
Set up alerting when targets become unhealthy
Test failover scenarios to ensure your configuration works as expected

Summary

Kong health checks provide a powerful mechanism for ensuring the reliability and availability of your API services. By implementing both active and passive health checks, you can automatically detect and route around failures, providing a better experience for your API consumers.

Key takeaways:

Active health checks proactively verify service health
Passive health checks monitor real traffic for failures
Properly configured health checks improve service reliability
Well-designed health check endpoints report meaningful status information
Balancing threshold sensitivity is crucial for optimal performance

Additional Resources

Exercises

Configure active and passive health checks for an upstream service using the Kong Admin API.
Create a health check endpoint in your favorite programming language that checks database connectivity.
Test a failover scenario by shutting down one of your service instances and observing Kong's behavior.
Implement a custom health check strategy that considers application-specific metrics.
Set up monitoring and alerting for health check status using Kong's Admin API.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Understanding Kong Health Checks​

Active Health Checks​

Key Configuration Parameters​

Passive Health Checks​

Key Configuration Parameters​

Implementing Health Checks in Kong​

Prerequisites​

Step 1: Configure an Upstream with Targets​

Step 2: Configure Active Health Checks​

Step 3: Configure Passive Health Checks​

Step 4: Configure a Service and Route​

Declarative Configuration with Kong YAML​

Creating a Health Check Endpoint​

Advanced Health Check Strategies​

Threshold Tuning​

HTTP vs. HTTPS Health Checks​

Custom Headers​

Monitoring Health Check Status​

Using Admin API​

Using Kong Manager​

Real-World Example: Multiple Datacenters​

Troubleshooting Common Issues​

Target Incorrectly Marked as Unhealthy​

All Targets Unhealthy​

Best Practices​

Summary​

Additional Resources​

Exercises​

Introduction

Understanding Kong Health Checks

Active Health Checks

Key Configuration Parameters

Passive Health Checks

Key Configuration Parameters

Implementing Health Checks in Kong

Prerequisites

Step 1: Configure an Upstream with Targets

Step 2: Configure Active Health Checks

Step 3: Configure Passive Health Checks

Step 4: Configure a Service and Route

Declarative Configuration with Kong YAML

Creating a Health Check Endpoint

Advanced Health Check Strategies

Threshold Tuning

HTTP vs. HTTPS Health Checks

Custom Headers

Monitoring Health Check Status

Using Admin API

Using Kong Manager

Real-World Example: Multiple Datacenters

Troubleshooting Common Issues

Target Incorrectly Marked as Unhealthy

All Targets Unhealthy

Best Practices

Summary

Additional Resources

Exercises