Kubernetes Database

Introduction

When building applications on Kubernetes, data persistence is often one of the most challenging aspects to manage. While Kubernetes excels at orchestrating stateless applications, databases and other stateful services require special handling. This guide explores how to effectively deploy and manage databases in Kubernetes environments.

Kubernetes provides several resources that make database deployments possible:

StatefulSets: For maintaining stable, unique network identifiers and persistent storage
PersistentVolumes: For data persistence beyond the lifecycle of a pod
ConfigMaps and Secrets: For managing database configuration and credentials
Services: For stable networking and service discovery

By the end of this guide, you'll understand how to deploy databases on Kubernetes, configure persistent storage, manage database credentials securely, and implement best practices for production-grade database deployments.

Prerequisites

Before proceeding, you should have:

A basic understanding of Kubernetes concepts (Pods, Deployments, Services)
Access to a Kubernetes cluster (local like Minikube or cloud-based)
kubectl command-line tool installed and configured
Basic familiarity with database concepts

Why Run Databases on Kubernetes?

You might wonder why you should run databases on Kubernetes when many cloud providers offer managed database services. Here are some key benefits:

Unified Infrastructure: Keep your entire application stack, including databases, in a single platform
Consistent Deployment Patterns: Use the same CI/CD pipelines and tooling for all components
Portability: Move between cloud providers or on-premise infrastructures more easily
Cost Optimization: Potentially reduce costs by leveraging existing Kubernetes infrastructure
Advanced Orchestration: Utilize Kubernetes' self-healing, scaling, and monitoring capabilities

However, it's important to understand the trade-offs. Managing stateful workloads on Kubernetes requires more planning and expertise than stateless applications.

Understanding StatefulSets

Unlike Deployments, which are designed for stateless applications, StatefulSets are specifically designed for applications that require:

Stable, unique network identifiers
Stable, persistent storage
Ordered, graceful deployment and scaling

These characteristics make StatefulSets ideal for database workloads. Let's see how a basic StatefulSet definition looks:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        ports:
        - containerPort: 5432
          name: postgres
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

Notice the volumeClaimTemplates section, which automatically creates PersistentVolumeClaims for each Pod in the StatefulSet.

Persistent Storage for Databases

For databases to function properly in Kubernetes, you need to set up persistent storage correctly. Let's explore the key components:

PersistentVolumes (PV)

A PersistentVolume is a piece of storage in the cluster provisioned by an administrator or dynamically provisioned using Storage Classes.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: standard
  hostPath:
    path: "/mnt/data"

PersistentVolumeClaims (PVC)

A PersistentVolumeClaim is a request for storage by a user that can be fulfilled by a PersistentVolume.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard

StorageClasses

StorageClasses allow dynamic provisioning of PersistentVolumes when PVCs are created.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true

Deploying a Single Instance Database

Let's walk through deploying a PostgreSQL database on Kubernetes, step by step:

Step 1: Create a Secret for Database Credentials

First, create a Secret to store database credentials:

apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
type: Opaque
data:
  password: cG9zdGdyZXNwYXNzd29yZA== # base64 encoded 'postgrespassword'
  username: cG9zdGdyZXM= # base64 encoded 'postgres'

You can create this Secret using:

kubectl apply -f postgres-secret.yaml

Step 2: Create a ConfigMap for Database Configuration

For database configuration parameters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  POSTGRES_DB: "myapp"
  POSTGRES_INITDB_ARGS: "--data-checksums"

Apply the ConfigMap:

kubectl apply -f postgres-config.yaml

Step 3: Deploy PostgreSQL using a StatefulSet

Now create the StatefulSet:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: "postgres"
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:13
        ports:
        - containerPort: 5432
          name: postgres
        envFrom:
        - configMapRef:
            name: postgres-config
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
          subPath: postgres
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "1Gi"
            cpu: "500m"
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 10Gi

Apply the StatefulSet:

kubectl apply -f postgres-statefulset.yaml

Step 4: Create a Service for Database Access

Finally, create a Service to make the database accessible:

apiVersion: v1
kind: Service
metadata:
  name: postgres
  labels:
    app: postgres
spec:
  ports:
  - port: 5432
    name: postgres
  clusterIP: None
  selector:
    app: postgres

Apply the Service:

kubectl apply -f postgres-service.yaml

This creates a headless service (with clusterIP: None) that allows direct addressing of the StatefulSet pods.

Testing Your Database Deployment

To verify your database deployment is working correctly:

# Check the status of the StatefulSet
kubectl get statefulset postgres

# Check the running pod
kubectl get pods -l app=postgres

# Check persistent volume claims
kubectl get pvc

# Connect to the PostgreSQL pod
kubectl exec -it postgres-0 -- psql -U postgres

Inside the PostgreSQL shell, you can create a table and insert data:

CREATE TABLE test (id serial PRIMARY KEY, name VARCHAR(50));
INSERT INTO test (name) VALUES ('Kubernetes Database Test');
SELECT * FROM test;

This confirms your database is working and data is being persisted.

Managing Database Backups

Regular backups are critical for database workloads. Here's a simple approach to create a backup job:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 1 * * *"  # Run at 1 AM every day
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: postgres-backup
            image: postgres:13
            command:
            - /bin/bash
            - -c
            - |
              pg_dump -h postgres -U postgres -d myapp > /backup/myapp-$(date +%Y-%m-%d).sql
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-secret
                  key: password
            volumeMounts:
            - name: backup-volume
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup-volume
            persistentVolumeClaim:
              claimName: postgres-backup-pvc

You'll need to create a separate PVC for the backups:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-backup-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi

Database High Availability with StatefulSets

For production environments, you might want to deploy a database cluster for high availability. Let's look at how to set up a PostgreSQL cluster with a primary and replica:

This requires more advanced configurations and often custom operators. For PostgreSQL, you might consider using operators like:

PostgreSQL Operator by Zalando
Crunchy Data PostgreSQL Operator
KubeDB

These operators handle the complexity of setting up replication, failover, and backups.

Common Database Options for Kubernetes

While we've focused on PostgreSQL, there are several databases well-suited for Kubernetes:

PostgreSQL: Robust, feature-rich relational database
MySQL/MariaDB: Popular open-source relational databases
MongoDB: Document-oriented NoSQL database
Redis: In-memory data structure store
Elasticsearch: Distributed search and analytics engine
Cassandra: Wide-column NoSQL database designed for high availability

Each has its own deployment patterns and considerations in Kubernetes.

Best Practices for Kubernetes Databases

To ensure reliable database operations on Kubernetes:

Use a Dedicated Node Pool: Consider running databases on dedicated nodes with SSD storage
Set Resource Limits: Configure appropriate CPU and memory limits for your database pods
Use Pod Disruption Budgets: Prevent voluntary disruptions with PDBs
Implement Robust Monitoring: Set up monitoring for database metrics
Set Up Regular Backups: Automated, tested backup strategies are essential
Consider Using Database Operators: They handle many complexities of database management
Use Local Storage for Performance: For I/O intensive workloads, local SSDs can provide better performance
Implement Proper Security: Secure your database with appropriate network policies

Example: A Complete Database Deployment

Let's put everything together in a real-world example of a production-ready PostgreSQL deployment:

# Create namespace
apiVersion: v1
kind: Namespace
metadata:
  name: database

---
# Create StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true

---
# Create Secret
apiVersion: v1
kind: Secret
metadata:
  name: postgres-secret
  namespace: database
type: Opaque
data:
  password: UGFzc3dvcmQxMjM0NQ==  # base64 encoded 'Password12345'
  username: cG9zdGdyZXM=  # base64 encoded 'postgres'

---
# Create ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
  namespace: database
data:
  POSTGRES_DB: "applicationdb"
  postgresql.conf: |
    max_connections = 100
    shared_buffers = 256MB
    effective_cache_size = 768MB
    maintenance_work_mem = 64MB
    checkpoint_completion_target = 0.9
    wal_buffers = 7864kB
    default_statistics_target = 100
    random_page_cost = 1.1
    effective_io_concurrency = 200
    work_mem = 2621kB
    min_wal_size = 1GB
    max_wal_size = 4GB

---
# Create Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: postgres-pdb
  namespace: database
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: postgres

---
# Create StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
  namespace: database
spec:
  serviceName: "postgres"
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      terminationGracePeriodSeconds: 30
      initContainers:
      - name: take-data-dir-ownership
        image: busybox
        command: ["sh", "-c", "chown -R 999:999 /var/lib/postgresql/data"]
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
      containers:
      - name: postgres
        image: postgres:13
        ports:
        - containerPort: 5432
          name: postgres
        envFrom:
        - configMapRef:
            name: postgres-config
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: postgres-data
          mountPath: /var/lib/postgresql/data
          subPath: pgdata
        - name: postgres-config-volume
          mountPath: /etc/postgresql/postgresql.conf
          subPath: postgresql.conf
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - exec pg_isready -U postgres -h localhost
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - exec pg_isready -U postgres -h localhost
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
      volumes:
      - name: postgres-config-volume
        configMap:
          name: postgres-config
  volumeClaimTemplates:
  - metadata:
      name: postgres-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast-storage
      resources:
        requests:
          storage: 20Gi

---
# Create Services
apiVersion: v1
kind: Service
metadata:
  name: postgres
  namespace: database
  labels:
    app: postgres
spec:
  ports:
  - port: 5432
    name: postgres
  clusterIP: None
  selector:
    app: postgres

---
apiVersion: v1
kind: Service
metadata:
  name: postgres-read
  namespace: database
  labels:
    app: postgres
spec:
  ports:
  - port: 5432
    name: postgres
  selector:
    app: postgres

This example includes:

Namespace isolation
SSD-based StorageClass
Secrets and ConfigMaps for configuration
Pod Disruption Budget for availability
Probes for health checking
Resource limits
Multiple services for different access patterns

Database Operators in Kubernetes

For more advanced database management, consider using a specialized database operator. Operators extend Kubernetes capabilities by automating tasks like:

Cluster setup and configuration
Automatic failover
Backups and restoration
Scaling operations
Version upgrades

For example, the Zalando Postgres Operator simplifies PostgreSQL cluster management:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-postgresql-cluster
spec:
  teamId: "database-team"
  volume:
    size: 10Gi
  numberOfInstances: 3
  users:
    myapp:  # database owner role
    - superuser
    - createdb
    myreader: []  # regular user role
  databases:
    myappdb: myapp  # dbname: owner
  postgresql:
    version: "13"

This short definition creates a fully functional PostgreSQL cluster with replication.

Connecting Applications to Kubernetes Databases

To connect your applications to databases running in Kubernetes:

In-Cluster Access

For applications running in the same Kubernetes cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        env:
        - name: DB_HOST
          value: "postgres.database.svc.cluster.local"
        - name: DB_PORT
          value: "5432"
        - name: DB_NAME
          value: "applicationdb"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: postgres-secret
              key: password

External Access

For applications outside the cluster, you'll need to expose the database service:

apiVersion: v1
kind: Service
metadata:
  name: postgres-external
  namespace: database
spec:
  type: LoadBalancer
  ports:
  - port: 5432
    targetPort: 5432
    protocol: TCP
  selector:
    app: postgres

For better security, consider using a VPN or direct connect rather than exposing your database directly to the internet.

Monitoring Kubernetes Databases

Monitoring is crucial for database workloads. Set up monitoring with tools like Prometheus and Grafana:

Install the Prometheus Operator:

kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring

Deploy PostgreSQL Prometheus exporter:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres-exporter
  namespace: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres-exporter
  template:
    metadata:
      labels:
        app: postgres-exporter
    spec:
      containers:
      - name: postgres-exporter
        image: wrouesnel/postgres_exporter:latest
        env:
        - name: DATA_SOURCE_NAME
          value: "postgresql://postgres:Password12345@postgres:5432/postgres?sslmode=disable"
        ports:
        - containerPort: 9187
          name: metrics

Create a Service for the exporter:

apiVersion: v1
kind: Service
metadata:
  name: postgres-exporter
  namespace: database
  labels:
    app: postgres-exporter
spec:
  ports:
  - port: 9187
    name: metrics
  selector:
    app: postgres-exporter

Configure Prometheus to scrape metrics:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: postgres-monitor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: postgres-exporter
  endpoints:
  - port: metrics
  namespaceSelector:
    matchNames:
    - database

Troubleshooting Kubernetes Databases

When your database in Kubernetes isn't working as expected:

Check Pod Status

kubectl get pods -n database
kubectl describe pod postgres-0 -n database

Check Logs

kubectl logs postgres-0 -n database

Check PersistentVolumes

kubectl get pv
kubectl get pvc -n database

Check Database Status

kubectl exec -it postgres-0 -n database -- psql -U postgres -c "SELECT version();"

Common Issues:

Cannot create PV/PVC: Check your StorageClass and ensure your cluster supports the requested storage type
Pod in CrashLoopBackOff: Check logs; often caused by incorrect credentials or permissions
Database connection failures: Verify service names and network policies
Performance issues: Check resource allocation and tune database parameters

Summary

Running databases on Kubernetes offers a unified approach to managing both stateless and stateful workloads. We've covered:

Setting up persistent storage for databases
Deploying single-instance databases using StatefulSets
Configuring high availability with multiple replicas
Implementing backups and security
Using database operators for complex deployments
Monitoring and troubleshooting techniques

While Kubernetes adds complexity to database management, it provides powerful tools for automation, scalability, and portability. As you become more comfortable with these concepts, you'll be able to build robust, cloud-native applications with properly managed databases.

Additional Resources

Official Documentation:
- Kubernetes StatefulSets
- Kubernetes Persistent Volumes
Database Operators:
Tutorials:
- Kubernetes Academy: StatefulSets
- Kubernetes Patterns for Database Management

Exercises

Deploy a single-instance PostgreSQL database using the YAML examples provided in this guide.
Create a simple application that connects to your database and performs CRUD operations.
Set up a regular backup schedule using CronJobs.
Install a database monitoring solution with Prometheus and Grafana.
Experiment with scaling your database by changing the replicas in the StatefulSet (and observe the challenges).
Try deploying a different type of database (e.g., MongoDB or MySQL) and note the differences in configuration.
Implement a high-availability setup using a database operator.

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

Prerequisites​

Why Run Databases on Kubernetes?​

Understanding StatefulSets​

Persistent Storage for Databases​

PersistentVolumes (PV)​

PersistentVolumeClaims (PVC)​

StorageClasses​

Deploying a Single Instance Database​

Step 1: Create a Secret for Database Credentials​

Step 2: Create a ConfigMap for Database Configuration​

Step 3: Deploy PostgreSQL using a StatefulSet​

Step 4: Create a Service for Database Access​

Testing Your Database Deployment​

Managing Database Backups​

Database High Availability with StatefulSets​

Common Database Options for Kubernetes​

Best Practices for Kubernetes Databases​

Example: A Complete Database Deployment​

Database Operators in Kubernetes​

Connecting Applications to Kubernetes Databases​

In-Cluster Access​

External Access​

Monitoring Kubernetes Databases​

Troubleshooting Kubernetes Databases​

Check Pod Status​

Check Logs​

Check PersistentVolumes​

Check Database Status​

Common Issues:​

Summary​

Additional Resources​

Exercises​