Skip to main content

Kubernetes Database

Introduction

When building applications on Kubernetes, data persistence is often one of the most challenging aspects to manage. While Kubernetes excels at orchestrating stateless applications, databases and other stateful services require special handling. This guide explores how to effectively deploy and manage databases in Kubernetes environments.

Kubernetes provides several resources that make database deployments possible:

  • StatefulSets: For maintaining stable, unique network identifiers and persistent storage
  • PersistentVolumes: For data persistence beyond the lifecycle of a pod
  • ConfigMaps and Secrets: For managing database configuration and credentials
  • Services: For stable networking and service discovery

By the end of this guide, you'll understand how to deploy databases on Kubernetes, configure persistent storage, manage database credentials securely, and implement best practices for production-grade database deployments.

Prerequisites

Before proceeding, you should have:

  • A basic understanding of Kubernetes concepts (Pods, Deployments, Services)
  • Access to a Kubernetes cluster (local like Minikube or cloud-based)
  • kubectl command-line tool installed and configured
  • Basic familiarity with database concepts

Why Run Databases on Kubernetes?

You might wonder why you should run databases on Kubernetes when many cloud providers offer managed database services. Here are some key benefits:

  1. Unified Infrastructure: Keep your entire application stack, including databases, in a single platform
  2. Consistent Deployment Patterns: Use the same CI/CD pipelines and tooling for all components
  3. Portability: Move between cloud providers or on-premise infrastructures more easily
  4. Cost Optimization: Potentially reduce costs by leveraging existing Kubernetes infrastructure
  5. Advanced Orchestration: Utilize Kubernetes' self-healing, scaling, and monitoring capabilities

However, it's important to understand the trade-offs. Managing stateful workloads on Kubernetes requires more planning and expertise than stateless applications.

Understanding StatefulSets

Unlike Deployments, which are designed for stateless applications, StatefulSets are specifically designed for applications that require:

  • Stable, unique network identifiers
  • Stable, persistent storage
  • Ordered, graceful deployment and scaling

These characteristics make StatefulSets ideal for database workloads. Let's see how a basic StatefulSet definition looks:

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres"
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi

Notice the volumeClaimTemplates section, which automatically creates PersistentVolumeClaims for each Pod in the StatefulSet.

Persistent Storage for Databases

For databases to function properly in Kubernetes, you need to set up persistent storage correctly. Let's explore the key components:

PersistentVolumes (PV)

A PersistentVolume is a piece of storage in the cluster provisioned by an administrator or dynamically provisioned using Storage Classes.

yaml
apiVersion: v1
kind: PersistentVolume
metadata:
name: postgres-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: standard
hostPath:
path: "/mnt/data"

PersistentVolumeClaims (PVC)

A PersistentVolumeClaim is a request for storage by a user that can be fulfilled by a PersistentVolume.

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: standard

StorageClasses

StorageClasses allow dynamic provisioning of PersistentVolumes when PVCs are created.

yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true

Deploying a Single Instance Database

Let's walk through deploying a PostgreSQL database on Kubernetes, step by step:

Step 1: Create a Secret for Database Credentials

First, create a Secret to store database credentials:

yaml
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
type: Opaque
data:
password: cG9zdGdyZXNwYXNzd29yZA== # base64 encoded 'postgrespassword'
username: cG9zdGdyZXM= # base64 encoded 'postgres'

You can create this Secret using:

bash
kubectl apply -f postgres-secret.yaml

Step 2: Create a ConfigMap for Database Configuration

For database configuration parameters:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
data:
POSTGRES_DB: "myapp"
POSTGRES_INITDB_ARGS: "--data-checksums"

Apply the ConfigMap:

bash
kubectl apply -f postgres-config.yaml

Step 3: Deploy PostgreSQL using a StatefulSet

Now create the StatefulSet:

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres"
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
envFrom:
- configMapRef:
name: postgres-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
subPath: postgres
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: standard
resources:
requests:
storage: 10Gi

Apply the StatefulSet:

bash
kubectl apply -f postgres-statefulset.yaml

Step 4: Create a Service for Database Access

Finally, create a Service to make the database accessible:

yaml
apiVersion: v1
kind: Service
metadata:
name: postgres
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
clusterIP: None
selector:
app: postgres

Apply the Service:

bash
kubectl apply -f postgres-service.yaml

This creates a headless service (with clusterIP: None) that allows direct addressing of the StatefulSet pods.

Testing Your Database Deployment

To verify your database deployment is working correctly:

bash
# Check the status of the StatefulSet
kubectl get statefulset postgres

# Check the running pod
kubectl get pods -l app=postgres

# Check persistent volume claims
kubectl get pvc

# Connect to the PostgreSQL pod
kubectl exec -it postgres-0 -- psql -U postgres

Inside the PostgreSQL shell, you can create a table and insert data:

sql
CREATE TABLE test (id serial PRIMARY KEY, name VARCHAR(50));
INSERT INTO test (name) VALUES ('Kubernetes Database Test');
SELECT * FROM test;

This confirms your database is working and data is being persisted.

Managing Database Backups

Regular backups are critical for database workloads. Here's a simple approach to create a backup job:

yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgres-backup
spec:
schedule: "0 1 * * *" # Run at 1 AM every day
jobTemplate:
spec:
template:
spec:
containers:
- name: postgres-backup
image: postgres:13
command:
- /bin/bash
- -c
- |
pg_dump -h postgres -U postgres -d myapp > /backup/myapp-$(date +%Y-%m-%d).sql
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
volumeMounts:
- name: backup-volume
mountPath: /backup
restartPolicy: OnFailure
volumes:
- name: backup-volume
persistentVolumeClaim:
claimName: postgres-backup-pvc

You'll need to create a separate PVC for the backups:

yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-backup-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi

Database High Availability with StatefulSets

For production environments, you might want to deploy a database cluster for high availability. Let's look at how to set up a PostgreSQL cluster with a primary and replica:

This requires more advanced configurations and often custom operators. For PostgreSQL, you might consider using operators like:

  • PostgreSQL Operator by Zalando
  • Crunchy Data PostgreSQL Operator
  • KubeDB

These operators handle the complexity of setting up replication, failover, and backups.

Common Database Options for Kubernetes

While we've focused on PostgreSQL, there are several databases well-suited for Kubernetes:

  1. PostgreSQL: Robust, feature-rich relational database
  2. MySQL/MariaDB: Popular open-source relational databases
  3. MongoDB: Document-oriented NoSQL database
  4. Redis: In-memory data structure store
  5. Elasticsearch: Distributed search and analytics engine
  6. Cassandra: Wide-column NoSQL database designed for high availability

Each has its own deployment patterns and considerations in Kubernetes.

Best Practices for Kubernetes Databases

To ensure reliable database operations on Kubernetes:

  1. Use a Dedicated Node Pool: Consider running databases on dedicated nodes with SSD storage
  2. Set Resource Limits: Configure appropriate CPU and memory limits for your database pods
  3. Use Pod Disruption Budgets: Prevent voluntary disruptions with PDBs
  4. Implement Robust Monitoring: Set up monitoring for database metrics
  5. Set Up Regular Backups: Automated, tested backup strategies are essential
  6. Consider Using Database Operators: They handle many complexities of database management
  7. Use Local Storage for Performance: For I/O intensive workloads, local SSDs can provide better performance
  8. Implement Proper Security: Secure your database with appropriate network policies

Example: A Complete Database Deployment

Let's put everything together in a real-world example of a production-ready PostgreSQL deployment:

yaml
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
name: database

---
# Create StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-storage
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true

---
# Create Secret
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
namespace: database
type: Opaque
data:
password: UGFzc3dvcmQxMjM0NQ== # base64 encoded 'Password12345'
username: cG9zdGdyZXM= # base64 encoded 'postgres'

---
# Create ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
namespace: database
data:
POSTGRES_DB: "applicationdb"
postgresql.conf: |
max_connections = 100
shared_buffers = 256MB
effective_cache_size = 768MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 7864kB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 2621kB
min_wal_size = 1GB
max_wal_size = 4GB

---
# Create Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: postgres-pdb
namespace: database
spec:
maxUnavailable: 1
selector:
matchLabels:
app: postgres

---
# Create StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: database
spec:
serviceName: "postgres"
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
terminationGracePeriodSeconds: 30
initContainers:
- name: take-data-dir-ownership
image: busybox
command: ["sh", "-c", "chown -R 999:999 /var/lib/postgresql/data"]
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
envFrom:
- configMapRef:
name: postgres-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
subPath: pgdata
- name: postgres-config-volume
mountPath: /etc/postgresql/postgresql.conf
subPath: postgresql.conf
livenessProbe:
exec:
command:
- /bin/sh
- -c
- exec pg_isready -U postgres -h localhost
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- exec pg_isready -U postgres -h localhost
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
volumes:
- name: postgres-config-volume
configMap:
name: postgres-config
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: fast-storage
resources:
requests:
storage: 20Gi

---
# Create Services
apiVersion: v1
kind: Service
metadata:
name: postgres
namespace: database
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
clusterIP: None
selector:
app: postgres

---
apiVersion: v1
kind: Service
metadata:
name: postgres-read
namespace: database
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
selector:
app: postgres

This example includes:

  • Namespace isolation
  • SSD-based StorageClass
  • Secrets and ConfigMaps for configuration
  • Pod Disruption Budget for availability
  • Probes for health checking
  • Resource limits
  • Multiple services for different access patterns

Database Operators in Kubernetes

For more advanced database management, consider using a specialized database operator. Operators extend Kubernetes capabilities by automating tasks like:

  • Cluster setup and configuration
  • Automatic failover
  • Backups and restoration
  • Scaling operations
  • Version upgrades

For example, the Zalando Postgres Operator simplifies PostgreSQL cluster management:

yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: acid-postgresql-cluster
spec:
teamId: "database-team"
volume:
size: 10Gi
numberOfInstances: 3
users:
myapp: # database owner role
- superuser
- createdb
myreader: [] # regular user role
databases:
myappdb: myapp # dbname: owner
postgresql:
version: "13"

This short definition creates a fully functional PostgreSQL cluster with replication.

Connecting Applications to Kubernetes Databases

To connect your applications to databases running in Kubernetes:

In-Cluster Access

For applications running in the same Kubernetes cluster:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:latest
env:
- name: DB_HOST
value: "postgres.database.svc.cluster.local"
- name: DB_PORT
value: "5432"
- name: DB_NAME
value: "applicationdb"
- name: DB_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password

External Access

For applications outside the cluster, you'll need to expose the database service:

yaml
apiVersion: v1
kind: Service
metadata:
name: postgres-external
namespace: database
spec:
type: LoadBalancer
ports:
- port: 5432
targetPort: 5432
protocol: TCP
selector:
app: postgres

For better security, consider using a VPN or direct connect rather than exposing your database directly to the internet.

Monitoring Kubernetes Databases

Monitoring is crucial for database workloads. Set up monitoring with tools like Prometheus and Grafana:

  1. Install the Prometheus Operator:
bash
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring
  1. Deploy PostgreSQL Prometheus exporter:
yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres-exporter
namespace: database
spec:
replicas: 1
selector:
matchLabels:
app: postgres-exporter
template:
metadata:
labels:
app: postgres-exporter
spec:
containers:
- name: postgres-exporter
image: wrouesnel/postgres_exporter:latest
env:
- name: DATA_SOURCE_NAME
value: "postgresql://postgres:Password12345@postgres:5432/postgres?sslmode=disable"
ports:
- containerPort: 9187
name: metrics
  1. Create a Service for the exporter:
yaml
apiVersion: v1
kind: Service
metadata:
name: postgres-exporter
namespace: database
labels:
app: postgres-exporter
spec:
ports:
- port: 9187
name: metrics
selector:
app: postgres-exporter
  1. Configure Prometheus to scrape metrics:
yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: postgres-monitor
namespace: monitoring
spec:
selector:
matchLabels:
app: postgres-exporter
endpoints:
- port: metrics
namespaceSelector:
matchNames:
- database

Troubleshooting Kubernetes Databases

When your database in Kubernetes isn't working as expected:

Check Pod Status

bash
kubectl get pods -n database
kubectl describe pod postgres-0 -n database

Check Logs

bash
kubectl logs postgres-0 -n database

Check PersistentVolumes

bash
kubectl get pv
kubectl get pvc -n database

Check Database Status

bash
kubectl exec -it postgres-0 -n database -- psql -U postgres -c "SELECT version();"

Common Issues:

  1. Cannot create PV/PVC: Check your StorageClass and ensure your cluster supports the requested storage type
  2. Pod in CrashLoopBackOff: Check logs; often caused by incorrect credentials or permissions
  3. Database connection failures: Verify service names and network policies
  4. Performance issues: Check resource allocation and tune database parameters

Summary

Running databases on Kubernetes offers a unified approach to managing both stateless and stateful workloads. We've covered:

  • Setting up persistent storage for databases
  • Deploying single-instance databases using StatefulSets
  • Configuring high availability with multiple replicas
  • Implementing backups and security
  • Using database operators for complex deployments
  • Monitoring and troubleshooting techniques

While Kubernetes adds complexity to database management, it provides powerful tools for automation, scalability, and portability. As you become more comfortable with these concepts, you'll be able to build robust, cloud-native applications with properly managed databases.

Additional Resources

  1. Official Documentation:

  2. Database Operators:

  3. Tutorials:

Exercises

  1. Deploy a single-instance PostgreSQL database using the YAML examples provided in this guide.
  2. Create a simple application that connects to your database and performs CRUD operations.
  3. Set up a regular backup schedule using CronJobs.
  4. Install a database monitoring solution with Prometheus and Grafana.
  5. Experiment with scaling your database by changing the replicas in the StatefulSet (and observe the challenges).
  6. Try deploying a different type of database (e.g., MongoDB or MySQL) and note the differences in configuration.
  7. Implement a high-availability setup using a database operator.


If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)