Kubernetes Database
Introduction
When building applications on Kubernetes, data persistence is often one of the most challenging aspects to manage. While Kubernetes excels at orchestrating stateless applications, databases and other stateful services require special handling. This guide explores how to effectively deploy and manage databases in Kubernetes environments.
Kubernetes provides several resources that make database deployments possible:
- StatefulSets: For maintaining stable, unique network identifiers and persistent storage
- PersistentVolumes: For data persistence beyond the lifecycle of a pod
- ConfigMaps and Secrets: For managing database configuration and credentials
- Services: For stable networking and service discovery
By the end of this guide, you'll understand how to deploy databases on Kubernetes, configure persistent storage, manage database credentials securely, and implement best practices for production-grade database deployments.
Prerequisites
Before proceeding, you should have:
- A basic understanding of Kubernetes concepts (Pods, Deployments, Services)
- Access to a Kubernetes cluster (local like Minikube or cloud-based)
kubectl
command-line tool installed and configured- Basic familiarity with database concepts
Why Run Databases on Kubernetes?
You might wonder why you should run databases on Kubernetes when many cloud providers offer managed database services. Here are some key benefits:
- Unified Infrastructure: Keep your entire application stack, including databases, in a single platform
- Consistent Deployment Patterns: Use the same CI/CD pipelines and tooling for all components
- Portability: Move between cloud providers or on-premise infrastructures more easily
- Cost Optimization: Potentially reduce costs by leveraging existing Kubernetes infrastructure
- Advanced Orchestration: Utilize Kubernetes' self-healing, scaling, and monitoring capabilities
However, it's important to understand the trade-offs. Managing stateful workloads on Kubernetes requires more planning and expertise than stateless applications.
Understanding StatefulSets
Unlike Deployments, which are designed for stateless applications, StatefulSets are specifically designed for applications that require:
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment and scaling
These characteristics make StatefulSets ideal for database workloads. Let's see how a basic StatefulSet definition looks:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres"
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 10Gi
Notice the volumeClaimTemplates
section, which automatically creates PersistentVolumeClaims for each Pod in the StatefulSet.
Persistent Storage for Databases
For databases to function properly in Kubernetes, you need to set up persistent storage correctly. Let's explore the key components:
PersistentVolumes (PV)
A PersistentVolume is a piece of storage in the cluster provisioned by an administrator or dynamically provisioned using Storage Classes.
apiVersion: v1
kind: PersistentVolume
metadata:
name: postgres-pv
spec:
capacity:
storage: 10Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: standard
hostPath:
path: "/mnt/data"
PersistentVolumeClaims (PVC)
A PersistentVolumeClaim is a request for storage by a user that can be fulfilled by a PersistentVolume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: standard
StorageClasses
StorageClasses allow dynamic provisioning of PersistentVolumes when PVCs are created.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
type: gp2
reclaimPolicy: Retain
allowVolumeExpansion: true
Deploying a Single Instance Database
Let's walk through deploying a PostgreSQL database on Kubernetes, step by step:
Step 1: Create a Secret for Database Credentials
First, create a Secret to store database credentials:
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
type: Opaque
data:
password: cG9zdGdyZXNwYXNzd29yZA== # base64 encoded 'postgrespassword'
username: cG9zdGdyZXM= # base64 encoded 'postgres'
You can create this Secret using:
kubectl apply -f postgres-secret.yaml
Step 2: Create a ConfigMap for Database Configuration
For database configuration parameters:
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
data:
POSTGRES_DB: "myapp"
POSTGRES_INITDB_ARGS: "--data-checksums"
Apply the ConfigMap:
kubectl apply -f postgres-config.yaml
Step 3: Deploy PostgreSQL using a StatefulSet
Now create the StatefulSet:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres"
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
envFrom:
- configMapRef:
name: postgres-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
subPath: postgres
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "1Gi"
cpu: "500m"
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: standard
resources:
requests:
storage: 10Gi
Apply the StatefulSet:
kubectl apply -f postgres-statefulset.yaml
Step 4: Create a Service for Database Access
Finally, create a Service to make the database accessible:
apiVersion: v1
kind: Service
metadata:
name: postgres
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
clusterIP: None
selector:
app: postgres
Apply the Service:
kubectl apply -f postgres-service.yaml
This creates a headless service (with clusterIP: None
) that allows direct addressing of the StatefulSet pods.
Testing Your Database Deployment
To verify your database deployment is working correctly:
# Check the status of the StatefulSet
kubectl get statefulset postgres
# Check the running pod
kubectl get pods -l app=postgres
# Check persistent volume claims
kubectl get pvc
# Connect to the PostgreSQL pod
kubectl exec -it postgres-0 -- psql -U postgres
Inside the PostgreSQL shell, you can create a table and insert data:
CREATE TABLE test (id serial PRIMARY KEY, name VARCHAR(50));
INSERT INTO test (name) VALUES ('Kubernetes Database Test');
SELECT * FROM test;
This confirms your database is working and data is being persisted.
Managing Database Backups
Regular backups are critical for database workloads. Here's a simple approach to create a backup job:
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgres-backup
spec:
schedule: "0 1 * * *" # Run at 1 AM every day
jobTemplate:
spec:
template:
spec:
containers:
- name: postgres-backup
image: postgres:13
command:
- /bin/bash
- -c
- |
pg_dump -h postgres -U postgres -d myapp > /backup/myapp-$(date +%Y-%m-%d).sql
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
volumeMounts:
- name: backup-volume
mountPath: /backup
restartPolicy: OnFailure
volumes:
- name: backup-volume
persistentVolumeClaim:
claimName: postgres-backup-pvc
You'll need to create a separate PVC for the backups:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-backup-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
Database High Availability with StatefulSets
For production environments, you might want to deploy a database cluster for high availability. Let's look at how to set up a PostgreSQL cluster with a primary and replica:
This requires more advanced configurations and often custom operators. For PostgreSQL, you might consider using operators like:
- PostgreSQL Operator by Zalando
- Crunchy Data PostgreSQL Operator
- KubeDB
These operators handle the complexity of setting up replication, failover, and backups.
Common Database Options for Kubernetes
While we've focused on PostgreSQL, there are several databases well-suited for Kubernetes:
- PostgreSQL: Robust, feature-rich relational database
- MySQL/MariaDB: Popular open-source relational databases
- MongoDB: Document-oriented NoSQL database
- Redis: In-memory data structure store
- Elasticsearch: Distributed search and analytics engine
- Cassandra: Wide-column NoSQL database designed for high availability
Each has its own deployment patterns and considerations in Kubernetes.
Best Practices for Kubernetes Databases
To ensure reliable database operations on Kubernetes:
- Use a Dedicated Node Pool: Consider running databases on dedicated nodes with SSD storage
- Set Resource Limits: Configure appropriate CPU and memory limits for your database pods
- Use Pod Disruption Budgets: Prevent voluntary disruptions with PDBs
- Implement Robust Monitoring: Set up monitoring for database metrics
- Set Up Regular Backups: Automated, tested backup strategies are essential
- Consider Using Database Operators: They handle many complexities of database management
- Use Local Storage for Performance: For I/O intensive workloads, local SSDs can provide better performance
- Implement Proper Security: Secure your database with appropriate network policies
Example: A Complete Database Deployment
Let's put everything together in a real-world example of a production-ready PostgreSQL deployment:
# Create namespace
apiVersion: v1
kind: Namespace
metadata:
name: database
---
# Create StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-storage
provisioner: kubernetes.io/gce-pd
parameters:
type: pd-ssd
reclaimPolicy: Retain
allowVolumeExpansion: true
---
# Create Secret
apiVersion: v1
kind: Secret
metadata:
name: postgres-secret
namespace: database
type: Opaque
data:
password: UGFzc3dvcmQxMjM0NQ== # base64 encoded 'Password12345'
username: cG9zdGdyZXM= # base64 encoded 'postgres'
---
# Create ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: postgres-config
namespace: database
data:
POSTGRES_DB: "applicationdb"
postgresql.conf: |
max_connections = 100
shared_buffers = 256MB
effective_cache_size = 768MB
maintenance_work_mem = 64MB
checkpoint_completion_target = 0.9
wal_buffers = 7864kB
default_statistics_target = 100
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 2621kB
min_wal_size = 1GB
max_wal_size = 4GB
---
# Create Pod Disruption Budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: postgres-pdb
namespace: database
spec:
maxUnavailable: 1
selector:
matchLabels:
app: postgres
---
# Create StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: database
spec:
serviceName: "postgres"
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
terminationGracePeriodSeconds: 30
initContainers:
- name: take-data-dir-ownership
image: busybox
command: ["sh", "-c", "chown -R 999:999 /var/lib/postgresql/data"]
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
containers:
- name: postgres
image: postgres:13
ports:
- containerPort: 5432
name: postgres
envFrom:
- configMapRef:
name: postgres-config
env:
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: status.podIP
volumeMounts:
- name: postgres-data
mountPath: /var/lib/postgresql/data
subPath: pgdata
- name: postgres-config-volume
mountPath: /etc/postgresql/postgresql.conf
subPath: postgresql.conf
livenessProbe:
exec:
command:
- /bin/sh
- -c
- exec pg_isready -U postgres -h localhost
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
readinessProbe:
exec:
command:
- /bin/sh
- -c
- exec pg_isready -U postgres -h localhost
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
volumes:
- name: postgres-config-volume
configMap:
name: postgres-config
volumeClaimTemplates:
- metadata:
name: postgres-data
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: fast-storage
resources:
requests:
storage: 20Gi
---
# Create Services
apiVersion: v1
kind: Service
metadata:
name: postgres
namespace: database
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
clusterIP: None
selector:
app: postgres
---
apiVersion: v1
kind: Service
metadata:
name: postgres-read
namespace: database
labels:
app: postgres
spec:
ports:
- port: 5432
name: postgres
selector:
app: postgres
This example includes:
- Namespace isolation
- SSD-based StorageClass
- Secrets and ConfigMaps for configuration
- Pod Disruption Budget for availability
- Probes for health checking
- Resource limits
- Multiple services for different access patterns
Database Operators in Kubernetes
For more advanced database management, consider using a specialized database operator. Operators extend Kubernetes capabilities by automating tasks like:
- Cluster setup and configuration
- Automatic failover
- Backups and restoration
- Scaling operations
- Version upgrades
For example, the Zalando Postgres Operator simplifies PostgreSQL cluster management:
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
name: acid-postgresql-cluster
spec:
teamId: "database-team"
volume:
size: 10Gi
numberOfInstances: 3
users:
myapp: # database owner role
- superuser
- createdb
myreader: [] # regular user role
databases:
myappdb: myapp # dbname: owner
postgresql:
version: "13"
This short definition creates a fully functional PostgreSQL cluster with replication.
Connecting Applications to Kubernetes Databases
To connect your applications to databases running in Kubernetes:
In-Cluster Access
For applications running in the same Kubernetes cluster:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:latest
env:
- name: DB_HOST
value: "postgres.database.svc.cluster.local"
- name: DB_PORT
value: "5432"
- name: DB_NAME
value: "applicationdb"
- name: DB_USER
valueFrom:
secretKeyRef:
name: postgres-secret
key: username
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: postgres-secret
key: password
External Access
For applications outside the cluster, you'll need to expose the database service:
apiVersion: v1
kind: Service
metadata:
name: postgres-external
namespace: database
spec:
type: LoadBalancer
ports:
- port: 5432
targetPort: 5432
protocol: TCP
selector:
app: postgres
For better security, consider using a VPN or direct connect rather than exposing your database directly to the internet.
Monitoring Kubernetes Databases
Monitoring is crucial for database workloads. Set up monitoring with tools like Prometheus and Grafana:
- Install the Prometheus Operator:
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring
- Deploy PostgreSQL Prometheus exporter:
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres-exporter
namespace: database
spec:
replicas: 1
selector:
matchLabels:
app: postgres-exporter
template:
metadata:
labels:
app: postgres-exporter
spec:
containers:
- name: postgres-exporter
image: wrouesnel/postgres_exporter:latest
env:
- name: DATA_SOURCE_NAME
value: "postgresql://postgres:Password12345@postgres:5432/postgres?sslmode=disable"
ports:
- containerPort: 9187
name: metrics
- Create a Service for the exporter:
apiVersion: v1
kind: Service
metadata:
name: postgres-exporter
namespace: database
labels:
app: postgres-exporter
spec:
ports:
- port: 9187
name: metrics
selector:
app: postgres-exporter
- Configure Prometheus to scrape metrics:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: postgres-monitor
namespace: monitoring
spec:
selector:
matchLabels:
app: postgres-exporter
endpoints:
- port: metrics
namespaceSelector:
matchNames:
- database
Troubleshooting Kubernetes Databases
When your database in Kubernetes isn't working as expected:
Check Pod Status
kubectl get pods -n database
kubectl describe pod postgres-0 -n database
Check Logs
kubectl logs postgres-0 -n database
Check PersistentVolumes
kubectl get pv
kubectl get pvc -n database
Check Database Status
kubectl exec -it postgres-0 -n database -- psql -U postgres -c "SELECT version();"
Common Issues:
- Cannot create PV/PVC: Check your StorageClass and ensure your cluster supports the requested storage type
- Pod in CrashLoopBackOff: Check logs; often caused by incorrect credentials or permissions
- Database connection failures: Verify service names and network policies
- Performance issues: Check resource allocation and tune database parameters
Summary
Running databases on Kubernetes offers a unified approach to managing both stateless and stateful workloads. We've covered:
- Setting up persistent storage for databases
- Deploying single-instance databases using StatefulSets
- Configuring high availability with multiple replicas
- Implementing backups and security
- Using database operators for complex deployments
- Monitoring and troubleshooting techniques
While Kubernetes adds complexity to database management, it provides powerful tools for automation, scalability, and portability. As you become more comfortable with these concepts, you'll be able to build robust, cloud-native applications with properly managed databases.
Additional Resources
-
Official Documentation:
-
Database Operators:
-
Tutorials:
Exercises
- Deploy a single-instance PostgreSQL database using the YAML examples provided in this guide.
- Create a simple application that connects to your database and performs CRUD operations.
- Set up a regular backup schedule using CronJobs.
- Install a database monitoring solution with Prometheus and Grafana.
- Experiment with scaling your database by changing the replicas in the StatefulSet (and observe the challenges).
- Try deploying a different type of database (e.g., MongoDB or MySQL) and note the differences in configuration.
- Implement a high-availability setup using a database operator.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)