Kubernetes Production Deployment
Introduction
Deploying Kubernetes in a production environment requires careful planning, thorough understanding of best practices, and attention to security and scalability concerns. This guide walks you through the process of taking your Kubernetes cluster from development to a production-ready state, ensuring reliability, security, and maintainability.
Kubernetes has become the industry standard for container orchestration, but a production deployment introduces many complexities beyond what you might have encountered in development environments. In this guide, we'll explore how to properly set up, secure, and maintain a production-grade Kubernetes deployment.
Prerequisites
Before starting a production deployment, you should have:
- Basic understanding of Kubernetes concepts (pods, services, deployments)
- Experience with kubectl command-line tool
- Familiarity with container concepts and Docker
- Access to a cloud provider (AWS, GCP, Azure) or bare metal servers
- Domain knowledge of your application requirements
Architecture Planning
Cluster Architecture
A production Kubernetes architecture typically includes:
- Control Plane (formerly called master nodes)
  - API Server
  - Scheduler
  - Controller Manager
  - etcd (distributed key-value store)
- Worker Nodes
  - kubelet
  - kube-proxy
  - Container runtime (containerd or CRI-O; Docker Engine only via cri-dockerd, since dockershim was removed in Kubernetes 1.24)
High Availability Considerations
For production deployments, high availability is crucial:
- Deploy multiple control plane nodes (at least 3)
- Use an odd number of etcd instances (3, 5, 7) for quorum-based consensus
- Distribute nodes across availability zones/regions
- Implement proper backup and disaster recovery strategies
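With kubeadm, these decisions are usually captured in a cluster configuration file whose controlPlaneEndpoint points at a load balancer shared by all control-plane nodes. A minimal sketch, assuming a hypothetical endpoint k8s-control.example.com and illustrative version and subnet values:

```yaml
# kubeadm-config.yaml -- values are illustrative, adjust to your environment
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
controlPlaneEndpoint: "k8s-control.example.com:6443"  # load balancer in front of all control-plane nodes
networking:
  podSubnet: "192.168.0.0/16"
etcd:
  local:
    dataDir: /var/lib/etcd  # stacked etcd; use the "external" section for an external etcd cluster
```

Additional control-plane nodes then join with the --control-plane flag that `kubeadm init --upload-certs` prints.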
Choosing a Deployment Method
Several options exist for deploying production Kubernetes:
1. Managed Kubernetes Services
Cloud providers offer managed Kubernetes services that handle control plane management:
- Amazon EKS (Elastic Kubernetes Service)
- Google GKE (Google Kubernetes Engine)
- Microsoft AKS (Azure Kubernetes Service)
- DigitalOcean Kubernetes
Managed services reduce operational overhead but may have limitations in customization.
2. Self-Managed Deployment Tools
For more control or on-premises deployments:
- kops (Kubernetes Operations)
- kubespray (based on Ansible)
- kubeadm (official Kubernetes bootstrapping tool)
- RKE (Rancher Kubernetes Engine)
Environment Setup Example
Let's look at setting up a production cluster using kubeadm. This example demonstrates the core concepts, though your specific setup might vary.
1. Prepare Your Infrastructure
First, ensure your nodes meet the minimum requirements:
- Control plane: 2 CPUs, 2GB RAM
- Worker nodes: 1 CPU, 2GB RAM
- All nodes need proper networking and unique, resolvable hostnames; a typical node preparation sequence is sketched below
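On each node, the preparation looks something like the following sketch (exact steps depend on your OS and chosen container runtime):

```bash
# Disable swap (required by the kubelet by default)
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab

# Load kernel modules and sysctls needed for Kubernetes networking
modprobe overlay
modprobe br_netfilter
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

# Install and enable a container runtime, e.g. containerd
apt-get update && apt-get install -y containerd
systemctl enable --now containerd
```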
2. Initialize the Control Plane
```bash
# Install required packages
apt-get update && apt-get install -y apt-transport-https ca-certificates curl gpg

# Add the Kubernetes package repository
# (the legacy apt.kubernetes.io repo has been retired; packages are now published
#  at pkgs.k8s.io -- replace v1.30 with the minor version you want to install)
mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' \
  | tee /etc/apt/sources.list.d/kubernetes.list

# Install kubelet, kubeadm, and kubectl, and pin their versions
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

# Initialize the control plane
kubeadm init --control-plane-endpoint="k8s-control.example.com:6443" \
  --upload-certs \
  --pod-network-cidr=192.168.0.0/16
```
The output will include commands to:
- Set up kubectl access for the admin user
- Deploy a pod network
- Join additional control plane nodes
- Join worker nodes
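For example, the kubectl access step printed by kubeadm init looks like this:

```bash
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```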
3. Install a Container Network Interface (CNI)
```bash
# For Calico CNI
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```
4. Join Worker Nodes
On each worker node, run the join command printed by `kubeadm init`:

```bash
kubeadm join k8s-control.example.com:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:1234...cdef
```
Production Deployment Best Practices
Resource Management
Control resource allocation with requests and limits:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app-container
        image: example/app:1.0.0
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
```
Use Namespaces for Logical Separation
Create separate namespaces for different environments or teams:
```bash
kubectl create namespace production
kubectl create namespace staging
kubectl create namespace monitoring
```
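Namespaces also give you a natural place to attach resource guardrails. A ResourceQuota sketch for the production namespace (the limits here are placeholders to adapt to your workloads):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"       # total CPU requested across the namespace
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
```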
Implement Health Checks
Add liveness and readiness probes to ensure proper application health:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app-container
        image: example/app:1.0.0
        ports:
        - containerPort: 8080
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 3
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
```
Security Best Practices
RBAC (Role-Based Access Control)
Implement proper RBAC to limit access to cluster resources:
```yaml
# Create a role
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: app-reader
rules:
- apiGroups: [""]
  resources: ["pods", "services"]
  verbs: ["get", "list", "watch"]
---
# Bind the role to a user or group
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: production
subjects:
- kind: User
  name: jane
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-reader
  apiGroup: rbac.authorization.k8s.io
```
Network Policies
Restrict pod-to-pod communication:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```
Secret Management
Store sensitive information securely:
```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
data:
  db-password: cGFzc3dvcmQxMjM=  # base64 encoded
  api-key: YWJjMTIzZGVmNDU2      # base64 encoded
```
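Rather than hand-encoding base64 values, the same Secret can be created imperatively, which also keeps plaintext out of manifests committed to Git (the values shown are the same placeholders as above):

```bash
kubectl create secret generic app-secrets \
  --from-literal=db-password='password123' \
  --from-literal=api-key='abc123def456'
```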
And reference them in your deployments:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: app-container
        image: example/app:1.0.0
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db-password
```
For production, consider using external secret management solutions like:
- HashiCorp Vault
- AWS Secrets Manager
- Azure Key Vault
- Google Secret Manager
Monitoring and Logging
Prometheus and Grafana Stack
Deploy Prometheus for metrics collection and Grafana for visualization:
```bash
# Using Helm to install Prometheus and Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace
```
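The chart bundles the Prometheus Operator, which discovers scrape targets through ServiceMonitor resources. A minimal sketch, assuming your application exposes a Service port named metrics and that the chart's default release-label selector is in effect:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
  labels:
    release: prometheus  # matches the Helm release name so the operator picks it up
spec:
  namespaceSelector:
    matchNames:
    - production
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: metrics
    interval: 30s
```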
Centralized Logging
Set up an EFK (Elasticsearch, Fluentd, Kibana) or ELK (Elasticsearch, Logstash, Kibana) stack:
```bash
# Using Helm to install an EFK stack
helm repo add elastic https://helm.elastic.co
helm repo update
helm install elasticsearch elastic/elasticsearch --namespace logging --create-namespace
helm install kibana elastic/kibana --namespace logging

# The deprecated "stable" chart repository no longer serves fluentd-elasticsearch;
# use a maintained log-shipper chart instead, e.g. fluent-bit from the fluent repo
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit --namespace logging
```
Scaling and Auto-Scaling
Horizontal Pod Autoscaler (HPA)
Automatically scale based on CPU or custom metrics:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
```
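Resource-based HPAs rely on the metrics API, typically provided by metrics-server, so make sure it is installed in the cluster. For plain CPU-based scaling, the same autoscaler can also be created imperatively (assuming the Deployment lives in the production namespace):

```bash
kubectl autoscale deployment example-app --min=3 --max=10 --cpu-percent=80 -n production
```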
Cluster Autoscaler
For cloud-based deployments, implement cluster autoscaling to add/remove nodes:
```yaml
# Example for AWS EKS
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-autoscaler
  namespace: kube-system
  labels:
    app: cluster-autoscaler
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cluster-autoscaler
  template:
    metadata:
      labels:
        app: cluster-autoscaler
    spec:
      containers:
      - image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.21.0
        name: cluster-autoscaler
        command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --expander=least-waste
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
```
CI/CD Integration
GitOps with ArgoCD
ArgoCD provides declarative, GitOps continuous delivery for Kubernetes:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/example-app.git
    targetRevision: HEAD
    path: kubernetes/manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```
Implementing Blue-Green Deployments
Blue-green is a deployment strategy that minimizes downtime and risk by running two identical environments and switching traffic between them:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app-blue
  labels:
    app: example-app
    version: blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
      version: blue
  template:
    metadata:
      labels:
        app: example-app
        version: blue
    spec:
      containers:
      - name: app-container
        image: example/app:1.0.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app-green
  labels:
    app: example-app
    version: green
spec:
  replicas: 0  # Initially zero
  selector:
    matchLabels:
      app: example-app
      version: green
  template:
    metadata:
      labels:
        app: example-app
        version: green
    spec:
      containers:
      - name: app-container
        image: example/app:2.0.0
---
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app
    version: blue  # Initially pointing to blue
  ports:
  - port: 80
    targetPort: 8080
```
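Cutting traffic over then comes down to scaling up the green Deployment and flipping the Service selector, for example:

```bash
# Bring up green alongside blue, then switch the Service to green
kubectl scale deployment example-app-green --replicas=3
kubectl patch service example-app \
  -p '{"spec":{"selector":{"app":"example-app","version":"green"}}}'
```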
Disaster Recovery
Backup with Velero
Velero is an open-source tool to backup and restore Kubernetes resources and persistent volumes:
```bash
# Install Velero on an AWS cluster
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.2.0 \
  --bucket velero-backups \
  --backup-location-config region=us-west-2 \
  --snapshot-location-config region=us-west-2 \
  --secret-file ./credentials-velero
```
Create scheduled backups:
```bash
# Create a daily backup of the production namespace
velero schedule create production-daily \
  --schedule="0 1 * * *" \
  --include-namespaces production
```
Testing Restoration
Regularly test your backup and restoration process:
```bash
# Create a restore from the latest backup
velero restore create --from-backup production-daily-20221201 \
  --namespace-mappings production:production-test
```
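Velero's CLI also lets you confirm that a restore actually completed before you rely on it:

```bash
velero restore get
velero restore describe <restore-name>
velero restore logs <restore-name>
```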
Real-World Example: E-commerce Application Deployment
Let's walk through a complete example of deploying a microservices-based e-commerce application to production Kubernetes.
Infrastructure Setup
- Create a production namespace:
```bash
kubectl create namespace ecommerce-prod
```
- Create a ConfigMap for environment-specific settings:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ecommerce-config
  namespace: ecommerce-prod
data:
  API_URL: "https://api.example.com"
  CACHE_TTL: "3600"
  PAYMENT_GATEWAY: "production"
```
- Deploy database with StatefulSet:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgresql
  namespace: ecommerce-prod
spec:
  serviceName: "postgresql"
  replicas: 1
  selector:
    matchLabels:
      app: postgresql
  template:
    metadata:
      labels:
        app: postgresql
    spec:
      containers:
      - name: postgresql
        image: postgres:13
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_USER
          valueFrom:
            secretKeyRef:
              name: db-secrets
              key: username
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secrets
              key: password
        volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: postgresql-data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "standard"
      resources:
        requests:
          storage: 10Gi
```
- Deploy microservices:
```yaml
# Frontend deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: example/ecommerce-frontend:1.2.0
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "300m"
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
---
# API service deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
  namespace: ecommerce-prod
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-service
  template:
    metadata:
      labels:
        app: api-service
    spec:
      containers:
      - name: api
        image: example/ecommerce-api:1.1.0
        ports:
        - containerPort: 8080
        env:
        - name: DB_HOST
          value: postgresql.ecommerce-prod.svc.cluster.local
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: db-secrets
              key: username
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secrets
              key: password
        - name: DB_NAME
          value: ecommerce
        envFrom:
        - configMapRef:
            name: ecommerce-config
```
- Create services:
```yaml
# Frontend service with LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: ecommerce-prod
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: frontend
---
# API service (internal)
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  ports:
  - port: 8080
    targetPort: 8080
  selector:
    app: api-service
---
# Database service
# (StatefulSets normally use a headless service -- clusterIP: None -- for stable pod DNS names)
apiVersion: v1
kind: Service
metadata:
  name: postgresql
  namespace: ecommerce-prod
spec:
  type: ClusterIP
  ports:
  - port: 5432
    targetPort: 5432
  selector:
    app: postgresql
```
- Set up an Ingress resource for TLS termination (this assumes an NGINX ingress controller and cert-manager are installed in the cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ecommerce-ingress
  namespace: ecommerce-prod
  annotations:
    kubernetes.io/ingress.class: nginx  # on newer clusters, prefer spec.ingressClassName
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - shop.example.com
    secretName: shop-tls
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 8080
```
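The cert-manager.io/cluster-issuer annotation assumes a ClusterIssuer named letsencrypt-prod already exists. A minimal ACME issuer sketch (the email address is a placeholder):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: [email protected]
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
```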
Performance Testing
Before full production deployment, perform thorough load testing using tools like k6, JMeter, or Locust:
```bash
# Example k6 load test
k6 run --vus 100 --duration 5m loadtest.js
```
Maintenance and Updates
Rolling Updates
Kubernetes supports rolling updates by default:
```bash
# Update the API service to a new version
kubectl set image deployment/api-service api=example/ecommerce-api:1.2.0 -n ecommerce-prod
```
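A rolling update is only safe if you watch it land and can back out quickly; kubectl's rollout subcommands cover both:

```bash
# Watch the rollout complete, and roll back if the new version misbehaves
kubectl rollout status deployment/api-service -n ecommerce-prod
kubectl rollout undo deployment/api-service -n ecommerce-prod
```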
Handling Stateful Components
For database schema updates, you might need a more careful approach:
- Take a backup
- Apply schema migrations using jobs:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: ecommerce-prod
spec:
  template:
    spec:
      containers:
      - name: migration
        image: example/db-migration:1.0.0
        env:
        - name: DB_HOST
          value: postgresql.ecommerce-prod.svc.cluster.local
        # Add other environment variables as needed
      restartPolicy: Never
  backoffLimit: 4
```
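Before rolling out application pods that expect the new schema, wait for the migration Job to finish and check its output:

```bash
kubectl wait --for=condition=complete --timeout=300s job/db-migration -n ecommerce-prod
kubectl logs job/db-migration -n ecommerce-prod
```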
Summary
Deploying Kubernetes in production involves careful planning, proper resource management, security hardening, and implementing monitoring and scaling strategies. The key takeaways from this guide include:
- Plan your architecture with high availability in mind
- Choose the right deployment method based on your needs and resources
- Implement security best practices including RBAC, network policies, and secure secret management
- Set up comprehensive monitoring and logging to detect and troubleshoot issues
- Configure auto-scaling to handle varying loads
- Establish a CI/CD pipeline for reliable deployments
- Implement disaster recovery procedures and test them regularly
By following these guidelines, you'll be well on your way to running a production-grade Kubernetes environment that is reliable, secure, and maintainable.
Additional Resources
Here are some resources to deepen your understanding:
- Kubernetes Documentation
- Kubernetes the Hard Way
- CNCF Landscape
- Books:
- "Kubernetes in Action" by Marko Lukša
- "Kubernetes Patterns" by Bilgin Ibryam and Roland Huß
- "Production Kubernetes" by Josh Rosso, Rich Lander, Alex Brand, and John Harris
Practice Exercises
- Set up a local multi-node Kubernetes cluster using minikube or kind
- Create a deployment strategy for a stateful application
- Implement a complete monitoring solution with Prometheus and Grafana
- Design and implement a disaster recovery plan
- Create a GitOps workflow using tools like ArgoCD or Flux