Kubernetes Debugging
Introduction
Debugging applications in Kubernetes can be challenging due to the distributed nature of containerized applications. When something goes wrong, identifying the root cause requires understanding how to navigate Kubernetes resources, extract relevant information, and analyze system behavior. This guide will walk you through the essential debugging techniques every Kubernetes administrator should know.
Common Debugging Scenarios
Before diving into specific tools, let's understand some common scenarios that require debugging:
- Pods failing to start or run properly
- Services not routing traffic correctly
- Performance issues and resource constraints
- Configuration errors
- Networking problems between pods or services
Essential Kubectl Commands for Debugging
The kubectl command-line tool is your primary interface for debugging Kubernetes clusters. Here are the most useful commands for troubleshooting:
Viewing Resource Information
# Get detailed information about a pod
kubectl describe pod <pod-name> -n <namespace>
# Get detailed information about a node
kubectl describe node <node-name>
# Get detailed information about a service
kubectl describe service <service-name> -n <namespace>
The describe command provides rich information, including events, which often contain clues about failures or issues.
Viewing Pod Logs
Logs are your window into application behavior:
# Get logs from a pod
kubectl logs <pod-name> -n <namespace>
# Get logs from a specific container in a multi-container pod
kubectl logs <pod-name> -c <container-name> -n <namespace>
# Follow logs in real-time
kubectl logs -f <pod-name> -n <namespace>
# Get logs from previous container instance (if it crashed)
kubectl logs <pod-name> --previous -n <namespace>
Example output:
2023-05-15T12:34:56.789Z INFO Starting application...
2023-05-15T12:34:57.123Z INFO Connected to database
2023-05-15T12:35:01.456Z ERROR Failed to process request: connection timeout
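When a pod logs heavily, it can help to narrow the stream to warnings and errors. The following is a minimal sketch: filter_errors is a hypothetical helper name, and the pattern assumes the "<timestamp> <LEVEL> <message>" layout of the sample output above, which your application may not follow.

```shell
# filter_errors: hypothetical helper that keeps only WARN and ERROR lines.
# Assumes each line looks like "<timestamp> <LEVEL> <message>";
# adjust the pattern to match your application's log format.
filter_errors() {
  grep -E '^[^ ]+ (WARN|ERROR) '
}

# Usage against a live cluster (not executed here):
#   kubectl logs <pod-name> -n <namespace> | filter_errors
```

Applied to the example output, this would keep only the "Failed to process request" line.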
Executing Commands in Containers
Sometimes you need to run commands inside a container to debug issues:
# Run a command in a pod
kubectl exec -it <pod-name> -n <namespace> -- <command>
# Start a shell session in a pod (use /bin/sh if the image does not include bash)
kubectl exec -it <pod-name> -n <namespace> -- /bin/bash
For example, to check network connectivity from inside a pod:
kubectl exec -it nginx-pod -- curl -v http://backend-service:8080
Checking Pod Status and Events
# Get pod status
kubectl get pods -n <namespace>
# Watch pod status changes in real-time
kubectl get pods -n <namespace> -w
# Get events sorted by timestamp
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
Debugging Pod Issues
Common Pod Status Meanings
Understanding pod status is crucial for troubleshooting:
Status | Meaning |
---|---|
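Pending | Pod has been accepted by the cluster, but one or more containers are not running yet; it may still be waiting to be scheduled or pulling images |
Running | Pod is bound to a node and at least one container is running, starting, or restarting |
Succeeded | All containers in the pod have terminated successfully |
Failed | All containers have terminated, and at least one terminated with failure |
Unknown | Pod state cannot be determined, usually because the node cannot be reached |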
CrashLoopBackOff | Container is crashing repeatedly |
ImagePullBackOff | Unable to pull the container image |
ContainerCreating | Containers are being created |
Terminating | Pod is being terminated |
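As a rough triage aid, the status values above can be mapped to the first command this guide suggests for each. This is only a sketch: suggest_next_step is a hypothetical helper, and the mapping reflects the order of steps in this guide rather than any official rule.

```shell
# suggest_next_step: hypothetical helper that maps a STATUS value from
# `kubectl get pods` to the first command this guide recommends trying.
suggest_next_step() {
  case "$1" in
    Pending|ContainerCreating|ImagePullBackOff)
      # Scheduling and image problems usually show up in the pod's events.
      echo 'kubectl describe pod <pod-name> -n <namespace>' ;;
    CrashLoopBackOff)
      # The crashed container's output is in the previous instance's logs.
      echo 'kubectl logs <pod-name> -n <namespace> --previous' ;;
    Running|Failed)
      echo 'kubectl logs <pod-name> -n <namespace>' ;;
    *)
      echo 'kubectl get events -n <namespace> --sort-by=.lastTimestamp' ;;
  esac
}

suggest_next_step CrashLoopBackOff
```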
Debugging Pod Startup Issues
When a pod is stuck in Pending state:
- Check if the cluster has enough resources:
kubectl describe nodes | grep -A 5 "Allocated resources"
- Check if pod placement is restricted by node affinity, taints, or tolerations:
kubectl describe pod <pod-name> -n <namespace> | grep -A 10 "Tolerations:"
- Check events for scheduling issues:
kubectl get events -n <namespace> --field-selector involvedObject.name=<pod-name>
Debugging CrashLoopBackOff
When a pod is in CrashLoopBackOff:
- Check container logs:
kubectl logs <pod-name> -n <namespace> --previous
- Check the container exit code:
kubectl describe pod <pod-name> -n <namespace>
Exit codes can provide hints about the problem:
Exit Code | Meaning |
---|---|
0 | Container exited successfully |
1 | General error |
137 | Container was killed with SIGKILL (128 + 9), often by the OOM killer |
143 | Container received SIGTERM (128 + 15) |
- If it's an application error, you may need to debug the application code or configuration.
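Exit codes above 128 conventionally mean the process was terminated by a signal, with the signal number being the code minus 128. A small sketch of that arithmetic follows; explain_exit_code is a hypothetical helper, not part of kubectl.

```shell
# explain_exit_code: hypothetical helper that interprets a container exit code.
# Codes above 128 follow the convention "128 + signal number".
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "exited successfully"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "error reported by the process (exit code $code)"
  fi
}

explain_exit_code 137   # signal 9 is SIGKILL
explain_exit_code 143   # signal 15 is SIGTERM

# To read the last exit code from a live pod (not executed here):
#   kubectl get pod <pod-name> -n <namespace> \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```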
Debugging Service and Networking Issues
DNS Troubleshooting
DNS issues are common in Kubernetes. To debug:
- Create a temporary debugging pod:
kubectl run debug-dns --image=busybox:1.28 -it --rm --restart=Never -- nslookup <service-name>.<namespace>.svc.cluster.local
Example output:
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: my-service.default.svc.cluster.local
Address 1: 10.107.31.5
- Check if the service has endpoints:
kubectl get endpoints <service-name> -n <namespace>
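An empty endpoints list is easy to miss when scanning output by eye, so the check can be scripted. The sketch below assumes the default three-column output of kubectl get endpoints (NAME, ENDPOINTS, AGE); has_endpoints is a hypothetical helper name.

```shell
# has_endpoints: hypothetical helper. Reads `kubectl get endpoints <name>`
# output on stdin and succeeds only if the ENDPOINTS column is populated.
# Assumes the default column layout: NAME ENDPOINTS AGE.
has_endpoints() {
  awk 'NR > 1 && $2 != "" && $2 != "<none>" { found = 1 }
       END { exit (found ? 0 : 1) }'
}

# Usage against a live cluster (not executed here):
#   kubectl get endpoints <service-name> -n <namespace> | has_endpoints \
#     || echo "service has no endpoints: check selectors and pod readiness"
```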
Service Connectivity Testing
To test connectivity to a service:
- From another pod:
kubectl exec -it <some-pod> -- curl -v http://<service-name>:<port>
- Check if service selectors match pod labels:
# Check service selectors
kubectl describe service <service-name> -n <namespace> | grep Selector
# Check pod labels
kubectl get pod <pod-name> -n <namespace> --show-labels
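A service selects a pod only if every key=value pair in its selector appears among the pod's labels. The comparison can be sketched as follows; selector_matches is a hypothetical helper that works on the comma-separated strings printed by the two commands above.

```shell
# selector_matches: hypothetical helper. Succeeds when every key=value pair
# in a comma-separated selector also appears in a comma-separated label list.
selector_matches() {
  selector=$1
  labels=",$2,"          # pad with commas so each pair can be matched whole
  old_ifs=$IFS
  IFS=,
  for pair in $selector; do
    case "$labels" in
      *",$pair,"*) ;;    # this pair is present, keep checking
      *) IFS=$old_ifs; echo "no match for $pair"; return 1 ;;
    esac
  done
  IFS=$old_ifs
  echo "selector matches"
}

selector_matches 'app=frontend,env=prod' 'app=frontend,env=staging'
```

On a live cluster, kubectl get pods -n <namespace> -l <selector> gives the same answer directly by listing the pods a selector would match.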
Using Port-Forward for Debugging
The port-forward command creates a secure tunnel to a pod or service for debugging:
# Forward a local port to a pod port
kubectl port-forward pod/<pod-name> <local-port>:<pod-port> -n <namespace>
# Forward a local port to a service port
kubectl port-forward svc/<service-name> <local-port>:<service-port> -n <namespace>
Example:
kubectl port-forward pod/my-app-pod 8080:80
This forwards your local port 8080 to port 80 on the pod, allowing you to access it via http://localhost:8080.
Debugging with Custom Debugging Pods
Sometimes you need a dedicated debugging pod with specific tools:
apiVersion: v1
kind: Pod
metadata:
  name: debugging-pod
spec:
  containers:
  - name: debugging-tools
    image: nicolaka/netshoot
    command:
    - sleep
    - "3600"
Save this as debug-pod.yaml and create it:
kubectl apply -f debug-pod.yaml
This creates a pod with networking tools like dig, curl, tcpdump, etc., that you can use for advanced debugging.
Debugging Resource Issues
CPU and Memory Usage
To check resource usage:
# Get resource usage per node
kubectl top nodes
# Get resource usage per pod
kubectl top pods -n <namespace>
Example output:
NAME CPU(cores) MEMORY(bytes)
pod-name-1 12m 45Mi
pod-name-2 1456m 231Mi
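To spot heavy consumers in a long listing, the output can be filtered by a CPU threshold. This is a sketch under the assumption that CPU is reported in millicores with an "m" suffix, as in the example output; high_cpu_pods is a hypothetical helper name.

```shell
# high_cpu_pods: hypothetical helper. Reads `kubectl top pods` output on stdin
# and prints pods whose CPU usage exceeds a threshold given in millicores.
# Assumes the CPU column uses an "m" suffix, as in the sample output.
high_cpu_pods() {
  awk -v limit="$1" 'NR > 1 {
    cpu = $2
    sub(/m$/, "", cpu)               # strip the millicore suffix
    if (cpu + 0 > limit + 0) print $1, $2
  }'
}

# Usage against a live cluster (not executed here):
#   kubectl top pods -n <namespace> | high_cpu_pods 1000
```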
Analyzing Resource Constraints
If pods are being terminated due to resource constraints:
- Check if the pod is being OOM killed:
kubectl describe pod <pod-name> -n <namespace> | grep -i "killed"
- Check resource requests and limits:
kubectl describe pod <pod-name> -n <namespace> | grep -A 3 "Limits:"
Advanced Debugging Techniques
Using the Kubernetes Dashboard
The Kubernetes Dashboard provides a visual interface for debugging:
# Deploy the dashboard
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.7.0/aio/deploy/recommended.yaml
# Create a service account with full cluster-admin access (suitable for test clusters only)
kubectl create serviceaccount dashboard-admin
kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=default:dashboard-admin
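# Get the token (Kubernetes 1.24+ no longer auto-creates token Secrets for service accounts)
kubectl create token dashboard-admin
# On older clusters, read the token from the auto-created Secret instead
kubectl describe secret $(kubectl get secret | grep dashboard-admin | awk '{print $1}')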
# Start the proxy
kubectl proxy
Access the dashboard at: http://localhost:8001/api/v1/namespaces/kubernetes-dashboard/services/https:kubernetes-dashboard:/proxy/
Using ksniff for Packet Capture
ksniff allows you to capture network traffic from a pod:
# Install ksniff plugin
kubectl krew install sniff
# Capture traffic
kubectl sniff <pod-name> -n <namespace> -o capture.pcap
Debug Kubernetes Control Plane
For cluster-level issues, you may need to check control plane components:
# Check control plane pods (if running as pods)
kubectl get pods -n kube-system
# Check kube-apiserver logs (for Kubernetes installed with kubeadm)
kubectl logs -n kube-system kube-apiserver-<node-name>
Visualizing Kubernetes Debugging Flow
[Flowchart: Kubernetes debugging process]
Real-World Debugging Example
Let's walk through a complete debugging example:
Scenario: Web Application Not Accessible
- Check the pod status:
kubectl get pods -n web-app
Output:
NAME READY STATUS RESTARTS AGE
web-frontend-59c78f6b89 1/1 Running 0 15m
web-backend-75b9c7c7f5 1/1 Running 0 15m
- Check the service:
kubectl get svc -n web-app
Output:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
web-frontend ClusterIP 10.100.200.123 <none> 80/TCP 15m
web-backend ClusterIP 10.100.200.124 <none> 8080/TCP 15m
- Check if service has endpoints:
kubectl get endpoints web-frontend -n web-app
Output:
NAME ENDPOINTS AGE
web-frontend <none> 15m
No endpoints! Let's check why:
- Check service selectors vs. pod labels:
kubectl describe svc web-frontend -n web-app | grep Selector
Output:
Selector: app=frontend,env=prod
kubectl get pod web-frontend-59c78f6b89 -n web-app --show-labels
Output:
NAME READY STATUS RESTARTS AGE LABELS
web-frontend-59c78f6b89 1/1 Running 0 20m app=frontend,env=staging
- Found the issue! The service is selecting pods with env=prod, but our pod has env=staging. Fix by updating the service selector:
kubectl patch svc web-frontend -n web-app -p '{"spec":{"selector":{"app":"frontend","env":"staging"}}}'
- Verify endpoints again:
kubectl get endpoints web-frontend -n web-app
Output:
NAME ENDPOINTS AGE
web-frontend 10.244.2.15:80 22m
Problem solved! The service now has endpoints.
Best Practices for Effective Debugging
- Be methodical: Follow a systematic approach to eliminate potential issues one by one.
- Use labels effectively: Proper labeling makes it easier to identify and filter resources.
- Set appropriate resource requests and limits: This helps prevent resource-related issues.
- Implement health checks: Liveness and readiness probes help Kubernetes manage application health.
- Collect metrics: Use tools like Prometheus for monitoring and alerting.
- Create good logs: Design applications to log meaningful information at appropriate levels.
- Use namespaces: Organize resources in namespaces for better manageability.
Debugging Tools Ecosystem
Beyond built-in Kubernetes tools, consider these tools for enhanced debugging capabilities:
- k9s: Terminal-based UI for managing Kubernetes clusters
- Lens: Graphical IDE for Kubernetes
- Stern: Multi-pod log tailing
- kube-ps1: Adds the current Kubernetes context and namespace to your bash or zsh prompt
- kubectx/kubens: Tools for switching between contexts and namespaces
- Popeye: Scans cluster for misconfigurations
Summary
Kubernetes debugging requires understanding various components and their interactions. By mastering the tools and techniques covered in this guide, you'll be able to:
- Identify and resolve pod issues quickly
- Troubleshoot service and networking problems
- Diagnose and fix resource constraints
- Use advanced debugging tools when needed
Remember that effective debugging in Kubernetes is as much about being methodical as it is about knowing the right commands.
Exercises
- Create a deliberately broken deployment (e.g., with an invalid image) and practice debugging it.
- Set up a service with incorrect selectors and fix it using the debugging techniques covered.
- Create a pod with insufficient resource limits and debug the resulting issues.
- Practice capturing and analyzing logs from a multi-container pod.
- Set up network policies and debug connectivity issues between pods.