Kubernetes Grafana
Introduction
Monitoring is a critical aspect of maintaining healthy Kubernetes clusters. In the complex and distributed world of container orchestration, knowing what's happening inside your cluster can mean the difference between smooth operations and chaotic troubleshooting sessions. This is where Grafana comes into play.
Grafana is an open-source visualization and analytics platform that integrates seamlessly with Kubernetes. It allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. By connecting Grafana to your Kubernetes metrics, you can create comprehensive dashboards that provide real-time insights into your cluster's health and performance.
In this guide, we'll explore how to implement Grafana in a Kubernetes environment, from installation to creating meaningful dashboards, and examine real-world applications that make it an essential tool in the Kubernetes ecosystem.
Prerequisites
Before we dive in, make sure you have:
- A running Kubernetes cluster
kubectl
configured to communicate with your cluster- Basic understanding of Kubernetes concepts
- Helm (optional, but recommended for easier installation)
Installing Grafana on Kubernetes
Using Helm (Recommended Method)
Helm is a package manager for Kubernetes that simplifies the installation process of applications. Let's use it to install Grafana:
# Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
# Update Helm repositories
helm repo update
# Install Grafana
helm install grafana grafana/grafana
After running these commands, you'll see output providing information about your Grafana installation, including how to access it.
Using YAML Manifests
If you prefer not to use Helm, you can create a deployment using YAML manifests:
# grafana-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
labels:
app: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:latest
ports:
- containerPort: 3000
name: http-grafana
env:
- name: GF_SECURITY_ADMIN_USER
value: admin
- name: GF_SECURITY_ADMIN_PASSWORD
value: admin
readinessProbe:
httpGet:
path: /api/health
port: 3000
Then, create a service to expose Grafana:
# grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
selector:
app: grafana
type: LoadBalancer
ports:
- port: 3000
targetPort: 3000
Apply these manifests to your cluster:
kubectl apply -f grafana-deployment.yaml
kubectl apply -f grafana-service.yaml
Accessing Grafana
After installation, you need to access the Grafana UI. The method depends on how you deployed it:
# For Helm installations, get the admin password
kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
# Forward the Grafana port to your local machine
kubectl port-forward service/grafana 3000:3000
Now, open your browser and navigate to http://localhost:3000
. Log in with username "admin" and the password you retrieved.
Connecting Prometheus as a Data Source
Grafana needs data sources to visualize metrics. Prometheus is commonly used with Kubernetes:
- In the Grafana UI, navigate to "Configuration" > "Data Sources"
- Click "Add data source"
- Select "Prometheus"
- Set the URL to your Prometheus server (typically
http://prometheus-server:80
if using the Prometheus Helm chart) - Click "Save & Test"
If the connection is successful, you'll see a green "Data source is working" message.
Creating Your First Kubernetes Dashboard
Let's create a basic dashboard to monitor some essential Kubernetes metrics:
- In Grafana, click on "+" and select "Dashboard"
- Click "Add new panel"
- In the query editor, select your Prometheus data source
- Enter the following PromQL query to see CPU usage by pod:
sum(rate(container_cpu_usage_seconds_total{image!="", container_name!="POD"}[5m])) by (pod)
- Set a title like "Pod CPU Usage"
- Click "Apply"
Add another panel for memory usage:
- Click "Add panel"
- Enter this PromQL query:
sum(container_memory_working_set_bytes{image!="", container_name!="POD"}) by (pod)
- Set "Bytes binary" in the "Unit" field under the "Standard options" section
- Title it "Pod Memory Usage"
- Click "Apply"
- Click "Save" on the dashboard and give it a name like "Kubernetes Basics"
Understanding Grafana Dashboard Components
A Grafana dashboard consists of several components:
- Panels: Individual visualization units (graphs, stats, tables, etc.)
- Rows: Organizational structures to group panels
- Variables: Dynamic elements that allow changing dashboard scope
- Annotations: Event markers overlaid on graphs
- Time Range Controls: Allow selecting the time window for all panels
Here's a simple diagram showing how these components relate:
Advanced Grafana Features for Kubernetes
Using Dashboard Templates
Instead of creating dashboards from scratch, you can import pre-made templates:
- In Grafana, go to "+" and select "Import"
- Enter dashboard ID
6417
(a popular Kubernetes dashboard) - Select your Prometheus data source
- Click "Import"
You now have a comprehensive Kubernetes monitoring dashboard!
Setting Up Alerts
Let's set up a simple alert for high CPU usage:
- Edit your "Pod CPU Usage" panel
- Click the "Alert" tab
- Click "Create Alert"
- Configure:
- Name: "High CPU Usage"
- Evaluate every: "1m"
- For: "5m" (This means the condition must be true for 5 minutes before alerting)
- Conditions: "WHEN last() OF query(A, 5m, now) IS ABOVE 0.8"
- This will alert when any pod uses more than 80% CPU for 5 minutes
- Add a notification message: "Pod
{{ $labels.pod }}
in namespace{{ $labels.namespace }}
has high CPU usage:{{ $value }}%
" - Click "Save" to apply the alert
Creating Variables for Dynamic Dashboards
Variables make your dashboards dynamic and reusable:
- At the dashboard settings, click "Variables"
- Click "New"
- Configure:
- Name: "namespace"
- Label: "Namespace"
- Type: "Query"
- Data source: Your Prometheus
- Query:
label_values(kube_pod_info, namespace)
- Select "Multi-value" and "Include All option"
- Click "Add"
Add another variable for pods:
- Click "New"
- Configure:
- Name: "pod"
- Label: "Pod"
- Type: "Query"
- Data source: Your Prometheus
- Query:
label_values(kube_pod_info{namespace=~"$namespace"}, pod)
- Select "Multi-value" and "Include All option"
- Click "Add"
Now you can filter your dashboard by namespace and pod using dropdown menus at the top.
Real-World Grafana Use Cases in Kubernetes
Monitoring Application Performance
Create a dashboard for your specific applications that shows:
- Request rate
- Error rate
- Latency percentiles
- Resource usage
For a web application, you might use queries like:
sum(rate(http_requests_total{app="my-app"}[5m])) by (route)
Detecting and Troubleshooting Issues
A practical example of using Grafana for troubleshooting:
- User reports application slowness
- Check your Grafana dashboard
- Notice high memory usage and frequent garbage collection
- Correlate with a recent deployment
- Roll back or fix the memory leak
Capacity Planning
Use Grafana to visualize resource trends over time:
avg_over_time(sum(container_memory_usage_bytes{namespace="production"}) by (namespace)[7d:1h])
This can help you determine when to scale your cluster.
Best Practices for Kubernetes Monitoring with Grafana
-
Focus on the Four Golden Signals:
- Latency
- Traffic
- Errors
- Saturation
-
Use Meaningful Naming Conventions for dashboards and panels
-
Layer Your Dashboards from high-level overviews to detailed drill-downs
-
Set Up Proper Alerting to avoid alert fatigue:
- Alert on symptoms, not causes
- Make alert thresholds meaningful
- Include runbooks with alerts
-
Regularly Review and Update your dashboards as your applications evolve
Summary
Grafana is a powerful tool for visualizing and monitoring your Kubernetes clusters. By following this guide, you've learned how to:
- Install Grafana on Kubernetes
- Connect Prometheus as a data source
- Create basic and advanced dashboards
- Set up alerts for proactive monitoring
- Use variables for dynamic dashboards
- Apply Grafana to real-world monitoring scenarios
With these skills, you can build comprehensive monitoring solutions that help maintain the health and performance of your Kubernetes clusters.
Further Resources
- Official Grafana Documentation
- Prometheus Query Language (PromQL) Reference
- Grafana Community Dashboards
Exercises
- Create a dashboard showing the top 5 pods by CPU and memory usage
- Set up a variable that filters metrics by deployment
- Create an alert that notifies when pod restarts exceed a threshold
- Build a dashboard that shows correlations between application latency and resource usage
- Import the "Kubernetes Cluster" dashboard (ID: 6417) and customize it for your needs
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)