Prometheus Data Source

Introduction

Prometheus has become the de facto standard for metrics collection in cloud-native environments. When combined with Grafana's powerful visualization capabilities, it creates a robust monitoring solution for your applications and infrastructure.

In this guide, we'll explore how to set up Prometheus as a data source in Grafana, how to query metrics using PromQL (Prometheus Query Language), and how to build effective dashboards to visualize your data.

What is Prometheus?

Prometheus is an open-source monitoring and alerting toolkit designed for reliability and scalability. Key features include:

A multi-dimensional data model with time series data identified by metric name and key-value pairs
A flexible query language (PromQL) to leverage this dimensionality
Time series collection happens via a pull model over HTTP
Targets can be discovered via service discovery or static configuration
Support for multiple modes of graphing and dashboarding
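To make the multi-dimensional data model concrete, here is a small Python sketch. It is a toy model, not how Prometheus stores data internally. A series is identified by its metric name plus its complete set of label pairs, so every new label value creates a new series. The label values and sample numbers are made up for illustration.

```python
from typing import Dict, FrozenSet, Tuple

# A series identity: metric name plus an order-independent set of label pairs.
SeriesKey = Tuple[str, FrozenSet[Tuple[str, str]]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    """Build a hashable identity; the order of labels does not matter."""
    return (name, frozenset(labels.items()))


# Hypothetical samples: same metric name, different label values.
store: Dict[SeriesKey, list] = {}
for labels, value in [
    ({"job": "api-server", "status": "200"}, 1027.0),
    ({"job": "api-server", "status": "500"}, 3.0),
    ({"status": "200", "job": "api-server"}, 1031.0),  # same series as the first
]:
    store.setdefault(series_key("http_requests_total", labels), []).append(value)

print(len(store))  # prints 2: two distinct series
```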

Configuring Prometheus as a Data Source

Prerequisites

Before you begin, ensure you have:

A running Grafana instance (v7.0 or later recommended)
A running Prometheus server that is already scraping metrics from your targets

Adding Prometheus Data Source

Log in to your Grafana instance
Navigate to Configuration > Data Sources (Connections > Data sources in recent Grafana versions)
Click "Add data source"
Select "Prometheus" from the list

Here's how to configure the basic settings:

Name: Prometheus
Default: Toggle on if this is your primary data source
URL: http://your-prometheus-server:9090
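Access: Server (default; the Grafana server proxies queries, so Prometheus only needs to be reachable from Grafana, not from users' browsers)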

Advanced Configuration Options

For more complex setups, you can configure:

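Scrape interval: Set this to match the scrape_interval in your Prometheus configuration (Grafana assumes 15s if left empty; Prometheus itself defaults to 1m, but 15s is a common setting)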
Query timeout: Maximum time for query execution (60s by default)
HTTP Method: GET or POST (POST is recommended, since long queries sent with GET can exceed URL length limits)
Authentication: Basic auth, TLS client authentication, or custom headers
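If you prefer to automate this setup, the settings above can be expressed as a request body for Grafana's HTTP API (POST /api/datasources). The Python sketch below only builds that body. It assumes that the jsonData keys timeInterval, queryTimeout, and httpMethod correspond to the Scrape interval, Query timeout, and HTTP Method fields; confirm them against the documentation for your Grafana version. Sending the body requires an authenticated request with admin rights.

```python
import json


def prometheus_datasource(url: str, is_default: bool = True) -> dict:
    """Build a request body for Grafana's POST /api/datasources endpoint.

    The jsonData keys are assumptions based on Grafana's provisioning docs;
    verify them against the Grafana version you run.
    """
    return {
        "name": "Prometheus",
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # "proxy" is the API value for Server access
        "isDefault": is_default,
        "jsonData": {
            "timeInterval": "15s",  # assumed key for "Scrape interval"
            "queryTimeout": "60s",  # assumed key for "Query timeout"
            "httpMethod": "POST",
        },
    }


body = prometheus_datasource("http://your-prometheus-server:9090")
print(json.dumps(body, indent=2))
```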

Querying Prometheus in Grafana

Understanding PromQL Basics

PromQL is the query language used to retrieve and manipulate time series data from Prometheus. Here are the fundamental concepts:

Instant vectors: A set of time series containing a single sample for each time series, all sharing the same timestamp
Range vectors: A set of time series containing a range of samples over time for each series, selected with a duration such as [5m]
Scalar: A simple numeric floating point value
String: A simple string value (currently unused)
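Grafana retrieves these types through Prometheus's HTTP API. An instant query (/api/v1/query) evaluates an expression at a single timestamp. A time series panel instead issues a range query (/api/v1/query_range), which evaluates the expression at every step between start and end. A range query's expression must itself produce an instant vector or a scalar, so a range vector like [5m] appears inside a function such as rate(). The Python sketch below only builds the URLs; the server address and timestamps are placeholders.

```python
from urllib.parse import urlencode


def instant_query_url(base: str, expr: str, ts: int) -> str:
    """URL for an instant query: one evaluation at Unix time `ts`."""
    return f"{base}/api/v1/query?" + urlencode({"query": expr, "time": ts})


def range_query_url(base: str, expr: str, start: int, end: int, step: str) -> str:
    """URL for a range query: the expression evaluated at every `step`."""
    params = {"query": expr, "start": start, "end": end, "step": step}
    return f"{base}/api/v1/query_range?" + urlencode(params)


base = "http://your-prometheus-server:9090"
print(instant_query_url(base, "up", 1700000000))
print(range_query_url(base, "rate(http_requests_total[5m])", 1700000000, 1700003600, "15s"))
```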

Simple Query Examples

Let's start with basic examples:

Return all time series with the metric name http_requests_total:

http_requests_total

Return all time series with the metric name http_requests_total and the job label set to api-server:

http_requests_total{job="api-server"}
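Label matchers act as filters on each series' labels. PromQL supports = (equal), != (not equal), =~ (regex match), and !~ (regex non-match). The Python sketch below mimics those rules and makes two behaviors explicit. Regexes are fully anchored, so "5.." matches 503 but "5" does not. A missing label compares as the empty string, which is why a matcher like container!="" drops series that lack the label. Prometheus uses Go's RE2 syntax, which differs from Python's re in some corners.

```python
import re
from typing import Dict, List, Tuple

# A matcher is (label, operator, value); operators mirror PromQL's =, !=, =~, !~.
Matcher = Tuple[str, str, str]


def matches(labels: Dict[str, str], matchers: List[Matcher]) -> bool:
    """Return True if a series' labels satisfy every matcher.

    A missing label is treated as the empty string, and regexes must match
    the whole value (PromQL anchors them), hence re.fullmatch.
    """
    for name, op, value in matchers:
        actual = labels.get(name, "")
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator {op!r}")
        if not ok:
            return False
    return True


# The metric name is stored as the special __name__ label.
series = {"__name__": "http_requests_total", "job": "api-server", "status": "503"}
print(matches(series, [("__name__", "=", "http_requests_total"), ("job", "=", "api-server")]))  # True
print(matches(series, [("status", "=~", "5..")]))  # True
print(matches(series, [("status", "=~", "5")]))  # False: the regex is anchored
```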

Return the rate of HTTP requests per second over the last 5 minutes:

rate(http_requests_total[5m])
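Counters such as http_requests_total only go up, except when a process restarts and the counter resets to zero. rate() turns a counter into a per-second value and compensates for those resets. The simplified Python version below shows the reset handling. It deliberately leaves out the extrapolation Prometheus applies at the edges of the [5m] window, so its results can differ slightly from real rate() output.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (Unix timestamp in seconds, counter value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase across the samples, correcting counter resets.

    Unlike Prometheus's rate(), this does not extrapolate to the edges of
    the window; it divides the observed increase by the time between the
    first and last sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so count `cur` in full.
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


print(simple_rate([(0, 100), (60, 160), (120, 220)]))  # 1.0
print(simple_rate([(0, 100), (60, 10), (120, 70)]))  # about 0.583 (reset at t=60)
```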

Advanced Query Examples

Let's explore some more advanced queries:

Get the 5 instances with the highest total container memory usage:

topk(5, sum(container_memory_usage_bytes) by (instance))
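This query combines two steps: sum(...) by (instance) adds up all container series that share an instance label, and topk(5, ...) keeps the five largest results. The Python sketch below imitates both steps on made-up data. One practical detail: in a graph panel (a range query), topk is evaluated independently at each step, so the legend can show more than five instances across the whole time range.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def sum_by(series: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Mimic `sum(...) by (label)`: add values that share the label's value."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


def topk(k: int, grouped: Dict[str, float]) -> List[Tuple[str, float]]:
    """Mimic `topk(k, ...)`: keep the k groups with the largest values."""
    return heapq.nlargest(k, grouped.items(), key=lambda item: item[1])


# Hypothetical per-container memory samples, in bytes.
containers = [
    ({"instance": "node-a", "container": "web"}, 300e6),
    ({"instance": "node-a", "container": "db"}, 900e6),
    ({"instance": "node-b", "container": "web"}, 250e6),
    ({"instance": "node-c", "container": "cache"}, 400e6),
]
print(topk(2, sum_by(containers, "instance")))
# [('node-a', 1200000000.0), ('node-c', 400000000.0)]
```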

Calculate the 95th percentile request latency across all services:

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
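histogram_quantile() works on cumulative buckets, each labeled with its upper bound le, which is why the query keeps by (le) when summing. The function finds the bucket containing the requested rank and interpolates linearly inside it, assuming observations are evenly spread within the bucket. If the rank lands in the open-ended +Inf bucket, the upper bound of the highest finite bucket is returned. The Python sketch below follows that approach for non-negative bucket bounds; the bucket counts are invented, and real PromQL handles more edge cases than this sketch.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need finite buckets plus a final le=+Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Find the first bucket whose cumulative count reaches the rank.
    for i, (upper, count) in enumerate(buckets):
        if count >= rank:
            break
    if upper == math.inf:
        # Rank lands in the open-ended bucket: use the largest finite bound.
        return buckets[-2][0]
    lower, below = (0.0, 0.0) if i == 0 else buckets[i - 1]
    if count == below:
        return lower
    # Linear interpolation, assuming observations are spread evenly in the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


latency_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
print(round(histogram_quantile(0.95, latency_buckets), 3))  # 0.75
```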

Calculate the percentage of HTTP errors (status code >= 500):

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

Creating Effective Dashboards

Basic Dashboard Creation

To create a new dashboard:

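Click "Create" > "Dashboard" in the Grafana sidebar (in newer versions: Dashboards > New > New dashboard)
Click "Add new panel" (labeled "Add visualization" in newer versions)
Select "Prometheus" as the data source
Enter your PromQL query in the query editor
Choose a visualization (Time series, Gauge, Table, etc.) and configure its options
Click "Apply" to add the panel, then save the dashboard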
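The same steps can be scripted. The Python sketch below builds a request body for Grafana's dashboard API (POST /api/dashboards/db) with one time series panel per query, laid out two per row on Grafana's 24-column grid. It is a minimal model under assumptions: the panels omit a datasource field on the expectation that Grafana falls back to the default data source, and only a few panel properties are set. Exporting an existing dashboard's JSON from your Grafana version is a good way to check the exact fields.

```python
import json


def dashboard_payload(title: str, queries: dict) -> dict:
    """Build a body for Grafana's POST /api/dashboards/db endpoint.

    `queries` maps panel titles to PromQL expressions. Each panel is a
    time series panel with one query (refId "A"), placed two per row.
    """
    panels = []
    for i, (panel_title, expr) in enumerate(queries.items()):
        panels.append({
            "id": i + 1,
            "type": "timeseries",
            "title": panel_title,
            "gridPos": {"x": (i % 2) * 12, "y": (i // 2) * 8, "w": 12, "h": 8},
            "targets": [{"refId": "A", "expr": expr}],
        })
    # id None asks Grafana to create a new dashboard rather than update one.
    return {"dashboard": {"id": None, "title": title, "panels": panels}, "overwrite": False}


payload = dashboard_payload("Web application", {
    "Request Rate": "sum(rate(http_requests_total[5m])) by (route)",
    "Error Rate": 'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100',
    "Latency (p95)": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))",
})
print(json.dumps(payload, indent=2))
```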

Visualization Types for Prometheus Data

Here are recommended visualization types for different metrics:

Time Series: Ideal for trends over time (CPU usage, request rates)
Gauge: Perfect for single-value metrics (current memory usage percentage)
Stat: Good for current values with thresholds (error rates, health statuses)
Bar Gauge: Useful for comparing multiple instances (disk usage across servers)
Heatmap: Excellent for distribution metrics (request duration histogram)

Template Variables

Template variables make your dashboards dynamic and reusable. Here's how to create them:

Go to Dashboard settings > Variables > Add variable
Set Name: instance
Type: Query
Data source: Prometheus
Query: label_values(up, instance)
Enable "Multi-value" and "Include All option"

Now you can use this in your queries:

http_requests_total{instance=~"$instance"}

This allows users to select different instances from a dropdown in your dashboard.
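When several instances are selected, Grafana substitutes $instance with a regular expression alternation, which is why the query uses =~ rather than =. The Python sketch below approximates that substitution; re.escape stands in for Grafana's own escaping rules, so treat the exact output as illustrative. The backslashes are doubled because PromQL string literals treat a backslash as an escape character.

```python
import re
from typing import List


def multi_value_regex(values: List[str]) -> str:
    """Approximate how a multi-value variable is rendered inside =~"...".

    Values are regex-escaped and joined with "|". Backslashes are then
    doubled so they survive PromQL string-literal unescaping. Grafana's
    own escaping may differ in detail.
    """
    escaped = [re.escape(v) for v in values]
    regex = escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"
    return regex.replace("\\", "\\\\")


selected = ["10.0.0.1:9100", "10.0.0.2:9100"]
query = f'http_requests_total{{instance=~"{multi_value_regex(selected)}"}}'
print(query)
```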

Real-World Examples

Monitoring a Web Application

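Let's build a dashboard for monitoring a web application, with one panel per query below. The CPU and memory queries assume node_exporter metrics; the request, error, and latency queries assume the application exports an http_requests_total counter and an http_request_duration_seconds histogram: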

CPU Usage:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[1m])) * 100)

Memory Usage (bytes in use):

node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes
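The arithmetic behind these two panels is simple once the units are clear. rate(node_cpu_seconds_total{mode="idle"}[1m]) gives idle seconds per second for each CPU core, a fraction between 0 and 1, so averaging across cores and subtracting from 100 yields the busy percentage. The memory query returns bytes in use; dividing by the total gives a percentage. The Python sketch below reproduces both calculations on made-up inputs.

```python
from statistics import mean
from typing import List


def cpu_busy_percent(idle_rates: List[float]) -> float:
    """100 - avg(idle rate) * 100, as in the CPU Usage query.

    Each idle rate is idle seconds per second for one CPU core, a fraction
    between 0 and 1, so their average is the idle share of the machine.
    """
    return 100 - mean(idle_rates) * 100


def memory_used_bytes(total: float, available: float) -> float:
    """MemTotal - MemAvailable, as in the Memory Usage query."""
    return total - available


def memory_used_percent(total: float, available: float) -> float:
    """The same difference expressed as a share of total memory."""
    return memory_used_bytes(total, available) / total * 100


print(round(cpu_busy_percent([0.8, 0.6, 0.9, 0.5]), 2))  # 30.0
print(memory_used_percent(16e9, 4e9))  # 75.0
```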

Request Rate:

sum(rate(http_requests_total[5m])) by (route)

Error Rate:

sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

Latency (95th percentile):

histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))

Infrastructure Monitoring

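For Kubernetes infrastructure monitoring, you might create these panels. They assume Prometheus scrapes the kubelet's cAdvisor metrics and kube-state-metrics; grouping by node also assumes your scrape configuration attaches a node label to the cAdvisor series.

Container CPU Usage per Node (in CPU cores):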

sum(rate(container_cpu_usage_seconds_total{container!="POD",container!=""}[5m])) by (node)

Pod Memory Usage by Namespace:

sum(container_memory_usage_bytes{container!="POD",container!=""}) by (namespace)

Available Replicas per Deployment:

kube_deployment_status_replicas_available

Best Practices

Effective Query Patterns

Use rate() for counters: Counters only ever increase (until they reset), so wrap them with rate() or increase() before graphing them.

rate(http_requests_total[5m])  # Good
http_requests_total  # Bad - raw counter

Be specific with labels: Target exactly what you need.

http_requests_total{status="500"}  # Good
http_requests_total  # May be too broad

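Be careful with expensive operations: Queries that match many series (broad selectors, regex matchers over high-cardinality labels) or use long range windows are the costly ones. Wrapping such a query in topk() or bottomk() limits what is returned, but Prometheus still evaluates every input series.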

Alerting Based on Prometheus Metrics

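Grafana allows you to set up alerts based on your Prometheus metrics. With legacy dashboard alerting (the default before Grafana 9):

Edit a panel and navigate to the "Alert" tab
Define a condition based on your query
Set evaluation intervals and notification channels

In Grafana 9 and later, unified alerting is the default: you create the rule under Alerting > Alert rules and send notifications through contact points instead.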

Example alert rule for high error rate:

Condition: WHEN last() OF query(A, 5m, now) IS ABOVE 5, where query A is the error-rate percentage query
Evaluation: Every 1m, For 5m
This fires once the error rate has stayed above 5% at every evaluation for 5 minutes
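The For 5m setting is what keeps a single spike from triggering the alert: the rule first becomes pending, and only fires if the condition stays true at every evaluation for the full duration. The Python sketch below is a simplified model of that behavior with one evaluation per minute. It ignores details such as no-data and error handling and is not Grafana's implementation.

```python
from typing import List, Optional


def first_firing_minute(values: List[float], threshold: float, for_minutes: int) -> Optional[int]:
    """Return the evaluation minute at which the alert would start firing.

    The alert goes pending on the first evaluation above the threshold and
    fires once it has stayed above for `for_minutes`; any evaluation at or
    below the threshold resets it. Returns None if it never fires.
    """
    pending_since = None
    for minute, value in enumerate(values):
        if value > threshold:
            if pending_since is None:
                pending_since = minute
            if minute - pending_since >= for_minutes:
                return minute
        else:
            pending_since = None
    return None


# Error-rate percentages, one evaluation per minute (hypothetical data).
spiky = [1, 7, 8, 2, 9, 9, 9, 9, 9, 9]
print(first_firing_minute(spiky, threshold=5, for_minutes=5))  # 9
```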

Dashboard Organization

Group related panels together
Use row collapsing for logical sections
Include documentation panels with markdown
Use consistent naming conventions
Set appropriate refresh intervals (not too frequent)

Troubleshooting Common Issues

Query Returns No Data

Possible causes and solutions:

Metric doesn't exist: Verify in Prometheus UI directly
Label mismatch: Check exact label names and values
Time range issue: Adjust the time range in Grafana

High Cardinality Problems

High cardinality (too many unique time series) can cause performance issues:

Avoid using high cardinality labels in grouping
Use topk() or bottomk() to limit the number of series returned to a panel (this reduces rendering load, not the cost of evaluating the query)
Consider using recording rules in Prometheus for complex queries
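To see why grouping labels matter, note that sum(...) by (labels) returns one series per distinct combination of the grouping labels' values. The Python sketch below counts those combinations for a hypothetical metric that carries a user_id label; the label names and counts are invented for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def series_after_grouping(series: List[Dict[str, str]], by: Iterable[str]) -> int:
    """Count the output series of `sum(...) by (<labels>)`.

    Each distinct combination of the grouping labels' values becomes one
    output series, so a high-cardinality label multiplies the result.
    """
    keys = tuple(by)
    groups: Set[Tuple[str, ...]] = {tuple(s.get(k, "") for k in keys) for s in series}
    return len(groups)


# Hypothetical input: 3 routes x 4 status codes x 1000 user IDs = 12,000 series.
series = [
    {"route": r, "status": s, "user_id": str(u)}
    for r in ("/", "/login", "/api")
    for s in ("200", "404", "500", "503")
    for u in range(1000)
]
print(series_after_grouping(series, ["route"]))  # 3
print(series_after_grouping(series, ["route", "status"]))  # 12
print(series_after_grouping(series, ["user_id"]))  # 1000
```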

Connectivity Issues

If Grafana can't connect to Prometheus:

Verify the Prometheus URL is correct
Check network connectivity between Grafana and Prometheus
Ensure any required authentication is configured
Review Grafana server logs for specific errors
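For the first two checks, a quick probe from the Grafana host helps separate network problems from data source misconfiguration. Prometheus serves a /-/healthy endpoint that returns HTTP 200 when the server is up. The Python sketch below calls it using only the standard library. If your Prometheus sits behind a proxy that requires authentication, this unauthenticated request may fail even though Prometheus itself is healthy.

```python
import urllib.request


def prometheus_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/-/healthy answers with HTTP 200.

    Run it from the Grafana host so it exercises the same network path
    the data source uses. Connection, DNS, and HTTP errors (URLError and
    HTTPError are subclasses of OSError) and malformed URLs return False.
    """
    url = base_url.rstrip("/") + "/-/healthy"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError):
        return False
```

For example, prometheus_healthy("http://your-prometheus-server:9090") returns True only if that URL is reachable from the machine where you run it.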

Summary

Prometheus is a powerful data source for Grafana that enables comprehensive monitoring of your applications and infrastructure. By understanding PromQL and leveraging Grafana's visualization capabilities, you can create insightful dashboards that help maintain the health and performance of your systems.

Key takeaways:

Prometheus excels at storing and querying time-series metrics
PromQL provides a flexible way to manipulate and analyze your data
Grafana offers diverse visualization options tailored to different metric types
Template variables create dynamic, reusable dashboards
Following best practices ensures efficient and effective monitoring

Additional Resources

Exercises

Basic Setup: Install Prometheus and Grafana locally, then add Prometheus as a data source.
Query Practice: Create queries to monitor:
- System load average over time
- Memory usage percentage
- Disk space utilization
Dashboard Challenge: Create a dashboard with at least 5 panels showing different aspects of system performance.
Advanced PromQL: Write a query that shows the ratio of errors to total requests for each service in your application.
Alert Creation: Create an alert that triggers when any instance's CPU usage exceeds 80% for more than 5 minutes.
