GCE Service Discovery
Introduction
When monitoring applications deployed on Google Cloud Platform (GCP), manually maintaining a list of targets to scrape can become unwieldy as your infrastructure scales. Prometheus offers a powerful solution with its Google Compute Engine (GCE) service discovery mechanism, which automatically discovers and monitors your GCE instances.
This service discovery integration allows Prometheus to:
- Automatically discover GCE instances in your projects
- Apply filtering based on metadata and other attributes
- Dynamically update monitoring targets as instances are created or terminated
- Extract labels from GCE metadata for better organization of metrics
In this guide, we'll explore how to configure and use GCE service discovery with Prometheus, providing practical examples to help you integrate this capability into your monitoring infrastructure.
Prerequisites
Before setting up GCE service discovery, you'll need:
- A Google Cloud Platform (GCP) account with access to Google Compute Engine
- Prometheus installed (either on GCE or with network access to your GCE instances)
- IAM permissions that allow Prometheus to read from the Compute Engine API
- Basic understanding of Prometheus configuration
How GCE Service Discovery Works
Prometheus uses the Compute Engine API to discover instances in your GCE projects. The discovery process involves:
- Prometheus making API calls to Google Cloud
- Retrieving metadata about running instances
- Creating target endpoints based on discovered instances
- Applying relabeling rules to organize and filter targets
- Scraping metrics from the discovered endpoints
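The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Prometheus's implementation: the `build_target` and `discover` helpers and the hand-written `api_response` records are hypothetical, and real discovery attaches many more labels. It assumes, as Prometheus does by default, that the target address is the instance's private IP plus the configured port.

```python
from typing import Dict, List


def build_target(instance: Dict[str, str], project: str, port: int) -> Dict[str, str]:
    """Build the label set for one discovered instance (simplified)."""
    return {
        # By default the private IP and the configured port form the address.
        "__address__": f"{instance['private_ip']}:{port}",
        "__meta_gce_project": project,
        "__meta_gce_instance_name": instance["name"],
        "__meta_gce_private_ip": instance["private_ip"],
    }


def discover(instances: List[Dict[str, str]], project: str, port: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for one discovery refresh over an API response."""
    return [build_target(inst, project, port) for inst in instances]


# Hypothetical records standing in for a Compute Engine API response.
api_response = [
    {"name": "web-1", "private_ip": "10.128.0.2"},
    {"name": "web-2", "private_ip": "10.128.0.3"},
]
targets = discover(api_response, project="your-gcp-project-id", port=9100)
for target in targets:
    print(target["__address__"], target["__meta_gce_instance_name"])
```

Relabeling, covered later in this guide, then operates on label sets shaped like the ones this sketch produces.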
Configuring GCE Service Discovery
Let's look at how to configure Prometheus to discover and scrape GCE instances:
Authentication Setup
First, Prometheus needs to authenticate with Google Cloud. There are several methods:
- Using Google Application Default Credentials:
  - When Prometheus runs on GCE, it can use the instance's service account
  - Grant the service account an appropriate IAM role (roles/compute.viewer at minimum)
- Using a Service Account Key File:
  - Create a service account with the necessary permissions
  - Download the key file and point the GOOGLE_APPLICATION_CREDENTIALS environment variable at it before starting Prometheus
Basic Configuration
Add the following to your prometheus.yml file to enable GCE service discovery:
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: 'your-gcp-project-id'
        zone: 'us-central1-a'
        port: 9100 # Default port for node_exporter
This configuration will:
- Discover all GCE instances in the specified project and zone
- Assume each instance has a metrics endpoint on port 9100 (typical for node_exporter)
Multi-Zone and Multi-Project Configuration
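To monitor instances across multiple zones or projects, add one gce_sd_configs entry per project and zone. Each entry accepts a single zone, so there is no zones list:
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: 'project-1'
        zone: 'us-central1-a'
        port: 9100
      - project: 'project-1'
        zone: 'us-central1-b'
        port: 9100
      - project: 'project-2'
        zone: 'us-west1-a'
        port: 9100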
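Because each gce_sd_configs entry covers one project and one zone, the list grows quickly. If you generate your configuration, a small helper can expand a project-to-zones mapping into individual entries. The sketch below is one way to do it; the helper names `gce_sd_entries` and `render_yaml` and the sample mapping are hypothetical, and the YAML is rendered by hand to avoid a third-party dependency.

```python
from typing import Dict, List


def gce_sd_entries(zones_by_project: Dict[str, List[str]], port: int = 9100) -> List[dict]:
    """Expand {project: [zones]} into one entry per (project, zone) pair."""
    entries = []
    for project, zones in zones_by_project.items():
        for zone in zones:
            entries.append({"project": project, "zone": zone, "port": port})
    return entries


def render_yaml(entries: List[dict], indent: int = 6) -> str:
    """Render the entries as the list that goes under gce_sd_configs."""
    pad = " " * indent
    lines = []
    for entry in entries:
        lines.append(f"{pad}- project: '{entry['project']}'")
        lines.append(f"{pad}  zone: '{entry['zone']}'")
        lines.append(f"{pad}  port: {entry['port']}")
    return "\n".join(lines)


# Hypothetical project and zone names.
entries = gce_sd_entries({
    "project-1": ["us-central1-a", "us-central1-b"],
    "project-2": ["us-west1-a"],
})
print(render_yaml(entries))
```

Paste the printed lines under gce_sd_configs in the relevant scrape job.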
Authentication with a Key File
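If you're running Prometheus outside of GCP or need to use a specific service account, note that gce_sd_configs has no option for a key file. Credentials are resolved through Google Application Default Credentials, so set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the key file path for the Prometheus process and leave the scrape config as it is:
# Start Prometheus with GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: 'your-gcp-project-id'
        zone: 'us-central1-a'
        port: 9100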
Available Labels and Relabeling
GCE service discovery automatically attaches several metadata labels to discovered targets. These include:
- __meta_gce_instance_id: The numeric instance ID
- __meta_gce_instance_name: The user-defined instance name
- __meta_gce_machine_type: The machine type of the instance (reported as a full or partial URL)
- __meta_gce_metadata_<name>: Each metadata item becomes a label
- __meta_gce_network: The network of the instance (reported as a URL)
- __meta_gce_private_ip: The private IP address
- __meta_gce_project: The GCP project
- __meta_gce_public_ip: The public IP address (if available)
- __meta_gce_tags: Instance tags (comma-separated, with a leading and trailing comma)
- __meta_gce_zone: The zone of the instance (reported as a URL)
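Two details of these labels are easy to miss: metadata keys are sanitized so they form valid label names, and the tags value is wrapped in separators. The sketch below mimics both behaviors as we understand them, so verify against your Prometheus version. The function names are hypothetical, and `tags_label_value` assumes the instance has at least one tag.

```python
import re
from typing import List


def sanitize_label_name(name: str) -> str:
    """Replace every character that is invalid in a label name with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", name)


def metadata_label(key: str) -> str:
    """Label name produced for a GCE metadata key."""
    return "__meta_gce_metadata_" + sanitize_label_name(key)


def tags_label_value(tags: List[str], separator: str = ",") -> str:
    """Join tags and wrap the result in separators (assumes tags is non-empty)."""
    return separator + separator.join(tags) + separator


print(metadata_label("prometheus-monitored"))
print(tags_label_value(["web-server", "prod"]))
```

A metadata key such as prometheus-monitored is therefore matched in relabeling rules with an underscore in place of the hyphen.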
Using Relabeling to Filter Instances
You can use relabeling to filter which instances Prometheus monitors:
scrape_configs:
  - job_name: 'gce-web-servers'
    gce_sd_configs:
      - project: 'your-gcp-project-id'
        zone: 'us-central1-a'
        port: 9100
    relabel_configs:
      - source_labels: [__meta_gce_tags]
        regex: '.*,web-server,.*'
        action: keep
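This configuration keeps only targets that have the web-server tag. The commas in the regex match the separators that surround every tag in __meta_gce_tags, so a tag such as web-server-2 does not match.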
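To see why the surrounding commas matter, here is a minimal Python sketch of the keep action. It assumes that relabeling regexes are fully anchored, which `re.fullmatch` approximates. The `keep` helper and the sample targets are hypothetical, and Python's regex dialect differs from Prometheus's RE2 in some corner cases.

```python
import re
from typing import Dict, List


def keep(targets: List[Dict[str, str]], source_label: str, regex: str) -> List[Dict[str, str]]:
    """Keep only targets whose source label fully matches the regex."""
    pattern = re.compile(regex)
    return [t for t in targets if pattern.fullmatch(t.get(source_label, ""))]


# Hypothetical discovered targets.
targets = [
    {"__meta_gce_instance_name": "web-1", "__meta_gce_tags": ",web-server,prod,"},
    {"__meta_gce_instance_name": "web-2", "__meta_gce_tags": ",web-server-2,"},
    {"__meta_gce_instance_name": "db-1"},  # no tags label at all
]
kept = keep(targets, "__meta_gce_tags", r".*,web-server,.*")
print([t["__meta_gce_instance_name"] for t in kept])
```

Only web-1 survives: web-2 carries a different tag that merely starts with the same text, and db-1 has no tags label, which is treated as an empty string.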
Creating Meaningful Labels from Metadata
You can transform GCE metadata into Prometheus labels:
scrape_configs:
  - job_name: 'gce-instances'
    gce_sd_configs:
      - project: 'your-gcp-project-id'
        zone: 'us-central1-a'
        port: 9100
    relabel_configs:
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
      - source_labels: [__meta_gce_metadata_environment]
        target_label: environment
      - source_labels: [__meta_gce_zone]
        target_label: zone
This configuration extracts the instance name, a custom metadata field called "environment", and the zone into Prometheus labels.
Practical Example: Complete Configuration
Here's a more comprehensive example that:
- Discovers instances across multiple zones
- Filters based on an "app" metadata property
- Applies custom labeling
- Adjusts the scrape interval for these targets
scrape_configs:
  - job_name: 'gce-app-monitoring'
    scrape_interval: 30s
    gce_sd_configs:
      # One entry per zone; each entry accepts a single zone
      - project: 'production-apps'
        zone: 'us-central1-a'
        port: 9100
      - project: 'production-apps'
        zone: 'us-central1-b'
        port: 9100
      - project: 'production-apps'
        zone: 'us-central1-c'
        port: 9100
    relabel_configs:
      # Keep only instances with app=prometheus-monitored metadata
      - source_labels: [__meta_gce_metadata_app]
        regex: 'prometheus-monitored'
        action: keep
      # Create an app_component label from a metadata field
      - source_labels: [__meta_gce_metadata_component]
        target_label: app_component
      # Create an instance label from instance name
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
      # Create a zone label
      - source_labels: [__meta_gce_zone]
        target_label: zone
      # If the instance has a custom metrics port defined in metadata, use it.
      # Source label values are joined with ';' before the regex is applied.
      - source_labels: [__meta_gce_private_ip, __meta_gce_metadata_metrics_port]
        regex: '(.+);(.+)'
        target_label: __address__
        replacement: '${1}:${2}'
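A rule that rewrites __address__ from a metadata port needs both the IP and the port as source labels, because a replacement can only reference regex capture groups, not other labels by name. The sketch below walks through that mechanism under our reading of the replace action: source values are joined with ";", the anchored regex is applied, and `${n}` references are substituted. The `relabel_replace` helper is hypothetical and handles numbered groups only.

```python
import re
from typing import Dict, List


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    target_label: str, replacement: str, separator: str = ";") -> Dict[str, str]:
    """Simplified 'replace' action: returns a new label set."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    result = dict(labels)
    if match is None:
        return result  # no match: the target label is left untouched
    # Substitute ${1}, ${2}, ... with the corresponding capture groups.
    new_value = re.sub(r"\$\{(\d+)\}", lambda m: match.group(int(m.group(1))), replacement)
    if new_value:
        result[target_label] = new_value
    else:
        result.pop(target_label, None)  # an empty result removes the label
    return result


# Hypothetical target with a custom port stored in instance metadata.
target = {
    "__address__": "10.128.0.2:9100",
    "__meta_gce_private_ip": "10.128.0.2",
    "__meta_gce_metadata_metrics_port": "8080",
}
rewritten = relabel_replace(
    target,
    ["__meta_gce_private_ip", "__meta_gce_metadata_metrics_port"],
    r"(.+);(.+)", "__address__", "${1}:${2}",
)
print(rewritten["__address__"])
```

An instance without the metrics_port metadata produces a joined value ending in ";", which the regex does not match, so its default address is kept.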
Best Practices
When using GCE service discovery with Prometheus, consider the following best practices:
- Use Metadata for Configuration:
  - Add metadata to your instances to control monitoring behavior
  - Example: Add prometheus_port: 9100 as metadata to specify custom ports
- Create Instance Groups by Purpose:
  - Group similar instances with common tags or metadata
  - Makes filtering and relabeling more consistent
- Consider API Quotas:
  - GCP has API request quotas
  - Discovery calls the API once per refresh_interval (60s by default) for each gce_sd_configs entry, independent of the scrape interval, so raise refresh_interval if you need fewer API calls
- Use GCE Private IPs When Possible:
  - More secure and avoids public internet traffic
  - Use VPC peering or similar if Prometheus is in a different network
- Include Zones in Labels:
  - Allows you to analyze metrics by zone
  - Helpful for identifying zone-specific issues
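For the quota point, a rough estimate of discovery traffic helps. The arithmetic below assumes one list request per gce_sd_configs entry per refresh and one request per page of results. Both are simplifying assumptions, and the function name is hypothetical.

```python
def estimated_api_calls_per_day(sd_entries: int, refresh_interval_s: int = 60,
                                pages_per_refresh: int = 1) -> int:
    """Rough daily count of discovery API requests for one Prometheus server."""
    refreshes_per_day = 86400 // refresh_interval_s
    return sd_entries * refreshes_per_day * pages_per_refresh


# Six entries (for example, two projects with three zones each) at a 60s refresh.
print(estimated_api_calls_per_day(6))
# The same setup with a five-minute refresh interval.
print(estimated_api_calls_per_day(6, refresh_interval_s=300))
```

Multiply the result by the number of Prometheus servers that run the same discovery configuration.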
Troubleshooting GCE Service Discovery
If you're experiencing issues with GCE service discovery, check the following:
1. Authentication Problems
If Prometheus can't discover GCE instances, check:
- IAM permissions on the service account
- Validity of the credentials file
- API access to compute.googleapis.com
2. No Targets Found
If no targets are being discovered:
- Verify the project ID and zone are correct
- Check if your relabeling rules might be filtering out all targets
- Examine Prometheus logs for API errors
3. Service Discovery Works But No Metrics
If instances are discovered but no metrics are collected:
- Verify the port configuration
- Check if firewalls allow traffic to the metrics port
- Ensure the monitoring agent is running on the instances
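The three cases above can usually be told apart from the response of Prometheus's /api/v1/targets endpoint. The sketch below classifies an already parsed response for one job. The `summarize_targets` helper and the sample payload are hypothetical, and it assumes that active targets carry a job label and that dropped targets carry job in their discoveredLabels.

```python
from typing import Any, Dict


def summarize_targets(payload: Dict[str, Any], job: str) -> Dict[str, Any]:
    """Classify the state of one scrape job from a /api/v1/targets response."""
    data = payload.get("data", {})
    active = [t for t in data.get("activeTargets", [])
              if t.get("labels", {}).get("job") == job]
    dropped = [t for t in data.get("droppedTargets", [])
               if t.get("discoveredLabels", {}).get("job") == job]
    down = [(t.get("scrapeUrl", ""), t.get("lastError", ""))
            for t in active if t.get("health") != "up"]
    if not active and not dropped:
        hint = "nothing discovered: check credentials, project ID and zone"
    elif not active:
        hint = "all targets dropped: check relabel_configs"
    elif down:
        hint = "targets failing: check port, firewall rules and the exporter"
    else:
        hint = "all discovered targets are healthy"
    return {"active": len(active), "dropped": len(dropped), "down": down, "hint": hint}


# Hypothetical, trimmed-down response for illustration.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"labels": {"job": "gce-instances", "instance": "web-1"},
             "scrapeUrl": "http://10.128.0.2:9100/metrics",
             "health": "down", "lastError": "context deadline exceeded"},
        ],
        "droppedTargets": [
            {"discoveredLabels": {"job": "gce-instances", "__address__": "10.128.0.9:9100"}},
        ],
    },
}
summary = summarize_targets(sample, "gce-instances")
print(summary["hint"])
```

To run it against a live server, load the JSON from http://localhost:9090/api/v1/targets (adjust the host and port to your setup) and pass the parsed object to `summarize_targets`.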
Example: Setting Up a Complete Monitoring Stack
Let's look at a complete example of setting up a monitoring stack on GCE with auto-discovery:
- Create a VM Template with node_exporter:
# Install node_exporter on your instances
wget https://github.com/prometheus/node_exporter/releases/download/v1.3.1/node_exporter-1.3.1.linux-amd64.tar.gz
tar xvfz node_exporter-1.3.1.linux-amd64.tar.gz
sudo cp node_exporter-1.3.1.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter
# Create a systemd service
sudo tee /etc/systemd/system/node_exporter.service > /dev/null <<EOF
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
EOF
# Start and enable the service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
- Add Custom Metadata to Instances:
Add these metadata items to your instances:
- prometheus-monitored: true
- app: web-server (or whatever describes your application)
- environment: production (or staging, development, etc.)
- Configure Prometheus:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'gce-node-exporters'
    gce_sd_configs:
      # One entry per zone; each entry accepts a single zone
      - project: 'your-gcp-project-id'
        zone: 'us-central1-a'
        port: 9100
      - project: 'your-gcp-project-id'
        zone: 'us-central1-b'
        port: 9100
    relabel_configs:
      # The prometheus-monitored metadata key is exposed with underscores
      - source_labels: [__meta_gce_metadata_prometheus_monitored]
        regex: 'true'
        action: keep
      - source_labels: [__meta_gce_metadata_app]
        target_label: app
      - source_labels: [__meta_gce_metadata_environment]
        target_label: environment
      - source_labels: [__meta_gce_instance_name]
        target_label: instance
- Create Prometheus Alerts Based on Labels:
groups:
  - name: GCE Instance Alerts
    rules:
      - alert: HighCPUUsage
        expr: (1 - avg by(instance, app, environment) (irate(node_cpu_seconds_total{mode="idle"}[5m]))) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes on {{ $labels.instance }} (app: {{ $labels.app }}, environment: {{ $labels.environment }})"
Summary
GCE service discovery is a powerful feature of Prometheus that enables automatic discovery and monitoring of your Google Cloud infrastructure. By leveraging this capability, you can:
- Eliminate manual configuration of target endpoints
- Automatically adapt your monitoring as instances are created or terminated
- Apply sophisticated filtering based on metadata, tags, and other attributes
- Extract meaningful labels from GCE metadata
- Create a more maintainable and scalable monitoring system
As your Google Cloud infrastructure grows, GCE service discovery becomes increasingly valuable, allowing your monitoring to scale automatically with your environment.
Additional Resources
To deepen your understanding of GCE service discovery with Prometheus, consider exploring these resources:
- Official Prometheus Documentation on GCE SD
- Google Cloud Monitoring Best Practices
- Setting up IAM for Prometheus
Exercises
To reinforce your learning, try these practical exercises:
- Set up Prometheus with GCE service discovery to monitor your own GCE instances
- Create a relabeling configuration that categorizes instances by both zone and machine type
- Develop a Grafana dashboard that visualizes metrics grouped by the discovered GCE metadata labels
- Configure different scrape intervals for different types of instances based on metadata
- Implement a multi-project discovery setup and compare resource usage across projects