Custom Service Discovery in Prometheus
Introduction
Service discovery is one of Prometheus' most powerful features, allowing it to automatically find and monitor targets in dynamic environments. While Prometheus ships with many built-in service discovery mechanisms for popular platforms like Kubernetes, AWS, and Docker, there are scenarios where you might need to integrate with custom infrastructure or implement specialized discovery logic.
Custom service discovery enables you to extend Prometheus' capabilities by creating your own mechanisms to discover and monitor targets. This is particularly useful when:
- Working with proprietary or custom infrastructure not supported by built-in mechanisms
- Implementing unique discovery logic specific to your organization
- Integrating with legacy systems or custom service registries
- Creating specialized monitoring workflows for specific applications
In this guide, we'll explore how to implement custom service discovery in Prometheus, understand the available methods, and walk through practical examples.
Methods for Custom Service Discovery
Prometheus offers several approaches for implementing custom service discovery:
- File-based Service Discovery: The simplest approach, where target information is provided in files that Prometheus reads
- Custom HTTP Service Discovery: An HTTP endpoint that Prometheus queries for target information
- Custom Service Discovery Programs: External programs that generate target files for Prometheus
Let's explore each method in detail.
File-based Service Discovery
The simplest way to implement custom service discovery is through file-based SD. In this approach, you write a program that generates files containing target information in a format Prometheus can understand.
How it Works
- You create a program that discovers your services
- The program writes target information to files in JSON or YAML format
- Prometheus watches these files for changes and also re-reads them every refresh_interval (5 minutes by default) to update its targets
Configuration
To configure file-based service discovery in Prometheus, use the file_sd_configs section in your prometheus.yml:
scrape_configs:
  - job_name: 'custom-file-sd'
    file_sd_configs:
      - files:
          - '/path/to/targets/*.json'
        refresh_interval: 30s
File Format
The files should contain an array of target groups, each with its own set of labels:
[
  {
    "targets": ["host1:9100", "host2:9100"],
    "labels": {
      "env": "production",
      "service": "api"
    }
  },
  {
    "targets": ["host3:9100"],
    "labels": {
      "env": "staging",
      "service": "backend"
    }
  }
]
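Since a malformed file will not give Prometheus usable targets, it can help to check the structure before writing it. The following is a minimal sketch: the function name validate_target_groups is hypothetical, and the checks cover only the shape described above (a list of groups, string targets, string-to-string labels), not every rule Prometheus enforces.

```python
import json


def validate_target_groups(groups):
    """Return a list of problems found in a file-SD style structure."""
    if not isinstance(groups, list):
        return ["top-level value must be a list of target groups"]
    problems = []
    for i, group in enumerate(groups):
        if not isinstance(group, dict):
            problems.append(f"group {i}: must be an object")
            continue
        # "targets" must be a list of "host:port" strings
        targets = group.get("targets")
        if not isinstance(targets, list) or not all(isinstance(t, str) for t in targets):
            problems.append(f"group {i}: 'targets' must be a list of strings")
        # "labels" is optional, but both keys and values must be strings
        labels = group.get("labels", {})
        if not isinstance(labels, dict) or not all(
            isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
        ):
            problems.append(f"group {i}: 'labels' must map strings to strings")
    return problems


sample = json.loads('[{"targets": ["host1:9100"], "labels": {"env": "production"}}]')
print(validate_target_groups(sample))  # an empty list means no problems were found
```

A discovery script could call this just before writing and keep the previous file if any problems are reported.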
Example Implementation
Let's create a simple Python script that discovers services in a custom environment and writes them to a file for Prometheus:
#!/usr/bin/env python3
import json
import os
import time

import requests

OUTPUT_FILE = '/path/to/targets/custom_targets.json'
DISCOVERY_ENDPOINT = 'https://my-service-registry/api/services'
REFRESH_INTERVAL = 60  # seconds


def discover_services():
    # In a real implementation, this would query your service registry
    try:
        # Use a timeout so a hung registry cannot block the loop forever
        response = requests.get(DISCOVERY_ENDPOINT, timeout=10)
        response.raise_for_status()
        services = response.json()

        # Transform the services into Prometheus target format
        target_groups = []
        for service in services:
            target_groups.append({
                "targets": [f"{service['host']}:{service['port']}"],
                "labels": {
                    "service": service['name'],
                    "env": service['environment'],
                    "team": service['team']
                }
            })

        # Ensure directory exists
        os.makedirs(os.path.dirname(OUTPUT_FILE), exist_ok=True)

        # Write to a temporary file, then rename it over the target so
        # Prometheus never reads a partially written file
        temp_file = OUTPUT_FILE + '.temp'
        with open(temp_file, 'w') as f:
            json.dump(target_groups, f)
        os.replace(temp_file, OUTPUT_FILE)

        print(f"Updated {OUTPUT_FILE} with {len(target_groups)} target groups")
    except Exception as e:
        # On failure the previous targets file stays in place
        print(f"Error discovering services: {e}")


def main():
    while True:
        discover_services()
        time.sleep(REFRESH_INTERVAL)


if __name__ == "__main__":
    main()
Run this script as a service to continuously update the targets file.
HTTP Service Discovery
Prometheus can also query an HTTP endpoint to discover targets. This approach requires you to implement an HTTP server that returns target information.
How it Works
- You implement an HTTP server with an endpoint that returns target information
- Prometheus periodically calls this endpoint to refresh its targets
- The endpoint returns target information in JSON format
Configuration
Configure HTTP service discovery using the http_sd_configs section:
scrape_configs:
  - job_name: 'custom-http-sd'
    http_sd_configs:
      - url: 'http://service-discovery:8080/targets'
        refresh_interval: 30s
Response Format
The HTTP endpoint must reply with HTTP 200, a Content-Type of application/json, and a JSON body in the same format as file-based SD:
[
  {
    "targets": ["app1.example.com:9090", "app2.example.com:9090"],
    "labels": {
      "env": "production",
      "job": "api-service"
    }
  }
]
Example Implementation
Here's a simple Flask application that serves as a custom HTTP service discovery endpoint:
from flask import Flask, jsonify
import requests

app = Flask(__name__)


# In a real implementation, this would query your service registry
def get_services():
    service_registry_url = "https://my-service-registry/api/services"
    response = requests.get(service_registry_url, timeout=10)
    response.raise_for_status()
    return response.json()


@app.route('/targets')
def targets():
    try:
        services = get_services()
    except Exception as e:
        # Report the failure instead of returning an empty list: Prometheus
        # keeps its current targets when a refresh fails, whereas an empty
        # 200 response would drop every target.
        app.logger.error(f"Error fetching services: {e}")
        return jsonify({"error": "service registry unavailable"}), 503

    # Transform to Prometheus target format
    target_groups = []
    for service in services:
        target_groups.append({
            "targets": [f"{service['host']}:{service['port']}"],
            "labels": {
                "service": service['name'],
                "env": service['environment']
            }
        })
    return jsonify(target_groups)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
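For environments where adding Flask is not an option, the same kind of endpoint can be sketched with only the Python standard library. This is an illustrative variant, not part of the original example: load_target_groups, TargetsHandler, and make_server are hypothetical names, and the target data is static. The handler sets the JSON Content-Type explicitly and answers 503 when the lookup fails, so that Prometheus sees a failed refresh rather than an empty target list.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_target_groups():
    """Hypothetical stand-in for a registry lookup; returns static data."""
    return [{"targets": ["app1.example.com:9090"], "labels": {"env": "production"}}]


class TargetsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/targets":
            self.send_error(404)
            return
        try:
            body = json.dumps(load_target_groups()).encode("utf-8")
        except Exception:
            # Signal failure instead of sending an empty list
            self.send_error(503)
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests


def make_server(host="127.0.0.1", port=8080):
    """Create the server; call serve_forever() on the result to run it."""
    return HTTPServer((host, port), TargetsHandler)
```

Running make_server(host="0.0.0.0").serve_forever() would expose the endpoint on port 8080, matching the URL in the configuration above.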
Custom Service Discovery Programs
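For more complex scenarios, you can run a dedicated service discovery program alongside Prometheus. Prometheus does not execute external programs itself; the program runs as its own process and hands targets to Prometheus through file-based SD.
How it Works
- The program runs independently of Prometheus, for example as a system service or a sidecar container
- It discovers targets and writes them to a file in the file-based SD format
- Prometheus picks up the file through file_sd_configs and updates its targets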
Configuration
Configure a custom SD program in your prometheus.yml:
scrape_configs:
  - job_name: 'custom-program-sd'
    file_sd_configs:
      - files:
          - custom_sd_targets.json
    relabel_configs:
      - source_labels: [__meta_custom_environment]
        target_label: env
Then set up a separate service that runs your custom SD program and writes to the specified file.
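To make the relabeling step concrete, here is a simplified Python model of what the rule above does to one target's labels. It assumes the default replace action with the default regex, and it is only an illustration, not how Prometheus implements relabeling; apply_relabel is a hypothetical name.

```python
def apply_relabel(labels, source_label, target_label):
    """Copy source_label's value to target_label, then drop '__' labels.

    Models a relabel rule with the default 'replace' action, followed by
    the removal of labels starting with '__' once relabeling is finished.
    """
    result = dict(labels)
    value = result.get(source_label, "")
    if value:
        result[target_label] = value
    else:
        # An empty replacement removes the target label
        result.pop(target_label, None)
    return {k: v for k, v in result.items() if not k.startswith("__")}


discovered = {
    "__meta_custom_environment": "production",
    "__meta_custom_team": "backend",
}
print(apply_relabel(discovered, "__meta_custom_environment", "env"))
```

Note that the team value does not survive in this model: only metadata that a relabel rule copies into a regular label ends up on the scraped series.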
Example Implementation
Here's an example in Go that implements a custom service discovery program:
package main

import (
	"encoding/json"
	"log"
	"os"
	"time"
)

type Target struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels"`
}

func discoverServices() ([]Target, error) {
	// In a real implementation, this would discover your services
	// For this example, we'll return static targets
	targets := []Target{
		{
			Targets: []string{"service1:9090", "service2:9090"},
			Labels: map[string]string{
				"__meta_custom_environment": "production",
				"__meta_custom_team":        "backend",
			},
		},
		{
			Targets: []string{"service3:9090"},
			Labels: map[string]string{
				"__meta_custom_environment": "staging",
				"__meta_custom_team":        "data",
			},
		},
	}
	return targets, nil
}

func main() {
	outputFile := "custom_sd_targets.json"
	for {
		targets, err := discoverServices()
		if err != nil {
			log.Printf("Error discovering services: %v", err)
			time.Sleep(30 * time.Second)
			continue
		}
		data, err := json.MarshalIndent(targets, "", "  ")
		if err != nil {
			log.Printf("Error marshaling targets: %v", err)
			time.Sleep(30 * time.Second)
			continue
		}
		// Write to a temp file first, then rename for an atomic update
		tempFile := outputFile + ".tmp"
		err = os.WriteFile(tempFile, data, 0644)
		if err != nil {
			log.Printf("Error writing to temp file: %v", err)
			time.Sleep(30 * time.Second)
			continue
		}
		err = os.Rename(tempFile, outputFile)
		if err != nil {
			log.Printf("Error renaming temp file: %v", err)
			time.Sleep(30 * time.Second)
			continue
		}
		log.Printf("Updated %s with %d target groups", outputFile, len(targets))
		time.Sleep(30 * time.Second)
	}
}
Creating a Complete Custom Service Discovery Solution
Let's tie everything together with a more comprehensive example that discovers services from a custom API and provides them to Prometheus.
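Architecture
Services register themselves with the service registry. The custom SD adapter polls the registry, converts the results into target groups, and writes them to a file that Prometheus reads through file_sd_configs.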
Implementation
Our solution will include:
- A service registry with a REST API
- A custom SD adapter that queries the registry and formats targets for Prometheus
- Prometheus configuration to read these targets
Service Registry Mock
We'll simulate a service registry API with a simple HTTP server:
from flask import Flask, jsonify, request
import threading
import time

app = Flask(__name__)

# In-memory storage for services
services = []


@app.route('/services', methods=['GET'])
def get_services():
    return jsonify(services)


@app.route('/services', methods=['POST'])
def register_service():
    global services
    service = request.json

    # Simple validation
    if not service or not all(k in service for k in ['name', 'host', 'port']):
        return jsonify({"error": "Missing required fields"}), 400

    # Record when we last heard from this service. Re-registering acts as a
    # heartbeat; without this timestamp the cleanup thread would remove
    # every service on its first pass.
    service['last_seen'] = time.time()

    # Remove existing service with same name if exists
    services = [s for s in services if s['name'] != service['name']]
    services.append(service)
    return jsonify({"status": "registered"}), 201


def cleanup_services():
    """Remove services that haven't sent heartbeats"""
    global services
    while True:
        time.sleep(60)
        current_time = time.time()
        services = [s for s in services if current_time - s.get('last_seen', 0) < 120]


# Start cleanup thread
threading.Thread(target=cleanup_services, daemon=True).start()

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8081)
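A service would announce itself to this mock by POSTing its details to /services. The sketch below shows one way to do that with the standard library; build_registration and register are hypothetical helper names, and the URL assumes the mock is reachable on localhost port 8081.

```python
import json
import urllib.request


def build_registration(registry_url, name, host, port, **extra):
    """Build a POST request that registers one service with the registry."""
    payload = {"name": name, "host": host, "port": port, **extra}
    return urllib.request.Request(
        registry_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def register(registry_url, name, host, port, **extra):
    """Send the registration and return the HTTP status code."""
    request = build_registration(registry_url, name, host, port, **extra)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With the mock running, register("http://localhost:8081/services", "api", "10.0.0.5", 9100, environment="production") should return 201. Calling it on a timer would serve as a simple heartbeat.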
Custom Service Discovery Adapter
This program will query our service registry and format the results for Prometheus:
#!/usr/bin/env python3
import json
import os
import time
import requests
import logging

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger('prometheus-sd-adapter')

# Configuration (each value can be overridden through an environment variable)
OUTPUT_FILE = os.environ.get('OUTPUT_FILE', '/etc/prometheus/sd/custom_targets.json')
OUTPUT_DIR = os.path.dirname(OUTPUT_FILE)
SERVICE_REGISTRY_URL = os.environ.get('SERVICE_REGISTRY_URL', 'http://service-registry:8081/services')
REFRESH_INTERVAL = int(os.environ.get('REFRESH_INTERVAL', '30'))  # seconds


def discover_services():
    """Query the service registry and format services for Prometheus"""
    try:
        response = requests.get(SERVICE_REGISTRY_URL, timeout=10)
        response.raise_for_status()
        services = response.json()
        logger.info(f"Discovered {len(services)} services from registry")

        # Transform to Prometheus target format
        target_groups = []
        for service in services:
            # Skip services that don't have required fields
            if not all(k in service for k in ['name', 'host', 'port']):
                logger.warning(f"Skipping service with missing fields: {service}")
                continue

            # Create target group
            target = {
                "targets": [f"{service['host']}:{service['port']}"],
                "labels": {
                    "service": str(service['name'])
                }
            }

            # Add optional labels if present; label values must be strings
            for label in ['environment', 'team', 'version', 'datacenter']:
                if label in service:
                    target['labels'][label] = str(service[label])
            target_groups.append(target)

        # Ensure output directory exists
        os.makedirs(OUTPUT_DIR, exist_ok=True)

        # Write to temporary file and rename for atomic update
        temp_file = OUTPUT_FILE + '.temp'
        with open(temp_file, 'w') as f:
            json.dump(target_groups, f, indent=2)
        os.replace(temp_file, OUTPUT_FILE)
        logger.info(f"Updated {OUTPUT_FILE} with {len(target_groups)} target groups")
    except json.JSONDecodeError as e:
        logger.error(f"Error parsing response from service registry: {e}")
    except requests.exceptions.RequestException as e:
        logger.error(f"Error querying service registry: {e}")
    except Exception as e:
        logger.error(f"Unexpected error in service discovery: {e}")


def main():
    """Main loop to periodically discover services"""
    logger.info("Starting Prometheus Service Discovery adapter")
    while True:
        discover_services()
        time.sleep(REFRESH_INTERVAL)


if __name__ == "__main__":
    main()
Prometheus Configuration
Configure Prometheus to use our custom service discovery:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'custom-services'
    file_sd_configs:
      - files:
          - '/etc/prometheus/sd/custom_targets.json'
        refresh_interval: 5s
    relabel_configs:
      - source_labels: [service]
        target_label: job
Using Docker Compose
Here's a complete setup using Docker Compose:
version: '3'
services:
  # Service registry
  service-registry:
    build: ./service-registry
    ports:
      - "8081:8081"

  # Custom service discovery adapter
  sd-adapter:
    build: ./sd-adapter
    volumes:
      - prometheus_data:/etc/prometheus/sd
    environment:
      - SERVICE_REGISTRY_URL=http://service-registry:8081/services
      - OUTPUT_FILE=/etc/prometheus/sd/custom_targets.json
      - REFRESH_INTERVAL=30
    depends_on:
      - service-registry

  # Prometheus server
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/etc/prometheus/sd
    depends_on:
      - sd-adapter

volumes:
  prometheus_data:
Best Practices for Custom Service Discovery
When implementing custom service discovery for Prometheus, follow these best practices:
- Resilience: Make your SD mechanism resilient to failures, with proper error handling and retries
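- Performance: Keep discovery cheap in large environments, for example by fetching services in bulk rather than one request per service and by putting targets that share labels into a single target group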
- Atomic Updates: Use atomic file operations (write to temp file, then rename) to prevent Prometheus from reading partial files
- Labeling: Add meaningful labels to targets for better querying and visualization
- Caching: Implement caching to reduce load on your service registry
- Logging: Include comprehensive logging for troubleshooting
- Security: Secure your service registry and discovery mechanism with proper authentication and authorization
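The atomic-update practice can be sketched as a small reusable helper. This is one possible approach under stated assumptions, not a definitive implementation: write_targets_atomically is a hypothetical name, and skipping the write when the content is unchanged is an extra added for this sketch.

```python
import json
import os
import tempfile


def write_targets_atomically(path, target_groups):
    """Write target_groups to path atomically; return False if unchanged."""
    new_content = json.dumps(target_groups, indent=2, sort_keys=True)
    try:
        with open(path, "r", encoding="utf-8") as f:
            if f.read() == new_content:
                return False  # nothing changed, leave the file untouched
    except FileNotFoundError:
        pass

    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Create the temp file in the same directory so os.replace stays on one
    # filesystem; the '.tmp' suffix keeps it out of a '*.json' glob.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(new_content)
            f.flush()
            os.fsync(f.fileno())
        # mkstemp creates the file as 0600; make it readable by Prometheus
        os.chmod(temp_path, 0o644)
        os.replace(temp_path, path)
    except BaseException:
        os.unlink(temp_path)
        raise
    return True
```

A discovery loop could call write_targets_atomically(OUTPUT_FILE, target_groups) in place of its own open-and-rename sequence and log only when the function returns True.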
Common Pitfalls and How to Avoid Them
When implementing custom service discovery, watch out for these common issues:
- Stale Targets: Implement proper cleanup of stale services
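- High Cardinality: Avoid labels whose values are unique per target or change often (such as request IDs or build hashes), since every distinct label combination creates a new time series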
- Too Frequent Updates: Balance refresh intervals to avoid overloading systems
- Missing Metadata: Ensure all necessary metadata is included with each target
- Inconsistent Naming: Use consistent naming conventions for services and labels
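One way to keep label names consistent is to normalize registry metadata keys before they become labels. The sketch below is an assumption-laden example: sanitize_label_name is a hypothetical helper, it targets the classic label-name pattern [a-zA-Z_][a-zA-Z0-9_]*, and lowercasing is a convention chosen here rather than a Prometheus requirement.

```python
import re

_INVALID_CHARS = re.compile(r"[^a-zA-Z0-9_]")


def sanitize_label_name(name):
    """Map an arbitrary metadata key to a valid, lowercase label name."""
    # Replace anything outside [a-zA-Z0-9_] with an underscore. Leading
    # underscores are stripped because names starting with '__' are
    # reserved for internal use.
    cleaned = _INVALID_CHARS.sub("_", name.strip().lower()).lstrip("_")
    # A label name may not be empty or start with a digit
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


for key in ["data-center", "Team Name", "2nd-region"]:
    print(key, "->", sanitize_label_name(key))
```

Applying the same function everywhere labels are generated avoids ending up with both data-center and data_center variants of the same concept.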
Summary
Custom service discovery extends Prometheus' capabilities to monitor dynamic environments beyond what's covered by built-in service discovery mechanisms. In this guide, we've explored:
- Different methods for implementing custom service discovery (file-based, HTTP, and custom programs)
- How to implement each method with practical examples
- A complete end-to-end solution that integrates with a custom service registry
- Best practices and common pitfalls to avoid
By creating your own service discovery mechanisms, you can adapt Prometheus to monitor any environment, regardless of its complexity or proprietary nature.
Additional Resources
- Prometheus Service Discovery Documentation
- File-Based Service Discovery
- HTTP-Based Service Discovery
Exercises
- Implement a simple file-based service discovery mechanism that discovers services running on your local machine
- Create an HTTP service discovery endpoint that integrates with your organization's service registry
- Extend the examples in this guide to add health checking to the discovered services
- Implement custom relabeling rules to transform service metadata into meaningful Prometheus labels
- Design a service discovery solution for a distributed microservice architecture spanning multiple data centers