Configuration Validation
Introduction
One of the most common sources of problems with Prometheus is misconfiguration. Even small syntax errors or logical mistakes in your configuration files can prevent Prometheus from starting up properly or cause it to behave unexpectedly. In this guide, we'll explore how to validate your Prometheus configurations, identify common errors, and use built-in tools to ensure your monitoring setup is correct.
Configuration validation is an essential skill for anyone working with Prometheus. It helps you:
- Identify syntax errors before they cause downtime
- Verify that your scrape configurations will work as expected
- Ensure your alerting rules are properly formatted
- Catch logical errors in your recording rules
Let's dive into how you can validate your Prometheus configurations effectively.
Basic Configuration Validation
Prometheus provides a built-in command to check the syntax of your configuration files without actually starting the service. This is incredibly useful for catching basic errors before deploying your changes.
Using promtool
promtool is a utility that comes bundled with Prometheus and provides several helpful functions, including configuration validation.
To check your main Prometheus configuration file:
promtool check config prometheus.yml
If your configuration is valid, you'll see output similar to:
checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file
If there are errors, promtool will provide specific information about what's wrong:
checking prometheus.yml
FAILED: error parsing prometheus.yml: yaml: line 42: did not find expected key
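Because promtool exits with a non-zero status when a check fails, the result can gate later steps in a script. The following is a minimal sketch, assuming promtool is on your PATH; the function name validate_config and the PROMTOOL override variable are illustrative and not part of Prometheus.

```shell
#!/bin/sh
# Hypothetical helper: run `promtool check config` and report the result.
# PROMTOOL is an illustrative override for the binary path, not a Prometheus setting.
validate_config() {
  local file="$1"
  if "${PROMTOOL:-promtool}" check config "$file"; then
    echo "OK: $file"
  else
    echo "INVALID: $file" >&2
    return 1
  fi
}
```

Calling validate_config prometheus.yml prints promtool's own output first, then a one-line verdict, and returns non-zero on failure so a calling script can stop before deploying.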
Validating Rules
Similarly, you can validate your alerting and recording rules:
promtool check rules rules.yml
A successful validation will show:
Checking rules.yml
SUCCESS: 2 rules found
While an error might look like:
Checking rules.yml
FAILED: error parsing rules.yml: yaml: line 15: did not find expected key
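When rules are split across several files, it can help to check them one by one and see how many failed. This is a sketch under the assumption that rule files live in a single directory and end in .yml; validate_rule_files and PROMTOOL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: check every *.yml file in a directory and count failures.
# PROMTOOL is an illustrative override for the binary path, not a Prometheus setting.
validate_rule_files() {
  local dir="${1:-rules}" failures=0 file
  for file in "$dir"/*.yml; do
    # Skip the literal pattern when the directory has no matching files.
    [ -e "$file" ] || continue
    if ! "${PROMTOOL:-promtool}" check rules "$file"; then
      failures=$((failures + 1))
    fi
  done
  echo "$failures rule file(s) failed validation"
  [ "$failures" -eq 0 ]
}
```

Checking files individually keeps each error next to the file that caused it. promtool check rules also accepts several files in one invocation if you prefer a single call.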
Common Configuration Errors
Let's explore some of the most common configuration errors and how to identify and fix them.
YAML Syntax Errors
YAML is sensitive to indentation and formatting. Common YAML errors include:
- Incorrect indentation
- Missing colons after keys
- Using tabs instead of spaces
- Unquoted strings containing special characters
Example of incorrect YAML:
scrape_configs:
  - job_name: node
    static_configs
      - targets: ['localhost:9100']
Corrected version:
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
Notice the missing colon after static_configs in the incorrect version.
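Tabs are hard to spot by eye, so a quick text search before running promtool can save time. This is a minimal sketch that reports lines whose leading whitespace contains a tab; find_tabs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list lines whose indentation contains a tab character,
# since YAML does not allow tabs for indentation.
find_tabs() {
  local file="$1" tab
  tab="$(printf '\t')"
  # grep -n prefixes each match with its line number; the exit status is 0
  # only when at least one offending line is found.
  grep -n "^ *$tab" "$file"
}
```

Running find_tabs prometheus.yml prints nothing and returns non-zero when the file is clean.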
Invalid Relabeling Configurations
Relabeling is a powerful feature but is also a common source of errors.
Example of incorrect relabeling:
relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):(.*)'
    replacement: '${1}'
    target_label: instance
    action: unknown_action
Corrected version:
relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):(.*)'
    replacement: '${1}'
    target_label: instance
    action: replace
The action value must be one of the supported actions (replace, keep, drop, etc.).
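promtool check config rejects an unknown action, but a plain text scan can point at the offending line quickly. The sketch below is a rough pre-check, not a YAML parser. The list of known actions is an assumption based on the actions documented at the time of writing, so compare it with the documentation for your Prometheus version. check_relabel_actions is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical pre-check: print every `action:` line whose value is not in the known list.
# The list is an assumption; verify it against the docs for your Prometheus version.
check_relabel_actions() {
  local file="$1"
  local known='replace|keep|drop|hashmod|labelmap|labeldrop|labelkeep|lowercase|uppercase|keepequal|dropequal'
  # First grep: find action lines with their line numbers.
  # Second grep: drop the lines whose value is a known action (case-insensitive).
  if grep -nE '^[[:space:]]*(-[[:space:]]+)?action:' "$file" |
    grep -viE "action:[[:space:]]*[\"']?(${known})[\"']?[[:space:]]*(#.*)?\$"; then
    return 1
  fi
  return 0
}
```

The function returns non-zero and prints the suspicious lines when it finds a value outside the list. It matches any key named action, so it is only meant for the main Prometheus configuration file.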
Prometheus Rule Syntax Errors
Alert and recording rules must follow specific syntax requirements.
Example of incorrect rule:
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: job:request_errors:rate5m / job:requests:rate5m > 0.1
        for: 10m
        lables:
          severity: warning
Corrected version:
groups:
  - name: example
    rules:
      - alert: HighErrorRate
        expr: job:request_errors:rate5m / job:requests:rate5m > 0.1
        for: 10m
        labels:
          severity: warning
Notice the typo in lables (should be labels).
Advanced Configuration Validation
Beyond basic syntax checking, there are more sophisticated ways to validate your configuration.
Dry Run via a Trial Start
Prometheus has no dedicated dry-run flag, so the closest equivalent is a trial start: run it in the foreground against the new configuration, preferably on a test machine, and stop it (Ctrl+C) once startup has finished:
prometheus --config.file=prometheus.yml --enable-feature=promql-at-modifier,promql-negative-offset --web.listen-address=0.0.0.0:9090
Watch the startup log for warnings or errors. Because this exercises the full loading path, it can surface problems that a static check cannot, such as a listen address that is already in use.
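A trial start can be scripted so that it cleans up after itself. The sketch below starts Prometheus on a spare local port with throwaway storage, polls the /-/ready endpoint, and then stops the process. The port 19090, the function names, and the PROMETHEUS, PROBE and PROBE_INTERVAL variables are assumptions made for illustration. It also assumes curl is installed.

```shell
#!/bin/sh
# Hypothetical helper: poll a URL until it answers successfully or the attempts run out.
# PROBE and PROBE_INTERVAL are illustrative overrides, not Prometheus settings.
wait_until_ready() {
  local url="$1" attempts="${2:-30}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if ${PROBE:-curl -fsS -o /dev/null} "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "${PROBE_INTERVAL:-1}"
  done
  return 1
}

# Hypothetical helper: start Prometheus on a spare local port with temporary storage,
# wait for its readiness endpoint, then shut it down and remove the temporary data.
trial_start() {
  local config="$1" port="${2:-19090}" data_dir pid status=0
  data_dir="$(mktemp -d)"
  "${PROMETHEUS:-prometheus}" --config.file="$config" \
    --storage.tsdb.path="$data_dir" \
    --web.listen-address="127.0.0.1:${port}" &
  pid=$!
  wait_until_ready "http://127.0.0.1:${port}/-/ready" || status=1
  kill "$pid" 2>/dev/null
  wait "$pid" 2>/dev/null
  rm -rf "$data_dir"
  return "$status"
}
```

Using a separate port and storage directory keeps the trial from interfering with an instance that is already running on the same machine.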
Validation of PromQL Expressions
promtool check rules confirms that each expression parses, but it cannot tell you whether the referenced series exist or whether the expression returns what you expect. You can test expressions manually against a running Prometheus server:
promtool query instant http://localhost:9090 'job:request_errors:rate5m / job:requests:rate5m > 0.1'
This helps confirm that your alert and recording rule expressions return the data you intend.
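To try several expressions in a row, the query command can be wrapped in a small function. This sketch assumes that a query with no matching series prints nothing, which may differ between promtool versions; check_expr and PROMTOOL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: evaluate one expression against a running server and report
# whether the query failed, returned nothing, or returned at least one series.
# Assumption: an empty result produces no output from promtool.
check_expr() {
  local server="$1" expr="$2" result
  if ! result="$("${PROMTOOL:-promtool}" query instant "$server" "$expr")"; then
    echo "ERROR: $expr"
    return 1
  fi
  if [ -z "$result" ]; then
    echo "EMPTY: $expr"
  else
    echo "OK: $expr"
  fi
}
```

An EMPTY result is not necessarily a problem. An alert expression with a threshold, such as the one above, returns nothing whenever the condition is not met. It is mainly a hint to check for misspelled metric or label names.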
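Configuration Validation Workflow
A typical configuration validation workflow runs in this order: edit the configuration, run promtool check config, run promtool check rules, fix any reported errors and re-check, try the change in a non-production environment, and only then deploy.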
Configuration Validation Best Practices
Here are some best practices to ensure your Prometheus configurations remain valid:
- Use Version Control: Keep your configurations in a version control system like Git.
- Implement CI/CD Validation: Set up automated validation in your CI/CD pipeline:
# Example GitLab CI configuration
validate_config:
  stage: test
  script:
    - promtool check config prometheus.yml
    - promtool check rules rules/*.yml
  only:
    changes:
      - prometheus.yml
      - rules/*.yml
- Incremental Changes: Make small, incremental changes instead of large rewrites.
- Documentation: Comment your configurations, especially complex relabeling or alerting rules.
- Testing Environment: Test changes in a non-production environment first.
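The same checks that run in CI can also run locally before a commit. The sketch below assumes the layout from the CI example (prometheus.yml at the repository root and rule files under rules/) and file names without spaces. check_for_file, validate_staged and PROMTOOL are hypothetical names.

```shell
#!/bin/sh
# Hypothetical pre-commit logic: decide which promtool check a changed file needs.
# The paths mirror the layout assumed in the CI example.
check_for_file() {
  case "$1" in
    prometheus.yml) echo "config" ;;
    rules/*.yml) echo "rules" ;;
    *) echo "skip" ;;
  esac
}

# Validate the files staged for commit; returns non-zero if any check fails.
# Note: this checks the working-tree copy of each staged file.
validate_staged() {
  local status=0 file kind
  for file in $(git diff --cached --name-only --diff-filter=ACM); do
    kind="$(check_for_file "$file")"
    [ "$kind" = "skip" ] && continue
    "${PROMTOOL:-promtool}" check "$kind" "$file" || status=1
  done
  return "$status"
}
```

Calling validate_staged from a Git pre-commit hook would block a commit that contains an invalid configuration or rule file.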
Troubleshooting Configuration Errors
When you encounter configuration errors, follow these steps to troubleshoot:
- Read the Error Message: Prometheus usually provides specific line numbers and descriptions.
- Isolate Changes: If you've made multiple changes, try to apply them one at a time.
- Simplify: Temporarily simplify complex configurations to isolate the problem.
- Check Logs: Examine Prometheus logs for additional context:
journalctl -u prometheus -f
- Consult Documentation: The official Prometheus documentation is comprehensive and regularly updated.
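Since error messages usually include a line number, the first troubleshooting step can be partly automated. This sketch extracts the number from a message of the form shown earlier and prints the surrounding lines of the file. show_error_context is a hypothetical helper, and the reported line is sometimes only near the real mistake rather than exactly on it.

```shell
#!/bin/sh
# Hypothetical helper: pull the "line N" number out of an error message and
# print that line of the file with two lines of context on each side.
show_error_context() {
  local file="$1" message="$2" line start end
  line="$(printf '%s\n' "$message" | sed -n 's/.*line \([0-9][0-9]*\).*/\1/p')"
  if [ -z "$line" ]; then
    echo "no line number found in message" >&2
    return 1
  fi
  start=$((line - 2))
  [ "$start" -lt 1 ] && start=1
  end=$((line + 2))
  # Number the lines so the output can be matched against the error message.
  awk -v s="$start" -v e="$end" 'NR >= s && NR <= e { printf "%d: %s\n", NR, $0 }' "$file"
}
```

For example, show_error_context prometheus.yml "yaml: line 42: did not find expected key" prints lines 40 through 44 of the file with their line numbers.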
Example: Complete Configuration Validation Workflow
Let's walk through a complete example of validating a Prometheus configuration:
- Start with a configuration file:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "rules/*.yml"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node_exporter'
    static_configs:
      - targets ['node1:9100', 'node2:9100']
- Run basic syntax check:
$ promtool check config prometheus.yml
checking prometheus.yml
FAILED: error parsing prometheus.yml: yaml: line 15: did not find expected key
- Fix the error (missing colon after targets):
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['node1:9100', 'node2:9100']
- Run check again:
$ promtool check config prometheus.yml
checking prometheus.yml
SUCCESS: prometheus.yml is valid prometheus config file
- Try a dry run by starting Prometheus in the foreground (stop it with Ctrl+C once it reports that it is ready):
$ prometheus --config.file=prometheus.yml --web.enable-lifecycle
level=info ts=2023-05-10T12:00:00.000Z caller=main.go:213 msg="Starting Prometheus" version="(version=2.44.0, branch=HEAD, revision=adc41a87b8a559e49903e92cef911be77656a392)"
...
level=info ts=2023-05-10T12:00:01.000Z caller=main.go:1177 msg="Server is ready to receive web requests."
Summary
Configuration validation is a critical skill for successfully managing Prometheus. By using the built-in tools like promtool, understanding common errors, and following best practices, you can ensure your monitoring system remains reliable and effective.
Remember these key points:
- Always validate configuration files before deployment
- Understand the common sources of errors
- Use both basic syntax checking and more advanced validation techniques
- Implement configuration validation in your deployment workflow
- Make incremental changes and test thoroughly
Exercises
- Create a basic Prometheus configuration file and introduce some deliberate errors. Use promtool to identify and fix them.
- Write a small shell script that validates all your Prometheus configuration files and rule files in a single command.
- Set up a simple CI/CD pipeline that validates your Prometheus configuration before deployment.
- Create a complex relabeling configuration and validate it using both promtool and a dry run.
- Experiment with different alert rule expressions and validate them using promtool query instant.