# Promtail Scrape Configs

## Introduction

Promtail is a log collection agent designed to work with Grafana Loki. One of its most important components is the `scrape_configs` section, which defines how Promtail discovers, processes, and forwards logs to Loki. This configuration is inspired by Prometheus's scrape configuration, making it familiar for users already working with Prometheus.

In this guide, we'll explore how to effectively configure Promtail's `scrape_configs` to collect logs from various sources, add labels, and prepare them for efficient querying in Loki.
## Basic Structure of `scrape_configs`

The `scrape_configs` section is a list of configurations, where each configuration defines:
- What logs to collect (targets)
- How to label these logs
- How to process these logs before sending them to Loki
Here's the basic structure of a `scrape_configs` entry:

```yaml
scrape_configs:
  - job_name: <job_name>
    static_configs:
      - targets:
          - localhost
        labels:
          <label_name>: <label_value>
    pipeline_stages:
      - <stage_name>:
          <stage_config>
```
Let's break down each component:
- `job_name`: A unique identifier for the scrape job
- `static_configs`: Defines static targets and their labels
- `pipeline_stages`: A series of transformations to apply to logs before sending them to Loki
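
Note that `scrape_configs` is only one part of a complete Promtail configuration file; Promtail also needs `server`, `positions`, and `clients` sections. A minimal but complete file looks something like the sketch below (the ports, paths, and Loki URL are illustrative assumptions):

```yaml
server:
  http_listen_port: 9080   # Promtail's own HTTP server (targets page, metrics)
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml   # where Promtail records how far it has read each file

clients:
  - url: http://loki:3100/loki/api/v1/push   # assumed Loki push endpoint

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log
```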
## Target Discovery

Promtail supports several service discovery mechanisms to find log sources. Let's explore the most common ones:

### Static Targets

The simplest approach is to manually specify targets:

```yaml
scrape_configs:
  - job_name: static_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*.log
```
In this example:
- We define a job named `static_logs`
- We're targeting `localhost`
- We assign the label `job: varlogs` to all logs
- The special label `__path__` tells Promtail which files to read (all `.log` files in `/var/log`)
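
The `__path__` value is a glob; Promtail's glob matching supports `**` to recurse into subdirectories, and a companion `__path_exclude__` label can filter out unwanted files. A small sketch of both:

```yaml
scrape_configs:
  - job_name: static_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/**/*.log          # recurse into subdirectories
          __path_exclude__: /var/log/**/*.gz   # skip rotated, compressed files
```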
### File Discovery

For environments with dynamic log files, file discovery is more appropriate:

```yaml
scrape_configs:
  - job_name: file_discovery
    file_sd_configs:
      - files:
          - /etc/promtail/targets/*.yaml
        refresh_interval: 5m
```

With this configuration, Promtail reads target definitions from all YAML files in `/etc/promtail/targets/`. These files should contain target specifications similar to `static_configs`.

Example target file (`/etc/promtail/targets/app_logs.yaml`):

```yaml
- targets:
    - localhost
  labels:
    job: app_logs
    __path__: /var/log/app/*.log
    app: myapp
```
### Kubernetes Discovery

Promtail excels at collecting logs from Kubernetes pods:

```yaml
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_container_name]
        target_label: container
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_node_name]
        target_label: node
      - replacement: /var/log/pods/*$1/*.log
        separator: /
        source_labels:
          - __meta_kubernetes_pod_uid
          - __meta_kubernetes_pod_container_name
        target_label: __path__
```
This configuration:
- Discovers all Kubernetes pods
- Keeps only pods with the annotation `prometheus.io/scrape: "true"`
- Extracts container, namespace, pod, and node information as labels
- Constructs the file path for each container's logs
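
The `__meta_kubernetes_*` labels are discarded after relabeling, so anything you want to keep must be copied to a target label. A common addition is a `labelmap` action that carries over all Kubernetes pod labels; a sketch of the extra relabel rule:

```yaml
relabel_configs:
  # Copy every Kubernetes pod label (e.g. app, tier) onto the log stream
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
```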
## Labeling Strategies

Labels are crucial for efficient log querying in Loki. Here are some best practices:

### Static Labels

Add static labels to identify the source or type of logs:

```yaml
scrape_configs:
  - job_name: nginx_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          environment: production
          service: web
          __path__: /var/log/nginx/*.log
```
### Dynamic Labels with `relabel_configs`

Use `relabel_configs` to dynamically generate labels:

```yaml
scrape_configs:
  - job_name: app_logs
    static_configs:
      - targets:
          - localhost
        labels:
          __path__: /var/log/apps/*.log
    relabel_configs:
      - source_labels: [__path__]
        regex: '/var/log/apps/(.*)\.log'
        target_label: app
        replacement: $1
```
This extracts the application name from the log file path and adds it as a label.
## Pipeline Stages

Pipeline stages process logs before sending them to Loki. Let's explore some common stages:

### Extracting JSON Fields

For logs in JSON format:

```yaml
scrape_configs:
  - job_name: json_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: json_app
          __path__: /var/log/json_app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            user: user.name
            request_id: request.id
      - labels:
          level:
          user:
          request_id:
```
This configuration:
- Parses JSON logs
- Extracts the fields `level`, `user.name` (a nested field), and `request.id`
- Adds these fields as labels to facilitate querying

Note that `user` and `request_id` can take many unique values; as discussed under Best Practices below, promoting high-cardinality fields to labels should be done with care.
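
If you want an extracted field to become the stored log line itself rather than a label, an `output` stage can replace the forwarded line with a single extracted value. A minimal sketch, assuming the application's JSON includes a `message` field:

```yaml
pipeline_stages:
  - json:
      expressions:
        level: level
        message: message   # assumed field in the app's JSON logs
  - labels:
      level:
  - output:
      source: message      # forward only the message text to Loki
```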
### Regular Expression Extraction

For logs in custom formats:

```yaml
scrape_configs:
  - job_name: nginx_access_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '^(?P<ip>\S+) - (?P<user>\S+) \[(?P<timestamp>\S+) \S+\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
      - labels:
          method:
          status:
          path:
```
This configuration:
- Uses regex to parse the standard NGINX access log format
- Extracts HTTP method, status code, and path
- Adds these as labels
### Filtering Logs

Filter out noisy or unnecessary logs:

```yaml
scrape_configs:
  - job_name: app_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
      - labels:
          level:
      - match:
          selector: '{level="debug"}'
          action: drop
```

The `labels` stage promotes `level` to a label so the `match` selector can see it; the `match` stage then drops all debug logs, reducing the volume sent to Loki.
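
Promtail also has a dedicated `drop` stage that works directly on extracted values, without promoting them to labels first. A minimal sketch of the same filter:

```yaml
pipeline_stages:
  - json:
      expressions:
        level: level
  - drop:
      source: level
      value: debug   # drop any entry whose extracted level equals "debug"
```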
### Timestamp Processing

Ensure logs have the correct timestamp:

```yaml
scrape_configs:
  - job_name: timestamped_logs
    static_configs:
      - targets:
          - localhost
        labels:
          job: app
          __path__: /var/log/app/*.log
    pipeline_stages:
      - json:
          expressions:
            ts: timestamp
      - timestamp:
          source: ts
          format: RFC3339
```
This extracts a timestamp field from the JSON and uses it as the log entry's timestamp.
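
For timestamps that aren't in a named format like RFC3339, `format` takes a Go reference-time layout, and `location` can supply a timezone when the log line omits one. A sketch, assuming lines that start with a timestamp like `2024-01-15 13:45:30`:

```yaml
pipeline_stages:
  - regex:
      expression: '^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
  - timestamp:
      source: ts
      format: '2006-01-02 15:04:05'   # Go reference-time layout
      location: 'Etc/UTC'             # assumed timezone for these logs
```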
## Real-World Example: Multi-Component Application

Let's combine these concepts into a comprehensive example for a hypothetical microservice application:

```yaml
scrape_configs:
  # Frontend Nginx logs
  - job_name: frontend
    static_configs:
      - targets:
          - localhost
        labels:
          component: frontend
          service: nginx
          __path__: /var/log/nginx/*.log
    pipeline_stages:
      - regex:
          expression: '^(?P<ip>\S+) - (?P<user>\S+) \[(?P<timestamp>\S+) \S+\] "(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<size>\d+)'
      - timestamp:
          source: timestamp
          format: '02/Jan/2006:15:04:05'
      - labels:
          method:
          status:
          path:

  # Backend API logs (JSON format)
  - job_name: backend_api
    static_configs:
      - targets:
          - localhost
        labels:
          component: backend
          service: api
          __path__: /var/log/api/*.log
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: message
            method: request.method
            path: request.path
            status: response.status
            user_id: user.id
            timestamp: time
      - timestamp:
          source: timestamp
          format: RFC3339
      - labels:
          level:
          method:
          status:
          user_id:
      - match:
          selector: '{level="debug"}'
          action: drop

  # Database logs
  - job_name: database
    static_configs:
      - targets:
          - localhost
        labels:
          component: database
          service: postgres
          __path__: /var/log/postgres/*.log
    pipeline_stages:
      - regex:
          expression: '^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \[(?P<pid>\d+)\] (?P<level>\w+): (?P<message>.*)$'
      - timestamp:
          source: timestamp
          format: '2006-01-02 15:04:05.000'
      - labels:
          level:
```
This configuration:
- Collects logs from three components: frontend (Nginx), backend API (JSON), and database (Postgres)
- Applies appropriate parsing for each format
- Extracts useful fields as labels
- Handles timestamps correctly for each source
- Filters out debug logs from the API service
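
One thing this example doesn't handle is multi-line entries such as Postgres error details, which would otherwise be ingested line by line. A `multiline` stage can join them into a single entry; a sketch, assuming each entry starts with a timestamp:

```yaml
pipeline_stages:
  - multiline:
      # Any line starting with a timestamp begins a new entry;
      # continuation lines are appended to the previous one
      firstline: '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
      max_wait_time: 3s
  # ...regex, timestamp, and labels stages follow as above
```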
## Best Practices

- **Label Cardinality**: Be mindful of high-cardinality labels (e.g., user IDs, IP addresses). Too many unique label values can impact Loki's performance.
- **Structured Logging**: Encourage structured logging in your applications (e.g., JSON) to make extraction easier.
- **Resource Considerations**: Monitor Promtail's resource usage. Reading many large log files can consume significant memory and CPU.
- **Pipeline Efficiency**: Order your pipeline stages efficiently; filtering early reduces processing work.
- **Test Configurations**: Use Promtail's dry-run mode to test configurations before deployment:

```bash
promtail --dry-run --config.file=promtail-config.yaml --client.url=http://loki:3100/loki/api/v1/push
```
## Troubleshooting

If logs aren't appearing in Loki, check these common issues:

- **Path Patterns**: Ensure your `__path__` patterns match the actual log files.
- **Permissions**: Verify Promtail has permission to read the log files.
- **Label Filters**: Check if pipeline stages are inadvertently dropping logs.
- **Connection Issues**: Verify Promtail can reach your Loki instance.
- **Log Timestamps**: Ensure timestamps are correctly extracted and formatted.
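
Promtail's built-in HTTP server helps with the first three checks: the `/targets` page shows which files each scrape config has discovered, and `/metrics` exposes counters for sent and dropped entries. A quick look, assuming the default server port of 9080:

```bash
# List discovered targets and the files Promtail is tailing
curl http://localhost:9080/targets

# Inspect Promtail's own metrics (sent bytes, dropped entries, etc.)
curl -s http://localhost:9080/metrics | grep promtail_
```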
## Summary

Promtail's `scrape_configs` section provides a flexible and powerful way to collect, process, and forward logs to Grafana Loki. By understanding the key components (target discovery, labeling, and pipeline stages), you can build efficient log collection pipelines that make your logs accessible and queryable.
Remember these key points:
- Target discovery finds your log sources
- Labels make logs queryable in Loki
- Pipeline stages transform logs before ingestion
- Be mindful of resource usage and cardinality
## Exercises

- Configure Promtail to collect logs from a web server (Apache or Nginx) and extract useful fields like HTTP method, path, and status code.
- Set up a pipeline that parses JSON logs, extracts fields, and filters out logs below a certain severity level.
- Create a configuration for a multi-container Docker environment that adds container name, image, and service as labels.