Grafana-Managed Alerts
Introduction
Grafana-Managed Alerts are a powerful feature of Grafana Alerting that allows you to create, configure, and manage alert rules directly within the Grafana interface. Unlike traditional Prometheus alerting, which requires external configuration, Grafana-Managed Alerts provide an integrated experience for monitoring your systems and applications.
With Grafana-Managed Alerts, you can:
- Create alert rules based on any data source
- Define complex alert conditions with multiple queries
- Configure notification channels and message templates
- Organize alerts into logical groups
- Visualize alert states in dashboards
This guide will walk you through understanding and implementing Grafana-Managed Alerts in your monitoring environment.
Understanding Grafana-Managed Alerts Architecture
Grafana-Managed Alerts are part of the unified Grafana Alerting system. Before diving into creating alerts, it's important to understand the key components:
- Alert Rules: Definitions of conditions that determine when an alert should fire
- Alert Instances: Individual evaluations of an alert rule against specific labels
- Contact Points: Destinations for notifications (email, Slack, etc.)
- Notification Policies: Rules that determine how, when, and where notifications are sent
- Silences: Configurations to suppress notifications for specific alerts
- Alert Groups: Logical groupings of related alerts
Creating Your First Grafana-Managed Alert
Let's create a simple alert that monitors CPU usage and triggers when it exceeds 80%.
Step 1: Navigate to the Alerting section
In the Grafana sidebar, click on Alerting. This will take you to the Grafana Alerting UI.
Step 2: Create a new alert rule
Click on Alert Rules and then + New alert rule. You'll be presented with the alert rule creation form.
Step 3: Configure the alert rule
- Rule name: Give your alert a descriptive name, like "High CPU Usage"
- Rule type: Select "Grafana managed alert"
- Folder: Choose an existing folder or create a new one to organize your alerts
- Configure query and alert condition:
  SELECT mean("usage_idle") FROM "cpu" WHERE $timeFilter GROUP BY time($__interval) fill(null)
- Set threshold: Define when the alert should fire. For our CPU example:
  - Condition: when last() of A is below 20
  - This translates to CPU idle time below 20%, meaning usage above 80%
- Configure alert evaluation behavior:
  - Evaluate every: 1m (evaluate the rule every minute)
  - For: 5m (alert only if the condition is true for 5 consecutive minutes)
- Add labels to help with organization:
  - severity: warning
  - category: system
- Add annotations to provide additional context:
  - Summary: High CPU usage detected
  - Description: CPU usage has exceeded 80% for more than 5 minutes
Step 4: Save the alert rule
Click Save to create your alert rule. Grafana will start evaluating it based on your configuration.
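If you prefer to manage alert rules as code, a rule like this can also be defined through Grafana's file-based alert rule provisioning. The sketch below is illustrative only: the group name, rule UID, and data source UID are placeholders, the query model is data-source specific, and the exact field names should be checked against the provisioning documentation for your Grafana version (recent versions can export an existing rule to YAML, which is the easiest way to see the exact structure).
apiVersion: 1
groups:
  - orgId: 1
    name: cpu-alerts                        # hypothetical evaluation group
    folder: Infrastructure
    interval: 1m                            # "Evaluate every"
    rules:
      - uid: high-cpu-usage                 # placeholder UID
        title: High CPU Usage
        condition: B                        # refId of the threshold/condition step
        for: 5m
        labels:
          severity: warning
          category: system
        annotations:
          summary: High CPU usage detected
          description: CPU usage has exceeded 80% for more than 5 minutes
        data:
          - refId: A
            datasourceUid: "<your-influxdb-uid>"      # placeholder
            relativeTimeRange: { from: 600, to: 0 }   # look back 10 minutes
            model:
              query: 'SELECT mean("usage_idle") FROM "cpu" WHERE $timeFilter GROUP BY time($__interval) fill(null)'
          # refId B would hold the threshold/condition expression; export an existing rule
          # from the UI to see the exact model for your data source.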
Advanced Alert Configuration
For more complex monitoring scenarios, Grafana-Managed Alerts offer advanced configuration options.
Multi-Dimensional Alerts
You can create alerts that operate across multiple dimensions by grouping the query on a label such as the host tag. Keep in mind that Grafana-managed alert rules are evaluated outside of any dashboard, so dashboard template variables (like $hostname) are generally not available in alert queries; filter with a concrete regular expression or simply group by the tag instead.
For example, to monitor CPU usage across all hosts in a cluster:
SELECT mean("usage_idle") FROM "cpu" WHERE $timeFilter GROUP BY time($__interval), "host" fill(null)
This will create a separate alert instance for each host whose series matches the condition.
Using Math and Expressions
Grafana allows you to combine metrics and apply transformations using math expressions:
$A + $B < $C * 1.5
For example, to alert when free disk space is less than 10% of total capacity (a provisioning-style sketch of this setup follows the list):
- Query A: SELECT last("free") FROM "disk"
- Query B: SELECT last("total") FROM "disk"
- Expression: $A / $B * 100 < 10
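In a provisioned rule, this query-plus-expression setup lives in the rule's data section, with the math step handled by Grafana's server-side expression engine (which uses the special data source UID __expr__). A rough, hedged sketch with placeholder refIds and data source UIDs (relativeTimeRange and other fields omitted for brevity):
data:
  - refId: A
    datasourceUid: "<your-influxdb-uid>"   # placeholder
    model:
      query: 'SELECT last("free") FROM "disk"'
  - refId: B
    datasourceUid: "<your-influxdb-uid>"
    model:
      query: 'SELECT last("total") FROM "disk"'
  - refId: C
    datasourceUid: "__expr__"              # Grafana expression engine
    model:
      type: math
      expression: '$A / $B * 100 < 10'     # refId C is then used as the rule's condition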
Alert Rule with Multiple Conditions
You can create more sophisticated alerts by combining multiple conditions:
// Alert if both CPU usage is high AND memory usage is high
($A < 20) && ($B < 15)
Where:
- Query A monitors CPU idle percentage
- Query B monitors available memory percentage
Alert Notifications and Routing
After creating alert rules, you need to configure how and where notifications are sent.
Creating Contact Points
Contact Points define where notifications are sent. To create a new contact point:
- Go to Alerting > Contact points
- Click + Add contact point
- Select the integration type (Slack, Email, etc.)
- Configure the necessary information
For a Slack notification:
{
  "recipient": "#alerts",
  "title": "{{ .CommonLabels.alertname }}",
  "message": "{{ .CommonAnnotations.summary }}\n{{ .CommonAnnotations.description }}",
  "url": "{{ .ExternalURL }}"
}
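Contact points can also be defined as code through file provisioning. A minimal, hedged sketch for a Slack contact point (the name, UID, and webhook URL are placeholders, and the exact settings keys, such as title and text for the message body, should be checked against your Grafana version):
apiVersion: 1
contactPoints:
  - orgId: 1
    name: team-slack                       # referenced by notification policies
    receivers:
      - uid: team-slack-01                 # placeholder UID
        type: slack
        settings:
          url: "https://hooks.slack.com/services/..."   # placeholder webhook URL
          recipient: "#alerts"
          title: "{{ .CommonLabels.alertname }}"
          text: "{{ .CommonAnnotations.summary }}"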
Configuring Notification Policies
Notification policies determine which alerts go to which contact points:
- Go to Alerting > Notification policies
- Configure the root policy or add nested policies
- Use label matchers to route different alerts
For example, to route critical alerts to a different channel:
- name: Critical Alerts
  match:
    severity: critical
  contact_point: on-call-team
  group_by: ['alertname']
  repeat_interval: 10m
Organizing and Managing Alerts
As your alert rules grow, organization becomes crucial for maintenance.
Using Alert Groups
Group related alerts together for better organization:
groups:
  - name: System Health
    folder: Infrastructure
    rules:
      - name: High CPU Usage
        # rule configuration...
      - name: Low Disk Space
        # rule configuration...
Implementing Alert Silence
To temporarily suppress notifications for maintenance or known issues:
- Go to Alerting > Silences
- Click New Silence
- Configure the matcher to target specific alerts
- Set a duration for the silence
matchers:
  - name: alertname
    value: High CPU Usage
    isRegex: false
comment: "System maintenance window"
startsAt: "2023-09-15T14:00:00Z"
endsAt: "2023-09-15T18:00:00Z"
Real-World Example: Full Stack Monitoring
Let's create a comprehensive monitoring setup for a web application with frontend, backend, and database components.
1. Database Availability Alert
SELECT last("uptime_seconds") FROM "database_stats" WHERE $timeFilter GROUP BY time($__interval)
Alert when: last() of A is below 1
Labels:
- severity: critical
- component: database
2. API Latency Alert
SELECT mean("response_time") FROM "api_metrics" WHERE $timeFilter GROUP BY time($__interval)
Alert when: last() of A is above 500
Labels:
- severity: warning
- component: backend
3. Error Rate Alert
SELECT sum("error_count") / sum("request_count") * 100 FROM "application_metrics" WHERE $timeFilter GROUP BY time($__interval)
Alert when: last() of A is above 5
Labels:
- severity: warning
- component: application
4. Notification Policy Configuration
# Root policy
route:
  receiver: default-email
  group_by: ['alertname', 'component']
  repeat_interval: 4h
  routes:
    # Critical database issues
    - match:
        severity: critical
        component: database
      receiver: database-team-pager
      repeat_interval: 10m
    # All other alerts
    - match_re:
        component: backend|application
      receiver: application-team-slack
      group_by: ['alertname', 'instance']
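The snippet above uses Alertmanager-style keys. If you provision the policy tree through Grafana's file-based format instead, the same routing might look roughly like this ('=~' performs a regular-expression match in object_matchers; receiver names are placeholders and must match existing contact points):
apiVersion: 1
policies:
  - orgId: 1
    receiver: default-email
    group_by: ['alertname', 'component']
    repeat_interval: 4h
    routes:
      # Critical database issues
      - receiver: database-team-pager
        object_matchers:
          - ['severity', '=', 'critical']
          - ['component', '=', 'database']
        repeat_interval: 10m
      # All other alerts
      - receiver: application-team-slack
        object_matchers:
          - ['component', '=~', 'backend|application']
        group_by: ['alertname', 'instance']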
Troubleshooting Grafana-Managed Alerts
When working with Grafana-Managed Alerts, you might encounter some common issues:
No Data Alerts
If you're not receiving any data for your alert queries:
- Verify data source connectivity
- Check time range settings
- Validate query syntax
- Ensure metrics are being collected
False Positives/Negatives
To reduce alert noise:
- Adjust the "For" duration to prevent short spikes from triggering alerts
- Use percentile-based thresholds instead of averages
- Implement multi-condition checks
- Consider using the "No Data" and "Error" handling options appropriately (see the sketch below)
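On the last point, each Grafana-managed rule lets you choose which state it enters when a query returns no data or fails to execute. In the file-provisioning format these are two fields on the rule; a hedged sketch (value names per recent Grafana versions):
# Fragment of a rule definition (see the provisioning sketch earlier in this guide);
# both options are also available in the rule editor UI.
noDataState: NoData      # state to enter when the query returns no data (alternatives: Alerting, OK)
execErrState: Error      # state to enter on query or evaluation errors (alternatives: Alerting, OK)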
Alert History and Debugging
To investigate alert history:
- Go to Alerting > Alert rules
- Click on the specific rule
- View the "State history" tab
- Examine evaluation details and timing
Best Practices for Grafana-Managed Alerts
When implementing alerts, follow these best practices:
- Be selective: Only alert on actionable conditions that require human intervention
- Use meaningful names: Clear, descriptive names help during incidents
- Include context: Add detailed annotations to help responders understand the issue
- Group related alerts: Prevent alert storms by grouping related notifications
- Set appropriate thresholds: Base thresholds on historical data, not guesses
- Implement runbooks: Link to troubleshooting guides in alert annotations (see the annotation sketch after this list)
- Review regularly: Periodically review and refine alert rules
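For the runbook point, a common convention is a dedicated annotation holding the link so it appears in every notification. A minimal sketch (the URL is a placeholder; Grafana's rule editor exposes this as the "Runbook URL" annotation field):
annotations:
  summary: High CPU usage detected
  description: CPU usage has exceeded 80% for more than 5 minutes
  runbook_url: "https://wiki.example.com/runbooks/high-cpu"   # placeholder runbook link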
Summary
Grafana-Managed Alerts provide a powerful, integrated approach to monitoring and alerting within the Grafana platform. By leveraging this feature, you can:
- Create sophisticated alert rules based on any data source
- Configure targeted notifications based on alert severity and type
- Organize and manage alerts effectively
- Build a comprehensive monitoring solution
With the knowledge from this guide, you're now equipped to implement effective alerting strategies for your applications and infrastructure using Grafana's native capabilities.
Additional Resources
For further learning and reference:
- Grafana Alerting Documentation
- Practice creating different types of alerts for various metrics
- Experiment with different notification channels and routing configurations
- Try implementing alert dashboards to visualize alert states