Grafana-Managed Alerts

Introduction

Grafana-Managed Alerts are a powerful feature of Grafana Alerting that allows you to create, configure, and manage alert rules directly within the Grafana interface. Unlike traditional Prometheus alerting, which requires external configuration, Grafana-Managed Alerts provide an integrated experience for monitoring your systems and applications.

With Grafana-Managed Alerts, you can:

  • Create alert rules based on any data source
  • Define complex alert conditions with multiple queries
  • Configure notification channels and message templates
  • Organize alerts into logical groups
  • Visualize alert states in dashboards

This guide will walk you through understanding and implementing Grafana-Managed Alerts in your monitoring environment.

Understanding Grafana-Managed Alerts Architecture

Grafana-Managed Alerts are part of the unified Grafana Alerting system. Before diving into creating alerts, it's important to understand the key components:

  • Alert Rules: Definitions of conditions that determine when an alert should fire
  • Alert Instances: Individual evaluations of an alert rule against specific labels
  • Contact Points: Destinations for notifications (email, Slack, etc.)
  • Notification Policies: Rules that determine how, when, and where notifications are sent
  • Silences: Configurations to suppress notifications for specific alerts
  • Alert Groups: Logical groupings of related alerts
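
If you manage configuration as code, most of these components can also be defined through Grafana's alerting provisioning files rather than the UI. The skeleton below is a rough sketch of how the pieces map onto that format; exact keys and required fields vary by Grafana version, so treat the names as illustrative rather than authoritative.

yaml
# Rough provisioning skeleton (illustrative; verify against your Grafana version)
apiVersion: 1

groups:                 # alert rules, organized into folders and evaluation groups
  - orgId: 1
    name: system-health
    folder: Infrastructure
    interval: 1m
    rules: []           # alert rule definitions go here

contactPoints:          # destinations for notifications
  - orgId: 1
    name: ops-email
    receivers:
      - uid: ops-email-01
        type: email
        settings:
          addresses: ops@example.com   # placeholder address

policies:               # notification policies routing alerts to contact points
  - orgId: 1
    receiver: ops-email
    group_by: ['alertname']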

Creating Your First Grafana-Managed Alert

Let's create a simple alert that monitors CPU usage and triggers when it exceeds 80%.

Step 1: Navigate to the Alerting section

In the Grafana sidebar, click on Alerting. This will take you to the Grafana Alerting UI.

Step 2: Create a new alert rule

Click on Alert Rules and then + New alert rule. You'll be presented with the alert rule creation form.

Step 3: Configure the alert rule

  1. Rule name: Give your alert a descriptive name, like "High CPU Usage"

  2. Rule type: Select "Grafana managed alert"

  3. Folder: Choose an existing folder or create a new one to organize your alerts

  4. Configure query and alert condition:

    sql
    SELECT mean("usage_idle") FROM "cpu" WHERE $timeFilter GROUP BY time($__interval) fill(null)

  5. Set threshold: Define when the alert should fire. For our CPU example:

    • Condition: when last() of A is below 20
    • This translates to CPU idle time below 20%, meaning usage above 80%

  6. Configure alert evaluation behavior:

    • Evaluate every: 1m (evaluate the rule every minute)
    • For: 5m (alert only if the condition is true for 5 consecutive minutes)

  7. Add labels to help with organization:

    • Severity: warning
    • Category: system

  8. Add annotations to provide additional context:

    • Summary: High CPU usage detected
    • Description: CPU usage has exceeded 80% for more than 5 minutes
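
Below is a hedged sketch of how the settings from step 3 (name, folder, evaluation interval, "For" duration, labels, and annotations) might look in Grafana's alert-rule provisioning format. The query section is reduced to a placeholder, and names such as the rule UID and datasource UID are assumptions you would adapt to your environment.

yaml
apiVersion: 1
groups:
  - orgId: 1
    name: system-health          # evaluation group
    folder: Infrastructure       # folder chosen in step 3
    interval: 1m                 # "Evaluate every"
    rules:
      - uid: high-cpu-usage      # hypothetical UID
        title: High CPU Usage
        condition: A             # refId of the query/expression that decides firing
        for: 5m                  # "For" duration
        noDataState: NoData
        execErrState: Error
        labels:
          severity: warning
          category: system
        annotations:
          summary: High CPU usage detected
          description: CPU usage has exceeded 80% for more than 5 minutes
        data:
          - refId: A
            datasourceUid: influxdb   # hypothetical datasource UID
            model: {}                 # query model omitted for brevity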

Step 4: Save the alert rule

Click Save to create your alert rule. Grafana will start evaluating it based on your configuration.

Advanced Alert Configuration

For more complex monitoring scenarios, Grafana-Managed Alerts offer advanced configuration options.

Multi-Dimensional Alerts

You can create alerts that operate across multiple dimensions by using template variables and label matching.

For example, to monitor CPU usage across all hosts in a cluster:

sql
SELECT mean("usage_idle") FROM "cpu" WHERE "host" =~ /^$hostname$/ AND $timeFilter GROUP BY time($__interval), "host" fill(null)

This will create separate alert instances for each host that matches the condition.
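
As a rough illustration (host names are hypothetical), a single multi-dimensional rule could yield one alert instance per matching label set, each evaluated, fired, and notified independently:

yaml
# Illustrative alert instances produced by one rule grouped by "host"
- labels: { alertname: High CPU Usage, host: web-01, severity: warning }
- labels: { alertname: High CPU Usage, host: web-02, severity: warning }
- labels: { alertname: High CPU Usage, host: db-01, severity: warning }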

Using Math and Expressions

Grafana allows you to combine metrics and apply transformations using math expressions:

javascript
$A + $B < $C * 1.5

For example, to alert when free disk space is less than 10% of total capacity:

  1. Query A: SELECT last("free") FROM "disk"
  2. Query B: SELECT last("total") FROM "disk"
  3. Expression: $A / $B * 100 < 10
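
Internally, each query and expression is a separate evaluation step identified by its refId. The sketch below shows roughly how the two queries and the math expression could be wired together in a rule's data section; the datasource UID and query model fields are assumptions, and the built-in expressions engine is referenced via the special __expr__ UID.

yaml
data:
  - refId: A                     # free disk space
    datasourceUid: influxdb      # hypothetical datasource UID
    model:
      query: SELECT last("free") FROM "disk"
  - refId: B                     # total disk capacity
    datasourceUid: influxdb
    model:
      query: SELECT last("total") FROM "disk"
  - refId: C                     # math expression combining A and B
    datasourceUid: __expr__
    model:
      type: math
      expression: $A / $B * 100 < 10
condition: C                     # the rule fires on the result of C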

Alert Rule with Multiple Conditions

You can create more sophisticated alerts by combining multiple conditions:

javascript
// Alert if both CPU usage is high AND memory usage is high
($A < 20) && ($B < 15)

Where:

  • Query A monitors CPU idle percentage
  • Query B monitors available memory percentage

Alert Notifications and Routing

After creating alert rules, you need to configure how and where notifications are sent.

Creating Contact Points

Contact Points define where notifications are sent. To create a new contact point:

  1. Go to Alerting > Contact points
  2. Click + Add contact point
  3. Select the integration type (Slack, Email, etc.)
  4. Configure the necessary information

For a Slack notification:

json
{
  "recipient": "#alerts",
  "title": "{{ .CommonLabels.alertname }}",
  "message": "{{ .CommonAnnotations.summary }}\n{{ .CommonAnnotations.description }}",
  "url": "{{ .ExternalURL }}"
}
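
The same contact point can also be provisioned from a file instead of the UI. The snippet below is a sketch of that format; the token is a placeholder, and the setting names (recipient, title, text) should be checked against your Grafana version.

yaml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-alerts
    receivers:
      - uid: slack-alerts-01          # hypothetical UID
        type: slack
        settings:
          recipient: '#alerts'
          token: $SLACK_BOT_TOKEN     # placeholder; a webhook URL can be used instead
          title: '{{ .CommonLabels.alertname }}'
          text: |-
            {{ .CommonAnnotations.summary }}
            {{ .CommonAnnotations.description }}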

Configuring Notification Policies

Notification policies determine which alerts go to which contact points:

  1. Go to Alerting > Notification policies
  2. Configure the root policy or add nested policies
  3. Use label matchers to route different alerts

For example, to route critical alerts to a different channel:

yaml
- name: Critical Alerts
  match:
    severity: critical
  contact_point: on-call-team
  group_by: ['alertname']
  repeat_interval: 10m
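
Note that Grafana's file-provisioning schema for notification policies uses slightly different field names (receiver and object_matchers). A hedged sketch of the equivalent route:

yaml
apiVersion: 1
policies:
  - orgId: 1
    receiver: default-email            # root contact point
    routes:
      - receiver: on-call-team
        object_matchers:
          - ['severity', '=', 'critical']
        group_by: ['alertname']
        repeat_interval: 10m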

Organizing and Managing Alerts

As your alert rules grow, organization becomes crucial for maintenance.

Using Alert Groups

Group related alerts together for better organization:

yaml
groups:
  - name: System Health
    folder: Infrastructure
    rules:
      - name: High CPU Usage
        # rule configuration...
      - name: Low Disk Space
        # rule configuration...

Implementing Alert Silence

To temporarily suppress notifications for maintenance or known issues:

  1. Go to Alerting > Silences
  2. Click New Silence
  3. Configure the matcher to target specific alerts
  4. Set a duration for the silence

For example, to silence the "High CPU Usage" alert during a maintenance window:

yaml
matchers:
  - name: alertname
    value: High CPU Usage
    isRegex: false
comment: "System maintenance window"
startsAt: "2023-09-15T14:00:00Z"
endsAt: "2023-09-15T18:00:00Z"
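
Matchers can also use regular expressions, which is handy when a maintenance window affects several alerts at once. As an illustrative sketch (the component values are assumptions), a regex matcher silencing everything from two components might look like:

yaml
matchers:
  - name: component
    value: database|backend
    isRegex: true
comment: "Database cluster maintenance window"
startsAt: "2023-09-15T14:00:00Z"
endsAt: "2023-09-15T18:00:00Z"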

Real-World Example: Full Stack Monitoring

Let's create a comprehensive monitoring setup for a web application with frontend, backend, and database components.

1. Database Availability Alert

sql
SELECT last("uptime_seconds") FROM "database_stats" WHERE $timeFilter GROUP BY time($__interval)

Alert when: last() of A is below 1

Labels:

  • severity: critical
  • component: database

2. API Latency Alert

sql
SELECT mean("response_time") FROM "api_metrics" WHERE $timeFilter GROUP BY time($__interval)

Alert when: last() of A is above 500

Labels:

  • severity: warning
  • component: backend

3. Error Rate Alert

sql
SELECT sum("error_count") / sum("request_count") * 100 FROM "application_metrics" WHERE $timeFilter GROUP BY time($__interval)

Alert when: last() of A is above 5

Labels:

  • severity: warning
  • component: application

4. Notification Policy Configuration

yaml
# Root policy
route:
  receiver: default-email
  group_by: ['alertname', 'component']
  repeat_interval: 4h
  routes:
    # Critical database issues
    - match:
        severity: critical
        component: database
      receiver: database-team-pager
      repeat_interval: 10m
    # All other alerts
    - match_re:
        component: backend|application
      receiver: application-team-slack
      group_by: ['alertname', 'instance']

Troubleshooting Grafana-Managed Alerts

When working with Grafana-Managed Alerts, you might encounter some common issues:

No Data Alerts

If you're not receiving any data for your alert queries:

  1. Verify data source connectivity
  2. Check time range settings
  3. Validate query syntax
  4. Ensure metrics are being collected

False Positives/Negatives

To reduce alert noise:

  1. Adjust the "For" duration to prevent short spikes from triggering alerts
  2. Use percentile-based thresholds instead of averages
  3. Implement multi-condition checks
  4. Consider using the "No Data" and "Error" handling options appropriately
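
The "No Data" and "Error" options in the last item correspond to two per-rule settings. A sketch of how they might appear on a provisioned rule (the values shown are the usual choices):

yaml
# Per-rule handling of missing data and evaluation errors
noDataState: NoData        # or OK / Alerting
execErrState: Error        # or OK / Alerting
for: 5m                    # a longer "For" duration also absorbs short spikes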

Alert History and Debugging

To investigate alert history:

  1. Go to Alerting > Alert rules
  2. Click on the specific rule
  3. View the "State history" tab
  4. Examine evaluation details and timing

Best Practices for Grafana-Managed Alerts

When implementing alerts, follow these best practices:

  1. Be selective: Only alert on actionable conditions that require human intervention
  2. Use meaningful names: Clear, descriptive names help during incidents
  3. Include context: Add detailed annotations to help responders understand the issue
  4. Group related alerts: Prevent alert storms by grouping related notifications
  5. Set appropriate thresholds: Base thresholds on historical data, not guesses
  6. Implement runbooks: Link to troubleshooting guides in alert annotations (see the snippet after this list)
  7. Review regularly: Periodically review and refine alert rules
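
For practice 6, a common convention is to carry the runbook link as an annotation so it shows up in every notification (the URL below is a placeholder):

yaml
annotations:
  summary: High CPU usage detected
  runbook_url: https://wiki.example.com/runbooks/high-cpu   # hypothetical link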

Summary

Grafana-Managed Alerts provide a powerful, integrated approach to monitoring and alerting within the Grafana platform. By leveraging this feature, you can:

  • Create sophisticated alert rules based on any data source
  • Configure targeted notifications based on alert severity and type
  • Organize and manage alerts effectively
  • Build a comprehensive monitoring solution

With the knowledge from this guide, you're now equipped to implement effective alerting strategies for your applications and infrastructure using Grafana's native capabilities.

Additional Resources

For further learning and reference:

  • Grafana Alerting Documentation
  • Practice creating different types of alerts for various metrics
  • Experiment with different notification channels and routing configurations
  • Try implementing alert dashboards to visualize alert states

