Retention Policies

Introduction

When monitoring systems with Prometheus, you'll quickly encounter an important consideration: how long should your time series data be stored? Prometheus collects metrics continuously, which can accumulate into significant volumes of data over time. Retention policies define how Prometheus manages this data, determining what to keep and for how long before it gets discarded.

Understanding retention policies is crucial for any Prometheus deployment because they help you balance three concerns:

Having sufficient historical data for analysis and troubleshooting
Managing storage resources efficiently
Optimizing query performance on your monitoring data

In this guide, we'll explore how Prometheus handles data retention, how to configure retention settings, and best practices for managing your time series data effectively.

Default Retention Behavior

By default, Prometheus retains time series data for 15 days. This means that without any specific configuration, metrics older than 15 days will be automatically removed.

Let's understand what this means in practice:

If you deployed Prometheus on January 1st
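From about January 16th, your metrics from January 1st become eligible for deletion (cleanup runs in the background on whole blocks, so actual removal can lag behind the cutoff)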
Only the most recent 15 days of data would be available for querying

This default behavior is reasonable for many basic monitoring setups but may need adjustment based on your specific requirements.

Configuring Retention Settings

Prometheus offers two primary ways to configure data retention:

Time-based retention - Keep data for a specified time period
Storage-based retention - Limit the total storage used by Prometheus

Let's look at how to configure these options:

Time-Based Retention

To modify the default 15-day retention period, use the --storage.tsdb.retention.time flag when starting Prometheus:

bash
# Retain data for 30 days
prometheus --storage.tsdb.retention.time=30d

The value is a number followed by a unit:

Milliseconds (ms)
Seconds (s)
Minutes (m)
Hours (h)
Days (d)
Weeks (w)
Years (y)

For example:

bash
prometheus --storage.tsdb.retention.time=24h    # 24 hours
prometheus --storage.tsdb.retention.time=7d     # 7 days
prometheus --storage.tsdb.retention.time=4w     # 4 weeks
prometheus --storage.tsdb.retention.time=1y     # 1 year
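
To sanity-check a retention value before passing it to the flag, it can help to convert it to seconds. The helper below is a minimal sketch and not part of Prometheus: retention_to_seconds is a hypothetical name, it accepts only a single number and unit (Prometheus itself also accepts compound values such as 1h30m), and it assumes the Prometheus convention that a week is 7 days and a year is 365 days.

```bash
# Hypothetical helper: convert a single-unit duration (e.g. 30d, 4w) to seconds.
# Simplification: compound durations such as 1h30m are rejected.
retention_to_seconds() {
  local value="$1" number unit factor
  number="${value%?}"            # everything except the last character
  unit="${value#"$number"}"      # the last character
  case "$number" in
    ''|*[!0-9]*|0?*)             # empty, non-numeric, or leading zero
      echo "unsupported duration: $value" >&2
      return 1 ;;
  esac
  case "$unit" in
    s) factor=1 ;;
    m) factor=60 ;;
    h) factor=3600 ;;
    d) factor=86400 ;;
    w) factor=604800 ;;          # 7 days
    y) factor=31536000 ;;        # 365 days
    *) echo "unsupported unit in: $value" >&2
       return 1 ;;
  esac
  echo $(( number * factor ))
}

# Example: retention_to_seconds 30d
```

The result is handy as an input to capacity estimates, which are usually expressed per second.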

Storage-Based Retention

Instead of configuring by time, you can limit retention by the total size of the data:

bash
# Limit storage to 100 GiB
prometheus --storage.tsdb.retention.size=100GB

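The value can be specified in the following units, which Prometheus interprets as powers of 2 (1KB is 1024 bytes, so 100GB means 100 GiB):

Bytes (B)
Kilobytes (KB)
Megabytes (MB)
Gigabytes (GB)
Terabytes (TB)
Petabytes (PB)
Exabytes (EB)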

For example:

bash
prometheus --storage.tsdb.retention.size=500MB  # 500 Megabytes
prometheus --storage.tsdb.retention.size=5GB    # 5 Gigabytes
prometheus --storage.tsdb.retention.size=1TB    # 1 Terabyte
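
When choosing a size limit, a rough capacity estimate is a useful starting point. The sketch below applies the rule of thumb from the Prometheus storage documentation (retention in seconds, times ingested samples per second, times bytes per sample). The function names are hypothetical, and the default of 2 bytes per sample is a conservative assumption; your actual figure depends on how well your data compresses. The ingestion rate can be read from your own server, for example with rate(prometheus_tsdb_head_samples_appended_total[5m]).

```bash
# Rough estimate:
#   needed_disk_space = retention_seconds * samples_per_second * bytes_per_sample
# bytes_per_sample defaults to 2 here (an assumption; 1-2 is typical).
estimate_disk_bytes() {
  local retention_days="$1" samples_per_second="$2" bytes_per_sample="${3:-2}"
  echo $(( retention_days * 86400 * samples_per_second * bytes_per_sample ))
}

# Convert bytes to whole GiB, rounded down.
bytes_to_gib() {
  echo $(( $1 / 1073741824 ))
}

# Example: 15 days of retention at 100000 samples per second
# bytes_to_gib "$(estimate_disk_bytes 15 100000)"
```

Treat the result as a lower bound for planning and leave headroom for the write-ahead log and for compaction, which temporarily needs extra space.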

Using Both Time and Size Limits

You can specify both time and size limits simultaneously:

bash
prometheus --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=10GB

In this configuration, Prometheus deletes data when either condition holds:

The data is older than 30 days, OR
Total block storage exceeds 10GB, in which case the oldest blocks are removed first

Whichever limit is reached first takes effect.
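
To see how the two limits interact, here is a simplified model of the decision, not the actual Prometheus implementation. The function name apply_retention and the input format are hypothetical: blocks are listed newest first, one per line, as an age in days and a size in GB. Real Prometheus works from block metadata and also counts the write-ahead log toward the size; this sketch ignores both.

```bash
# Simplified model of combined retention. Blocks arrive on stdin newest first,
# one per line as "<age_days> <size_gb>". A block is dropped once it is older
# than the time limit, or once the running total from newest to oldest exceeds
# the size limit.
apply_retention() {
  local max_age_days="$1" max_size_gb="$2"
  local total=0 age size
  while read -r age size; do
    total=$(( total + size ))
    if [ "$age" -gt "$max_age_days" ]; then
      echo "delete age=${age}d size=${size}GB reason=time"
    elif [ "$total" -gt "$max_size_gb" ]; then
      echo "delete age=${age}d size=${size}GB reason=size"
    else
      echo "keep age=${age}d size=${size}GB"
    fi
  done
}

# Example: four blocks checked against a 30-day and 10GB limit
# printf '1 3\n10 4\n20 4\n40 2\n' | apply_retention 30 10
```

In the example, the 20-day-old block is within the time limit but is still dropped because the newer blocks already fill the size budget, which is why a tight size limit can shorten your effective retention.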

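Configuration in prometheus.yml

Retention has traditionally been set only through command-line flags, and many Prometheus releases reject retention settings in prometheus.yml. Some recent releases accept them under the storage section; check the configuration reference for your version before relying on this. A sketch of that form, assuming your release supports it:

yaml
storage:
  tsdb:
    retention:
      time: 45d
      size: 5GB

The data directory is not part of this file; set it with the --storage.tsdb.path flag. Where the file-based form is available, it keeps retention alongside the rest of your configuration. Otherwise, use the flags shown above.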

Understanding How Prometheus Stores Data

To better manage retention, it helps to understand how Prometheus stores data:

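Prometheus uses a time series database (TSDB) for storage
Recent samples are held in an in-memory head block, backed by a write-ahead log (WAL)
The head is periodically persisted to disk as blocks (typically covering 2 hours each)
These blocks are later compacted into larger blocks
Retention deletes whole blocks, so a block is removed only once it falls outside the retention policy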

This block-based approach allows Prometheus to efficiently manage the lifecycle of time series data.
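
You can inspect the blocks on disk to see this structure for yourself. The sketch below assumes the standard TSDB layout, in which each block directory holds a meta.json file with minTime and maxTime in milliseconds, written one field per line. The function name list_blocks is hypothetical, and the data directory is whatever you pass to --storage.tsdb.path.

```bash
# Print each persisted block's directory name and the time span it covers.
# Assumes meta.json is pretty-printed with the block's own minTime/maxTime
# appearing before any nested compaction entries.
list_blocks() {
  local data_dir="$1" meta min max
  for meta in "$data_dir"/*/meta.json; do
    [ -f "$meta" ] || continue   # no blocks yet: the glob stays unexpanded
    min="$(sed -n 's/.*"minTime": *\([0-9][0-9]*\).*/\1/p' "$meta" | head -n 1)"
    max="$(sed -n 's/.*"maxTime": *\([0-9][0-9]*\).*/\1/p' "$meta" | head -n 1)"
    [ -n "$min" ] && [ -n "$max" ] || continue
    echo "$(basename "$(dirname "$meta")") $(( (max - min) / 3600000 ))h"
  done
}

# Example (the path is a placeholder): list_blocks /path/to/data
```

Freshly written blocks should show a span of about 2 hours, while older, compacted blocks cover longer ranges.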

Monitoring Your Retention Policy

You can monitor the effectiveness of your retention policy using Prometheus's own metrics:

promql
# Bytes used by persisted blocks (excludes the WAL and head chunks)
prometheus_tsdb_storage_blocks_bytes

# Oldest timestamp stored
prometheus_tsdb_lowest_timestamp_seconds

# Newest timestamp stored
prometheus_tsdb_head_max_time_seconds

These metrics help you validate that your retention policy is working as expected.
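
The two timestamp metrics can be combined into a single figure: how many days of data the server currently holds. The sketch below runs that expression through the Prometheus HTTP API with curl. The server address http://localhost:9090 is an assumption, and prom_query is a hypothetical helper name.

```bash
# Assumption: Prometheus is reachable at this address; override PROM_URL if not.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Days of data currently held: newest timestamp minus oldest, in days.
RETAINED_DAYS_EXPR='(prometheus_tsdb_head_max_time_seconds - prometheus_tsdb_lowest_timestamp_seconds) / 86400'

# Run an instant query against the HTTP API and print the raw JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Example: prom_query "$RETAINED_DAYS_EXPR"
```

On a server that has been running longer than its retention period, the value should settle near the configured limit; a noticeably smaller value suggests the size limit is removing data first.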

Practical Examples

Let's explore some common scenarios and how to configure retention for them:

Example 1: Development Environment

For a development environment where historical data is less critical:

bash
prometheus --storage.tsdb.retention.time=7d --storage.tsdb.retention.size=2GB

This configuration:

Keeps data for one week
Prevents storage from exceeding 2GB
Is suitable for testing and development purposes

Example 2: Production Monitoring

For a production system where longer-term trends are important:

bash
prometheus --storage.tsdb.path=/prometheus-data --storage.tsdb.retention.time=90d --storage.tsdb.retention.size=50GB

This configuration:

Retains data for 90 days (about 3 months)
Limits total storage to 50GB
Allows for quarterly trend analysis
Provides sufficient history for troubleshooting

Example 3: High-Cardinality Environment

For environments with high-cardinality metrics (many unique time series):

bash
prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=100GB

This configuration:

Uses the default 15-day retention
Allocates a larger storage limit to accommodate many unique series
Helps prevent premature data deletion due to space constraints

Best Practices

When setting up your retention policies, consider the following best practices:

Start conservative - Begin with smaller retention periods and increase as needed
Monitor your storage growth - Track how quickly your TSDB is growing
Consider query patterns - Align retention with how far back users typically query
Account for seasonality - Retain enough data to cover weekly, monthly, or seasonal patterns
Use remote storage - Offload long-term data rather than stretching local retention
Test impact on query performance - Queries over long time ranges read more blocks and can be slower
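
Once you have measured storage growth, you can turn it into a retention figure that fits your disk. The sketch below is plain arithmetic under stated assumptions: retention_days_for_budget is a hypothetical helper, both inputs are in bytes, and the growth figure is one you measured yourself (for example, from two readings of prometheus_tsdb_storage_blocks_bytes taken a day apart, before retention starts removing data).

```bash
# How many whole days of data fit in a disk budget, given observed daily growth.
retention_days_for_budget() {
  local budget_bytes="$1" daily_growth_bytes="$2"
  if [ "$daily_growth_bytes" -le 0 ]; then
    echo "daily growth must be positive" >&2
    return 1
  fi
  echo $(( budget_bytes / daily_growth_bytes ))
}

# Example: a 50 GiB budget with 1 GiB of growth per day
# retention_days_for_budget 53687091200 1073741824
```

Growth is rarely constant, so it is safer to set the time limit somewhat below the computed value and pair it with a size limit as a backstop.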

Long-term Storage Solutions

If you need to retain data longer than what's practical for Prometheus itself, consider these approaches:

Prometheus Federation - Have a separate Prometheus instance scrape selected or aggregated series from your other instances and keep them under a longer retention setting
Remote Write - Configure Prometheus to send data to long-term storage solutions
Thanos or Cortex - Use these projects to extend Prometheus with long-term storage capabilities

For example, a basic remote write configuration:

yaml
# In prometheus.yml
remote_write:
  - url: "http://remote-storage-adapter:9201/write"

Summary

Prometheus retention policies allow you to control how long your monitoring data is stored and how much disk space it consumes. By properly configuring retention settings, you can balance your need for historical data against resource constraints.

Key takeaways:

Default retention is 15 days
Configure time-based retention with --storage.tsdb.retention.time
Configure storage-based retention with --storage.tsdb.retention.size
You can use both time and size limits together
Consider your specific use cases when setting retention policies
For very long-term storage, consider external solutions

Additional Resources

Here are some exercises to reinforce your understanding:

Calculate the average daily storage growth of your Prometheus instance
Experiment with different retention settings and observe the impact on query performance
Set up a basic federation to see how it can be used for longer-term storage
Practice creating PromQL queries to monitor your Prometheus storage metrics

For further information, you might explore:

Time Series Database concepts
Remote storage integrations for Prometheus
Thanos and Cortex for scalable, long-term storage
Data compression and aggregation techniques
