Retention Policies
Introduction
When monitoring systems with Prometheus, you'll quickly encounter an important consideration: how long should your time series data be stored? Prometheus collects metrics continuously, which can accumulate into significant volumes of data over time. Retention policies define how Prometheus manages this data, determining what to keep and for how long before it gets discarded.
Understanding retention policies is crucial for any Prometheus deployment because they help you balance between:
- Having sufficient historical data for analysis and troubleshooting
- Managing storage resources efficiently
- Optimizing query performance on your monitoring data
In this guide, we'll explore how Prometheus handles data retention, configuring retention settings, and best practices for managing your time series data effectively.
Default Retention Behavior
By default, Prometheus retains time series data for 15 days. This means that without any specific configuration, metrics older than 15 days will be automatically removed.
Let's understand what this means in practice:
- If you deployed Prometheus on January 1st
- By January 16th, your metrics from January 1st would be deleted
- Only the most recent 15 days of data would be available for querying
This default behavior is reasonable for many basic monitoring setups but may need adjustment based on your specific requirements.
Configuring Retention Settings
Prometheus offers two primary ways to configure data retention:
- Time-based retention - Keep data for a specified time period
- Storage-based retention - Limit the total storage used by Prometheus
Let's look at how to configure these options:
Time-Based Retention
To modify the default 15-day retention period, use the --storage.tsdb.retention.time
flag when starting Prometheus:
# Retain data for 30 days
prometheus --storage.tsdb.retention.time=30d
The value can be specified in:
- Hours (h)
- Days (d)
- Weeks (w)
- Years (y)
For example:
prometheus --storage.tsdb.retention.time=24h # 24 hours
prometheus --storage.tsdb.retention.time=7d # 7 days
prometheus --storage.tsdb.retention.time=4w # 4 weeks
prometheus --storage.tsdb.retention.time=1y # 1 year
Storage-Based Retention
Instead of configuring by time, you can limit retention by the total size of the data:
# Limit storage to 100 GiB
prometheus --storage.tsdb.retention.size=100GB
The value can be specified in:
- Bytes (B)
- Kilobytes (KB)
- Megabytes (MB)
- Gigabytes (GB)
- Terabytes (TB)
- Pebibytes (PB)
For example:
prometheus --storage.tsdb.retention.size=500MB # 500 Megabytes
prometheus --storage.tsdb.retention.size=5GB # 5 Gigabytes
prometheus --storage.tsdb.retention.size=1TB # 1 Terabyte
Using Both Time and Size Limits
You can specify both time and size limits simultaneously:
prometheus --storage.tsdb.retention.time=30d --storage.tsdb.retention.size=10GB
In this configuration, data will be deleted when it either:
- Becomes older than 30 days, OR
- The total storage exceeds 10GB
Prometheus will apply whichever condition is met first.
Configuration in prometheus.yml
While command line flags are common for configuring retention, you can also set these parameters in your prometheus.yml
configuration file using the storage
section:
storage:
tsdb:
path: /path/to/data
retention:
time: 45d
size: 5GB
This approach is often preferred as it keeps all configuration in one place.
Understanding How Prometheus Stores Data
To better manage retention, it helps to understand how Prometheus stores data:
- Prometheus uses a time series database (TSDB) for storage
- Data is organized into blocks (typically 2-hour chunks)
- Periodically, these blocks are compacted into larger blocks
- Old blocks are deleted based on retention policy
This block-based approach allows Prometheus to efficiently manage the lifecycle of time series data.
Monitoring Your Retention Policy
You can monitor the effectiveness of your retention policy using Prometheus's own metrics:
# Total storage size (bytes)
prometheus_tsdb_storage_blocks_bytes
# Oldest timestamp stored
prometheus_tsdb_lowest_timestamp_seconds
# Newest timestamp stored
prometheus_tsdb_head_max_time_seconds
These metrics help you validate that your retention policy is working as expected.
Practical Examples
Let's explore some common scenarios and how to configure retention for them:
Example 1: Development Environment
For a development environment where historical data is less critical:
prometheus --storage.tsdb.retention.time=7d --storage.tsdb.retention.size=2GB
This configuration:
- Keeps data for one week
- Prevents storage from exceeding 2GB
- Is suitable for testing and development purposes
Example 2: Production Monitoring
For a production system where longer-term trends are important:
# In prometheus.yml
storage:
tsdb:
path: /prometheus-data
retention:
time: 90d
size: 50GB
This configuration:
- Retains data for 90 days (about 3 months)
- Limits total storage to 50GB
- Allows for quarterly trend analysis
- Provides sufficient history for troubleshooting
Example 3: High-Cardinality Environment
For environments with high-cardinality metrics (many unique time series):
prometheus --storage.tsdb.retention.time=15d --storage.tsdb.retention.size=100GB
This configuration:
- Uses the default 15-day retention
- Allocates a larger storage limit to accommodate many unique series
- Helps prevent premature data deletion due to space constraints
Best Practices
When setting up your retention policies, consider the following best practices:
- Start conservative - Begin with smaller retention periods and increase as needed
- Monitor your storage growth - Track how quickly your TSDB is growing
- Consider query patterns - Align retention with how far back users typically query
- Account for seasonality - Retain enough data to cover weekly, monthly, or seasonal patterns
- Use remote storage for long-term storage if needed
- Test impact on query performance - Longer retention may slow down queries
Long-term Storage Solutions
If you need to retain data longer than what's practical for Prometheus itself, consider these approaches:
- Prometheus Federation - Use a separate Prometheus instance with different retention settings
- Remote Write - Configure Prometheus to send data to long-term storage solutions
- Thanos or Cortex - Use these projects to extend Prometheus with long-term storage capabilities
For example, a basic remote write configuration:
# In prometheus.yml
remote_write:
- url: "http://remote-storage-adapter:9201/write"
Summary
Prometheus retention policies allow you to control how long your monitoring data is stored and how much disk space it consumes. By properly configuring retention settings, you can balance your need for historical data against resource constraints.
Key takeaways:
- Default retention is 15 days
- Configure time-based retention with
--storage.tsdb.retention.time
- Configure storage-based retention with
--storage.tsdb.retention.size
- You can use both time and size limits together
- Consider your specific use cases when setting retention policies
- For very long-term storage, consider external solutions
Additional Resources
Here are some exercises to reinforce your understanding:
- Calculate the average daily storage growth of your Prometheus instance
- Experiment with different retention settings and observe the impact on query performance
- Set up a basic federation to see how it can be used for longer-term storage
- Practice creating PromQL queries to monitor your Prometheus storage metrics
For further information, you might explore:
- Time Series Database concepts
- Remote storage integrations for Prometheus
- Thanos and Cortex for scalable, long-term storage
- Data compression and aggregation techniques
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)