Time Series Databases

Introduction

Time Series Databases (TSDBs) are specialized database systems designed to store, retrieve, and process time-stamped data efficiently. Unlike traditional databases, which excel at capturing the current state of data, time series databases are optimized for tracking how data changes over time.

In today's data-driven world, we're constantly generating time-based data:

IoT sensors recording temperature every minute
Stock prices changing throughout the trading day
Website traffic metrics collected hourly
Health monitoring devices tracking vital signs continuously

Traditional databases can struggle with the volume and velocity of time-stamped data. This is where Time Series Databases come in — they're built from the ground up to handle this specific pattern of data storage and retrieval.

What Makes a Time Series Database Different?

Time series data has unique characteristics that require specialized handling:

Time-stamped records: Each data point is associated with a specific point in time
Append-only: New data is typically added at the "now" end of the timeline
Rarely updated: Once recorded, historical data rarely changes
Time-based queries: Common operations include aggregating data by time periods
High write throughput: Many applications record data at high frequency

Let's visualize the core structure of time series data:

Key Features of Time Series Databases

Time-Optimized Data Model

Time series databases organize data in a time-structured way, making it efficient to:

Insert data points with timestamps
Query data within specific time ranges
Aggregate data across time intervals (hourly, daily, monthly)

Data Compression

Because time series data often follows patterns, TSDBs use specialized compression techniques:

// Without compression, time series data might look like:
[
  { timestamp: "2023-01-01 00:00:00", value: 75.2 },
  { timestamp: "2023-01-01 00:01:00", value: 75.3 },
  { timestamp: "2023-01-01 00:02:00", value: 75.1 },
  { timestamp: "2023-01-01 00:03:00", value: 75.2 },
  // ... thousands more entries
]

// With compression, the database might store this more efficiently
// using delta encoding for timestamps and value compression

Downsampling and Retention Policies

TSDBs often include built-in capabilities to:

Downsample: Automatically aggregate older data (e.g., keeping minute-level data for a week, then converting to hourly averages)
Auto-expire: Remove data older than a specified retention period

Time-Based Querying

Time series databases offer powerful functions for time-based analysis:

Aggregation by time intervals
Moving averages
Rate calculations
Interpolation for missing values

Popular Time Series Databases

InfluxDB

InfluxDB is one of the most popular purpose-built time series databases.

Key concepts in InfluxDB:

Measurements: Similar to tables in relational databases
Tags: Indexed metadata for efficient filtering
Fields: The actual data values being stored
Points: Individual data entries with timestamp, measurement, tag set, and field set

Here's how to write data to InfluxDB using its line protocol:

# Format: <measurement>[,<tag_key>=<tag_value>] <field_key>=<field_value> [<timestamp>]
cpu,host=server01,region=us-west cpu_usage=80.3 1641024000000000000

And here's a basic query using InfluxQL:

SELECT mean("cpu_usage") FROM "cpu" 
WHERE "host" = 'server01' AND time >= now() - 1h 
GROUP BY time(5m)

TimescaleDB

TimescaleDB extends PostgreSQL with time-series capabilities, offering the best of both worlds: time-series optimization and the full power of a relational database.

Creating a time-series table in TimescaleDB:

-- First create a regular PostgreSQL table
CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  humidity DOUBLE PRECISION
);

-- Then convert it to a TimescaleDB hypertable
SELECT create_hypertable('sensor_data', 'time');

Querying data with standard SQL:

-- Get hourly average temperatures for a specific sensor over the past day
SELECT 
  time_bucket('1 hour', time) AS hour,
  AVG(temperature) AS avg_temp
FROM sensor_data
WHERE 
  sensor_id = 12 AND
  time > NOW() - INTERVAL '1 day'
GROUP BY hour
ORDER BY hour;

Prometheus

Prometheus is a monitoring-focused time series database with a built-in scraping and alerting system.

A simple Prometheus query using its PromQL language:

# Get the average CPU usage across all instances for the past hour
avg(rate(node_cpu_seconds_total{mode="user"}[5m])) by (instance)

Real-World Applications

IoT Sensor Monitoring

Imagine you're monitoring temperature sensors across multiple locations in a manufacturing plant:

// Sample sensor data structure
{
  "timestamp": "2023-05-15T14:23:00Z",
  "sensor_id": "temp-sensor-123",
  "location": "warehouse-a",
  "temperature": 22.4,
  "humidity": 45.2
}

With a time series database, you can:

Store millions of readings efficiently
Query average temperatures by location and time period
Set up alerts for anomalies
Visualize trends over time

Application Performance Monitoring

Modern applications generate numerous metrics:

Request latency
Error rates
Resource utilization
User activity

A time series database lets you:

// Query to get 95th percentile API latency over the last 24 hours
db.query(`
  SELECT 
    time_bucket('15m', timestamp) as time_period,
    percentile_cont(0.95) WITHIN GROUP (ORDER BY response_time) as p95_latency
  FROM api_metrics
  WHERE timestamp > NOW() - INTERVAL '24 hours'
  GROUP BY time_period
  ORDER BY time_period
`)

Financial Market Analysis

Time series databases are perfect for storing market data:

// Example query to calculate a simple moving average
const result = await tsdb.query(`
  SELECT 
    date,
    symbol,
    close_price,
    AVG(close_price) OVER (
      PARTITION BY symbol 
      ORDER BY date 
      ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
    ) as moving_avg_10d
  FROM stock_prices
  WHERE 
    symbol = 'AAPL' 
    AND date >= '2023-01-01'
    AND date <= '2023-12-31'
  ORDER BY date
`);

Implementation Example: Building a Simple Metrics Tracker

Let's build a simple metrics tracking system using InfluxDB:

First, let's set up the database connection:

const { InfluxDB, Point } = require('@influxdata/influxdb-client');

// Configure client
const token = 'your-influxdb-token';
const url = 'http://localhost:8086';
const org = 'your-organization';
const bucket = 'metrics';

const client = new InfluxDB({ url, token });
const writeClient = client.getWriteApi(org, bucket, 'ns');

Now, let's write some data:

// Function to record a website visit
function recordPageVisit(page, userAgent, loadTime) {
  const point = new Point('page_visit')
    .tag('page', page)
    .tag('user_agent', userAgent)
    .floatField('load_time', loadTime)
    .timestamp(new Date());
  
  writeClient.writePoint(point);
}

// Record some sample data
recordPageVisit('/home', 'Mozilla/5.0...', 0.342);
recordPageVisit('/products', 'Mozilla/5.0...', 1.271);

// Make sure to flush the data
writeClient.flush();

Now, let's query the data:

const { QueryApi } = require('@influxdata/influxdb-client');

const queryClient = client.getQueryApi(org);

// Query to get average load time per page for the last 24 hours
const query = `
  from(bucket: "metrics")
    |> range(start: -24h)
    |> filter(fn: (r) => r._measurement == "page_visit")
    |> group(columns: ["page"])
    |> mean(column: "load_time")
`;

queryClient.queryRows(query, {
  next(row, tableMeta) {
    const tableObject = tableMeta.toObject(row);
    console.log(`Page: ${tableObject.page}, Avg Load Time: ${tableObject._value}s`);
  },
  error(error) {
    console.error(error);
  },
  complete() {
    console.log('Query completed');
  }
});

When to Use a Time Series Database

Time series databases are ideal when:

You're collecting time-stamped data at regular intervals
Your workload is primarily append-only (writing new data, not updating existing data)
You need to query data across time ranges and perform time-based aggregations
Your data volume is large and growing continuously
You need specialized functions for time-based analysis

When Not to Use a Time Series Database

Time series databases might not be the best choice when:

Your data doesn't have a strong time component
You need complex relationships between entities (a graph database might be better)
You perform frequent updates to existing records
You need ACID transaction support for multiple records
Your primary access pattern is key-value lookup

Summary

Time Series Databases solve a specific problem exceptionally well: storing and analyzing data points ordered by time. They offer:

Optimized storage and retrieval of time-stamped data
High write throughput for continuous data collection
Powerful time-based querying capabilities
Automatic data lifecycle management

As the volume of time-based data continues to grow with IoT, monitoring, and analytics, time series databases have become an essential tool in the modern data storage ecosystem.

Exercises

Install InfluxDB locally and write a simple script to collect and store CPU usage metrics from your computer every 5 seconds.
Create a dashboard using Grafana to visualize the data from Exercise 1.
Design a schema for a time series database that could track:
- Stock prices for multiple companies
- Trading volume
- News sentiment scores
Research how time series databases handle out-of-order data arrivals (when timestamps aren't sequential).
Compare query performance between a relational database and a time series database for calculating hourly averages over a month of data.

Additional Resources

Key Terms

Retention policy: Rules that determine how long data is kept in the database
Downsampling: The process of reducing data resolution over time
Continuous query: Queries that run automatically at regular intervals
Rollup: Aggregation of detailed data into summary data
Cardinality: The uniqueness of data points, particularly in tag values

If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)

Introduction​

What Makes a Time Series Database Different?​

Key Features of Time Series Databases​

Time-Optimized Data Model​

Data Compression​

Downsampling and Retention Policies​

Time-Based Querying​

Popular Time Series Databases​

InfluxDB​

TimescaleDB​

Prometheus​

Real-World Applications​

IoT Sensor Monitoring​

Application Performance Monitoring​

Financial Market Analysis​

Implementation Example: Building a Simple Metrics Tracker​

When to Use a Time Series Database​

When Not to Use a Time Series Database​

Summary​

Exercises​

Additional Resources​

Key Terms​