Grafana Tempo

Introduction

Grafana Tempo is a high-scale, cost-effective distributed tracing backend designed to help developers understand how requests flow through their distributed applications. As applications grow more complex with microservices architectures, tracking a single request as it moves through different services becomes challenging. Tempo solves this problem by collecting and storing distributed traces, making them searchable and visualizable within the Grafana interface.

Tempo stands out in the Grafana ecosystem for several key reasons:

  • Cost-efficient storage: Uses object storage (like S3, GCS, or Azure Blob Storage) to keep costs low
  • Seamless integration: Integrates natively with other Grafana tools like Loki and Prometheus
  • Multi-protocol support: Compatible with Jaeger, Zipkin, OpenTelemetry, and other tracing protocols
  • Easy deployment: Designed to be simple to operate at any scale

In this guide, we'll explore what Tempo is, how it works, and how to set it up and use it in your applications.

What is Distributed Tracing?

Before diving into Tempo, let's understand what distributed tracing is.

Distributed tracing is a method for tracking requests as they travel through various services in a distributed system. Each trace represents a single request's journey and consists of multiple spans. A span represents a unit of work in a service, such as an API call or a database query.

For example, when a user clicks "checkout" in a web shop, that single request might generate spans in the API gateway, the order service, the payment service, and the database. Distributed tracing helps you visualize this flow and identify performance bottlenecks or errors.
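To make this concrete, a trace can be pictured as a tree of spans. The JavaScript sketch below models a hypothetical checkout request (all service and operation names are invented) and prints each span's duration:

```javascript
// A trace is a tree of spans; each span records a service, an operation,
// and start/end timestamps (milliseconds here, for simplicity).
const trace = {
  traceId: 'abc123',
  rootSpan: {
    service: 'api-gateway', operation: 'POST /checkout',
    start: 0, end: 120,
    children: [
      {
        service: 'order-service', operation: 'createOrder',
        start: 10, end: 110,
        children: [
          { service: 'postgres', operation: 'INSERT orders', start: 20, end: 95, children: [] },
        ],
      },
    ],
  },
};

// Walk the tree and report each span's duration, indented by depth.
function describe(span, depth = 0) {
  const lines = [`${'  '.repeat(depth)}${span.service} ${span.operation}: ${span.end - span.start}ms`];
  for (const child of span.children) lines.push(...describe(child, depth + 1));
  return lines;
}

console.log(describe(trace.rootSpan).join('\n'));
```

The nesting mirrors the parent-child relationships you will later see in Grafana's waterfall view.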

How Tempo Works

Grafana Tempo follows a straightforward architecture:

  1. Collection: Your applications are instrumented with libraries such as OpenTelemetry, Jaeger, or Zipkin clients to generate traces
  2. Ingestion: Tempo receives these traces through various protocols
  3. Storage: Traces are stored in object storage for cost-effective retention
  4. Query: Grafana provides a UI to search and visualize these traces

Setting Up Grafana Tempo

Let's go through the basic steps to set up Tempo in your environment.

Prerequisites

  • Docker and Docker Compose (for local testing)
  • Grafana (version 7.4.0 or newer)
  • An application instrumented for tracing

Step 1: Run Tempo Locally with Docker Compose

Create a file named docker-compose.yml:

```yaml
version: '3'
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml
      - ./tempo-data:/tmp/tempo
    ports:
      - "3200:3200"   # tempo
      - "4317:4317"   # otlp grpc
      - "4318:4318"   # otlp http
      - "9411:9411"   # zipkin

  grafana:
    image: grafana/grafana:latest
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
    ports:
      - "3000:3000"
```

Step 2: Configure Tempo

Create a file named tempo.yaml:

```yaml
server:
  http_listen_port: 3200

distributor:
  receivers:
    jaeger:
      protocols:
        thrift_http:
          endpoint: 0.0.0.0:14268
    zipkin:
      endpoint: 0.0.0.0:9411
    otlp:
      protocols:
        http:
          endpoint: 0.0.0.0:4318
        grpc:
          endpoint: 0.0.0.0:4317

storage:
  trace:
    backend: local
    local:
      path: /tmp/tempo
    pool:
      max_workers: 100
      queue_depth: 10000

compactor:
  compaction:
    block_retention: 24h
```

Step 3: Configure Grafana Datasource

Create a file named grafana-datasources.yaml:

```yaml
apiVersion: 1

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    version: 1
```

Step 4: Start the Services

Run the following command to start both Tempo and Grafana:

```bash
docker-compose up -d
```

Now you can access Grafana at http://localhost:3000 and Tempo will be available as a datasource.

Instrumenting Your Application for Tracing

To get the most out of Tempo, you need to instrument your applications to generate traces. Here's a simple example using OpenTelemetry with Node.js:

First, install the required packages:

```bash
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node @opentelemetry/exporter-trace-otlp-http
```

Then, create a file named tracing.js:

```javascript
// tracing.js
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');

// Send traces to Tempo's OTLP HTTP receiver
const exporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

const sdk = new NodeSDK({
  traceExporter: exporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
```

Now, import this file at the beginning of your application:

```javascript
// app.js
require('./tracing'); // Initialize tracing before anything else

const express = require('express');
const app = express();

app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.get('/slow', async (req, res) => {
  // Simulate a slow operation
  await new Promise(resolve => setTimeout(resolve, 2000));
  res.send('Slow operation completed');
});

app.listen(3001, () => {
  console.log('Example app listening on port 3001');
});
```

When you make requests to this application, traces will be sent to Tempo and can be viewed in Grafana.

Exploring Traces in Grafana

Once your application is generating traces and sending them to Tempo, you can explore them in Grafana:

  1. Open Grafana (http://localhost:3000)
  2. Navigate to Explore (from the left sidebar)
  3. Select Tempo as your data source
  4. Use the search functionality to find traces:
    • By trace ID (if you know it)
    • By service name
    • By duration (to find slow traces)
    • By tags/attributes

Trace View

The trace view in Grafana displays a waterfall chart showing the timeline of spans within a trace:

  • Each span shows its duration, service, and operation
  • Spans are nested to show parent-child relationships
  • Colors indicate different services
  • You can expand spans to see more details, like tags and logs

This visualization helps you understand how long each part of your request takes and identify bottlenecks.
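The waterfall can also be reasoned about numerically. A span's "self time" is its duration minus the time spent in its child spans, and the span with the largest self time is often the real bottleneck. The sketch below (with invented span data, and assuming child spans run sequentially rather than in parallel) illustrates the idea:

```javascript
// Given a flat list of spans (id, parentId, duration in ms), find the span
// with the largest "self time" -- its own duration minus its children's.
const spans = [
  { id: 'a', parentId: null, name: 'HTTP GET /checkout', duration: 500 },
  { id: 'b', parentId: 'a', name: 'auth.verify', duration: 40 },
  { id: 'c', parentId: 'a', name: 'db.query', duration: 420 },
  { id: 'd', parentId: 'c', name: 'net.connect', duration: 30 },
];

function selfTimes(spans) {
  // Sum each parent's direct children, then subtract from the parent.
  const childSum = new Map();
  for (const s of spans) {
    if (s.parentId !== null) {
      childSum.set(s.parentId, (childSum.get(s.parentId) || 0) + s.duration);
    }
  }
  return spans.map(s => ({
    name: s.name,
    selfTime: s.duration - (childSum.get(s.id) || 0),
  }));
}

const bottleneck = selfTimes(spans).reduce((a, b) => (b.selfTime > a.selfTime ? b : a));
console.log(bottleneck); // db.query spends the most time doing its own work
```

Here the root span is slow mostly because of db.query, not because of its own work, which is exactly the distinction the waterfall view helps you see.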

Advanced Features

Trace Discovery with Exemplars

One challenge with distributed tracing is finding relevant traces. Tempo integrates with Prometheus and Loki through "exemplars" - links from metrics or logs directly to related traces:

  1. From a Prometheus graph, click on a point with an exemplar to see related traces
  2. From Loki logs, click on trace ID links to jump directly to traces

This feature creates a seamless observability experience across metrics, logs, and traces.
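Conceptually, an exemplar is just a metric observation annotated with a trace ID, so a spike on a latency graph can link straight to a trace that contributed to it. The JavaScript sketch below models this idea with invented data; it is not the Prometheus client API:

```javascript
// A toy latency histogram that keeps one exemplar per bucket, so each
// bucket can link back to a concrete trace in Tempo.
const histogram = { buckets: new Map(), exemplars: [] };

function observe(hist, valueMs, traceId) {
  // Bucket by upper bound (a tiny fixed set for the sketch).
  const bounds = [100, 250, 500, Infinity];
  const le = bounds.find(b => valueMs <= b);
  hist.buckets.set(le, (hist.buckets.get(le) || 0) + 1);
  // Keep only the latest exemplar for this bucket.
  hist.exemplars = hist.exemplars.filter(e => e.le !== le);
  hist.exemplars.push({ le, valueMs, traceId });
}

observe(histogram, 80, 'trace-1');
observe(histogram, 1200, 'trace-2'); // the slow outlier

// Clicking the exemplar on the slowest bucket would jump to trace-2 in Tempo.
const slow = histogram.exemplars.find(e => e.le === Infinity);
console.log(slow.traceId);
```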

TraceQL - Tempo's Query Language

Tempo provides TraceQL, a powerful query language for searching traces:

{ resource.service.name = "payment-service" && status = error }

This example finds traces containing spans from the payment service that ended with an error status. Inside one set of braces, all conditions must match on the same span; status is a built-in span property, while resource.service.name refers to an attribute on the span's resource.
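To build intuition for what such a query does, here is a toy evaluator in JavaScript: it keeps only traces that contain at least one span matching both predicates. The data and matching logic are illustrative, not the real TraceQL engine:

```javascript
// Invented trace data: each trace holds a list of spans with a service
// name and a status.
const traces = [
  { traceId: 't1', spans: [{ service: 'payment-service', status: 'error' }] },
  { traceId: 't2', spans: [{ service: 'payment-service', status: 'ok' }] },
  { traceId: 't3', spans: [{ service: 'cart-service', status: 'error' }] },
];

// Keep traces containing a span that matches BOTH conditions, the way a
// single-spanset filter does.
function query(traces, service, status) {
  return traces.filter(t =>
    t.spans.some(s => s.service === service && s.status === status)
  );
}

const hits = query(traces, 'payment-service', 'error');
console.log(hits.map(t => t.traceId)); // [ 't1' ]
```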

Metrics Generator

Tempo can generate metrics from your traces, allowing you to:

  • Create service graphs showing traffic between services
  • Calculate RED metrics (Rate, Error, Duration) automatically
  • Monitor span metrics without additional instrumentation
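As a rough illustration of what RED metrics derived from traces look like, the sketch below computes rate, error ratio, and average duration from a batch of invented root spans (the real metrics-generator emits Prometheus series, including latency histograms rather than a single average):

```javascript
// Derive RED metrics (Rate, Errors, Duration) from root spans observed
// over a time window. The span data is made up for illustration.
const windowSeconds = 60;
const rootSpans = [
  { service: 'checkout', durationMs: 120, error: false },
  { service: 'checkout', durationMs: 340, error: true },
  { service: 'checkout', durationMs: 95, error: false },
];

function redMetrics(spans, windowSeconds) {
  const total = spans.length;
  const errors = spans.filter(s => s.error).length;
  const avgMs = spans.reduce((sum, s) => sum + s.durationMs, 0) / total;
  return {
    ratePerSecond: total / windowSeconds, // Rate: requests per second
    errorRatio: errors / total,           // Errors: fraction that failed
    avgDurationMs: avgMs,                 // Duration: average latency
  };
}

console.log(redMetrics(rootSpans, windowSeconds));
```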

Real-World Examples

Example 1: Troubleshooting Latency Issues

Imagine your e-commerce application is experiencing slow checkout times. Here's how you might use Tempo to investigate:

  1. Look at Prometheus metrics to identify when latency started increasing
  2. Use exemplars to find traces during that period
  3. Analyze traces to see which service is taking the most time
  4. Drill down into that service's spans to understand the root cause
  5. Fix the issue and verify with new traces

Example 2: Error Investigation

When users report payment failures:

  1. Search for traces with error status in the payment service
  2. Examine the spans to identify where errors occur
  3. Check span attributes for error messages and relevant context
  4. Look at spans from dependent services that might be causing the issue
  5. Implement fixes and verify with new traces

Best Practices

When working with Grafana Tempo:

  1. Use consistent naming conventions for services and operations
  2. Add meaningful attributes to spans to provide context
  3. Sample intelligently - you don't need every trace in high-volume systems
  4. Set appropriate retention periods based on your storage capacity
  5. Integrate with Prometheus and Loki for a complete observability solution
  6. Automate instrumentation where possible to reduce manual work
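On point 3, one common approach is deterministic head sampling: hash the trace ID into [0, 1] and keep the trace if the hash falls below the sampling ratio, so every service makes the same keep-or-drop decision for a given trace and traces are sampled whole. The sketch below uses a simple illustrative hash, not a production sampler:

```javascript
// Map a trace ID to a stable value in [0, 1] with a rolling hash.
function hashToUnit(traceId) {
  let h = 0;
  for (const ch of traceId) {
    h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h / 0xffffffff;
}

// Keep the trace only if its hash falls under the sampling ratio.
function shouldSample(traceId, ratio) {
  return hashToUnit(traceId) < ratio;
}

// The same trace ID always yields the same decision, on every service.
console.log(shouldSample('4bf92f3577b34da6a3ce929d0e0e4736', 0.1));
```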

Summary

Grafana Tempo is a powerful tool for distributed tracing that helps you understand the flow of requests through your system. Its integration with the Grafana ecosystem, cost-effective storage model, and ease of use make it an excellent choice for organizations of all sizes.

In this guide, we've covered:

  • What Tempo is and how it fits into the Grafana ecosystem
  • The basics of distributed tracing
  • How to set up Tempo and instrument applications
  • Exploring traces in Grafana
  • Advanced features like exemplars and TraceQL
  • Real-world troubleshooting examples

Exercises

To strengthen your understanding of Tempo:

  1. Set up Tempo locally using the instructions in this guide
  2. Instrument a simple application with OpenTelemetry and send traces to Tempo
  3. Create a multi-service application and trace requests across services
  4. Practice using TraceQL to search for specific traces
  5. Try setting up exemplars from Prometheus to Tempo
  6. Implement span metrics generation and create a dashboard showing service performance

