Integration with OpenTelemetry
Introduction
OpenTelemetry provides a unified framework for collecting and processing telemetry data from applications and infrastructure. When integrated with Grafana, it creates a powerful observability solution that helps developers monitor, troubleshoot, and optimize their applications. This guide will walk you through understanding OpenTelemetry and how to integrate it with your Grafana ecosystem.
What is OpenTelemetry?
OpenTelemetry (often abbreviated as OTel) is an open-source observability framework that provides a collection of tools, APIs, and SDKs for collecting three primary types of telemetry data:
- Traces: Detailed records of requests as they flow through distributed systems
- Metrics: Numerical measurements collected at regular intervals
- Logs: Time-stamped records of discrete events
Why Integrate OpenTelemetry with Grafana?
Integrating OpenTelemetry with Grafana offers several benefits:
- Vendor-neutral instrumentation: Instrument your code once and send telemetry data to multiple backends
- Comprehensive observability: Collect metrics, traces, and logs using a single framework
- Wide language support: OpenTelemetry supports numerous programming languages
- Unified visualization: Use Grafana to create comprehensive dashboards combining all your telemetry data
- Simplified troubleshooting: Correlate metrics, logs, and traces in Grafana's Explore view
Setting Up OpenTelemetry for Grafana
Prerequisites
Before you begin, ensure you have:
- A running Grafana instance (version 7.0 or higher)
- Applications you want to monitor
- Basic understanding of your application architecture
Step 1: Instrument Your Application with OpenTelemetry
Let's demonstrate how to instrument a simple Node.js application:
// Import OpenTelemetry packages
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { SimpleSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
const { trace } = require('@opentelemetry/api');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { ExpressInstrumentation } = require('@opentelemetry/instrumentation-express');
const { HttpInstrumentation } = require('@opentelemetry/instrumentation-http');
// Set up the tracer provider
const provider = new NodeTracerProvider({
resource: new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: 'my-service',
}),
});
// Configure the OTLP exporter
const otlpExporter = new OTLPTraceExporter({
url: 'http://localhost:4318/v1/traces', // OpenTelemetry Collector endpoint
});
// Add the exporter to the provider
provider.addSpanProcessor(new SimpleSpanProcessor(otlpExporter));
provider.register();
// Register auto-instrumentations
registerInstrumentations({
instrumentations: [
new HttpInstrumentation(),
new ExpressInstrumentation(),
],
});
// Create an Express application
const express = require('express');
const app = express();
app.get('/', (req, res) => {
// Create a custom span
const tracer = trace.getTracer('express-example');
const span = tracer.startSpan('process-request');
// Perform some work
setTimeout(() => {
span.end();
res.send('Hello from instrumented app!');
}, 100);
});
app.listen(3000, () => {
console.log('Server running on port 3000');
});
Step 2: Deploy the OpenTelemetry Collector
The OpenTelemetry Collector acts as a proxy between your applications and Grafana's backends. Create a configuration file named otel-collector-config.yaml
:
receivers:
otlp:
protocols:
http:
endpoint: 0.0.0.0:4318
grpc:
endpoint: 0.0.0.0:4317
processors:
batch:
timeout: 1s
send_batch_size: 1024
exporters:
prometheus:
endpoint: 0.0.0.0:8889
namespace: otel
loki:
endpoint: "http://loki:3100/loki/api/v1/push"
tenant_id: "tenant1"
otlp:
endpoint: tempo:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [otlp]
metrics:
receivers: [otlp]
processors: [batch]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [batch]
exporters: [loki]
Run the collector using Docker:
docker run -p 4317:4317 -p 4318:4318 -p 8889:8889 \
-v $(pwd)/otel-collector-config.yaml:/etc/otelcol/config.yaml \
otel/opentelemetry-collector:latest
Step 3: Configure Grafana Data Sources
In your Grafana instance, you'll need to set up the appropriate data sources:
- For metrics: Add a Prometheus data source pointing to the OTel Collector's Prometheus endpoint
- For logs: Add a Loki data source
- For traces: Add a Tempo data source
Here's how to configure the Prometheus data source in Grafana:
- Navigate to Configuration > Data Sources > Add data source
- Select Prometheus
- Set the URL to
http://localhost:8889
(the OTel Collector's Prometheus endpoint) - Click "Save & Test"
Step 4: Create Dashboards in Grafana
Now you can create dashboards to visualize your OpenTelemetry data. Let's create a simple dashboard with service metrics:
- In Grafana, click on "Create" > "Dashboard"
- Add a new panel
- Use a PromQL query like
rate(otel_process_runtime_total_alloc_bytes[5m])
to visualize memory allocation rate - Save your dashboard
Using Grafana Explore with OpenTelemetry
Grafana's Explore view is perfect for troubleshooting with OpenTelemetry data:
- Go to Explore in the Grafana sidebar
- Select your Prometheus data source
- Query for a specific metric, e.g.,
otel_http_server_duration_milliseconds_bucket
- Switch to your Tempo data source to view related traces
- Use the trace ID to correlate with logs in Loki
Connecting Metrics, Logs, and Traces
One of the most powerful features when integrating OpenTelemetry with Grafana is the ability to correlate different telemetry signals:
Example: Troubleshooting High Latency
Imagine you receive an alert for high API latency:
- View the latency metrics dashboard in Grafana
- Identify the specific endpoint with high latency
- Switch to Explore and query traces for that endpoint
- Find traces with high duration
- Examine spans within those traces to identify bottlenecks
- Look at logs corresponding to the problematic service
- Fix the issue and verify improvement through Grafana
Advanced Configuration: Custom Attributes and Filtering
You can enhance your OpenTelemetry data with custom attributes that make it more useful in Grafana:
// Add custom attributes to a span
const span = tracer.startSpan('process-data');
span.setAttribute('customer.id', customerId);
span.setAttribute('process.priority', 'high');
span.setAttribute('data.size', dataSize);
span.end();
In Grafana, you can then filter or group by these custom attributes.
Best Practices for OpenTelemetry with Grafana
- Define meaningful service names: Use clear, consistent service naming conventions
- Add relevant attributes: Include attributes that will help with filtering and grouping
- Sample appropriately: Use intelligent sampling for high-volume services
- Create focused dashboards: Design dashboards around specific use cases or services
- Set up alerting: Configure alerts based on OpenTelemetry metrics
- Use exemplars: Link metrics to traces with exemplars
- Standardize instrumentation: Create reusable instrumentation libraries for your organization
Troubleshooting Your OpenTelemetry Setup
If you're not seeing data in Grafana:
- Check instrumentation: Verify your application is generating telemetry data
- Inspect the collector: Look at the collector logs for errors or dropped data
- Validate data source configuration: Ensure Grafana data sources are correctly configured
- Test with simple queries: Start with basic queries to confirm data flow
Summary
Integrating OpenTelemetry with Grafana creates a powerful observability platform that helps you understand the behavior and performance of your applications. By collecting metrics, logs, and traces through a single framework and visualizing them in Grafana, you can gain comprehensive insights into your systems.
This integration enables you to:
- Monitor application performance with detailed metrics
- Track requests across distributed systems with traces
- Correlate different telemetry signals for efficient troubleshooting
- Create comprehensive dashboards for operational visibility
Additional Resources
- OpenTelemetry Documentation
- Grafana Documentation
- OpenTelemetry Collector
- Grafana Tempo (Distributed Tracing)
Exercises
- Instrument a simple application using OpenTelemetry and visualize its metrics in Grafana
- Set up trace sampling and explore how it affects performance and data volume
- Create a Grafana dashboard that combines metrics, logs, and traces for a service
- Configure alerts based on OpenTelemetry metrics
- Practice troubleshooting a performance issue using the correlation between metrics, logs, and traces
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)