Application Monitoring

Introduction

Application monitoring is a critical practice in modern software development that involves tracking and analyzing the performance, health, and behavior of your applications. With Prometheus, you can implement powerful application monitoring to detect issues early, optimize performance, and understand user behavior patterns.

In this guide, we'll explore how Prometheus can be used to monitor applications, the instrumentation process, and how to extract meaningful insights from your application metrics.

Why Monitor Applications with Prometheus?

Application monitoring with Prometheus offers several key advantages:

Real-time visibility: Get immediate insights into how your applications are performing
Proactive issue detection: Identify problems before they affect users
Performance optimization: Discover bottlenecks and optimization opportunities
Business insights: Understand usage patterns and feature adoption
Data-driven decisions: Base technical and product decisions on actual metrics

Understanding Application Instrumentation

Instrumentation is the process of adding code to your application that exposes metrics for collection by monitoring systems like Prometheus.

Types of Application Metrics

When monitoring applications, you typically collect four types of metrics:

Counters: Cumulative metrics that only increase (e.g., number of requests processed)
Gauges: Metrics that can increase or decrease (e.g., current memory usage)
Histograms: Sample observations distributed into configurable buckets (e.g., request duration)
Summaries: Similar to histograms, but quantiles are calculated in the application by the client library rather than at query time on the Prometheus server
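
To make the histogram type more concrete, here is a minimal sketch in plain JavaScript of how observations land in cumulative buckets. The class name SketchHistogram and its fields are hypothetical and not part of any client library; real client libraries also handle labels and exposition for you.

```javascript
// Hypothetical, simplified histogram: not the prom-client implementation.
class SketchHistogram {
  constructor(upperBounds) {
    // Upper bounds ("le" values) sorted ascending; +Inf catches everything.
    this.upperBounds = [...upperBounds].sort((a, b) => a - b).concat(Infinity);
    this.bucketCounts = new Array(this.upperBounds.length).fill(0);
    this.sum = 0;
    this.count = 0;
  }

  observe(value) {
    // Buckets are cumulative: an observation counts toward every bucket
    // whose upper bound is greater than or equal to the value.
    this.upperBounds.forEach((bound, i) => {
      if (value <= bound) this.bucketCounts[i] += 1;
    });
    this.sum += value;
    this.count += 1;
  }
}

const h = new SketchHistogram([0.1, 0.5, 1]);
[0.05, 0.3, 0.7, 2].forEach((v) => h.observe(v));
console.log(h.bucketCounts); // [ 1, 2, 3, 4 ]
```

Because the buckets are cumulative, the last (+Inf) bucket always equals the total number of observations.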

Instrumenting Your Application for Prometheus

Let's look at how to instrument applications in different languages:

Node.js Application Example

First, install Express and the Prometheus client library:

npm install express prom-client

Then, implement instrumentation in your application:

const express = require('express');
const promClient = require('prom-client');

// Create a Registry to register metrics
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestCounter = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [register]
});

const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10],
  registers: [register]
});

const app = express();

// Middleware to track requests
app.use((req, res, next) => {
  const start = Date.now();

  res.on('finish', () => {
    // Build the labels once. Unmatched requests (e.g., 404s) share a fixed
    // route value so arbitrary URLs don't create unbounded label values.
    const labels = {
      method: req.method,
      route: req.route ? req.route.path : 'unmatched',
      status_code: res.statusCode
    };

    // Increment the request counter
    httpRequestCounter.inc(labels);

    // Observe request duration in seconds
    const duration = (Date.now() - start) / 1000;
    httpRequestDuration.observe(labels, duration);
  });

  next();
});

// Expose metrics endpoint for Prometheus scraping
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.listen(3000, () => {
  console.log('Server is running on port 3000');
});
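
The middleware above measures duration with Date.now(), which follows the wall clock and can jump if the system time is adjusted. A possible alternative, sketched here on the assumption that your Node.js version provides process.hrtime.bigint(), is to read the monotonic clock instead. The helper name startTimer is hypothetical.

```javascript
// Hypothetical helper: returns a function that reports elapsed seconds
// measured on Node's monotonic high-resolution clock.
function startTimer() {
  const start = process.hrtime.bigint(); // nanoseconds as a BigInt
  return function elapsedSeconds() {
    const elapsedNs = process.hrtime.bigint() - start;
    return Number(elapsedNs) / 1e9;
  };
}

// Usage: call startTimer() when the request arrives, then pass the
// result of stop() to the histogram's observe() call in the 'finish' handler.
const stop = startTimer();
const seconds = stop();
console.log(seconds >= 0); // true
```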

Java Application Example with Spring Boot

For Spring Boot applications, you can use the Micrometer library with Prometheus support.

First, add the dependencies to your pom.xml:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Configure your application.properties (or set the equivalent keys in application.yml):

# Enable Prometheus endpoint
management.endpoints.web.exposure.include=prometheus,health,info
management.endpoint.prometheus.enabled=true

Then create a custom metric in your service:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Service;

@Service
public class UserService {
    private final Counter userRegistrationCounter;
    
    public UserService(MeterRegistry registry) {
        this.userRegistrationCounter = Counter.builder("app_user_registrations_total")
            .description("Total number of user registrations")
            .register(registry);
    }
    
    public void registerUser(User user) {
        // User registration logic
        
        // Increment the counter
        userRegistrationCounter.increment();
    }
}

Prometheus Configuration for Application Monitoring

Once your application is instrumented, configure Prometheus to scrape the metrics:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'my-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['localhost:3000']
  
  - job_name: 'spring-boot-app'
    metrics_path: '/actuator/prometheus'
    static_configs:
      - targets: ['localhost:8080']

Designing Effective Application Metrics

When designing metrics for your application, consider these best practices:

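Use clear naming conventions: Follow the format namespace_subsystem_name_unit, with base units such as seconds or bytes and a _total suffix for counters (e.g., http_request_duration_seconds)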
Label wisely: Use labels to differentiate metrics but avoid high cardinality
Focus on what matters: Monitor what impacts users and business outcomes
Include all request outcomes: Track errors, not just successes
Avoid exposing sensitive data: Never include personal data or secrets in metrics
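
The cardinality advice can be made concrete with a rough estimate: in the worst case, one metric produces a time series for every combination of label values. The sketch below, in plain JavaScript with hypothetical function names, multiplies the distinct values per label and shows one possible way to collapse numeric path segments before a path is used as a label value.

```javascript
// Worst-case number of time series for one metric: the product of the
// number of distinct values each label can take.
function estimateSeriesCount(labelValueCounts) {
  return Object.values(labelValueCounts).reduce((total, n) => total * n, 1);
}

// Replace purely numeric path segments with a placeholder so that
// /users/1 and /users/2 share one label value. Real applications may
// also need to handle UUIDs, slugs, and similar identifiers.
function normalizePath(path) {
  return path
    .split('/')
    .map((segment) => (/^\d+$/.test(segment) ? ':id' : segment))
    .join('/');
}

console.log(estimateSeriesCount({ method: 5, route: 20, status_code: 10 })); // 1000
console.log(normalizePath('/users/42/orders/7')); // /users/:id/orders/:id
```

For a histogram, each label combination additionally yields one series per bucket plus the _sum and _count series, so the total grows faster still.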

Visualizing Application Metrics with Grafana

While Prometheus has a basic built-in UI for running queries, Grafana provides richer visualization capabilities.

A typical Grafana dashboard for application monitoring might include:

Request rate, errors, and duration (RED metrics)
Resource utilization (CPU, memory)
Business metrics (user signups, transactions)
System-level metrics (garbage collection, thread count)

Common Application Monitoring Patterns

The RED Pattern

The RED pattern focuses on three key metrics:

Rate: Requests per second
Errors: Failed requests per second
Duration: Distribution of request latencies
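
As a rough illustration of how rate and error ratio derive from counters, the sketch below compares two snapshots of a counter taken some seconds apart. The snapshot shape and function name are hypothetical, and the calculation ignores counter resets and the extrapolation that PromQL's rate() performs.

```javascript
// Per-second rate between two counter readings taken `seconds` apart.
// Simplification: assumes the counter did not reset in between.
function perSecondRate(previous, current, seconds) {
  return (current - previous) / seconds;
}

// Hypothetical snapshots of request counters, 60 seconds apart.
const before = { total: 1200, errors: 30 };
const after = { total: 1800, errors: 60 };

const requestRate = perSecondRate(before.total, after.total, 60);
const errorRate = perSecondRate(before.errors, after.errors, 60);
const errorRatio = errorRate / requestRate;

console.log(requestRate); // 10
console.log(errorRate);   // 0.5
console.log(errorRatio);  // 0.05
```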

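Example PromQL queries for RED metrics. They group by a service label, which the instrumentation and scrape configuration shown earlier do not set; add one through target labels or relabeling, or group by job instead: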

# Rate of requests
sum(rate(http_requests_total[5m])) by (service)

# Error rate
sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service) / sum(rate(http_requests_total[5m])) by (service)

# 95th percentile latency
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (service, le))
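
histogram_quantile() estimates a quantile from cumulative bucket counts by assuming observations are spread evenly within the bucket that contains the target rank. The following is a simplified sketch of that linear interpolation in plain JavaScript. The function name and input shape are hypothetical, and it skips several edge cases that Prometheus handles.

```javascript
// buckets: [{ le: upperBound, count: cumulativeCount }, ...] sorted by le,
// ending with { le: Infinity, count: totalObservations }.
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;

  // Find the first bucket whose cumulative count reaches the rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // If the rank falls in the +Inf bucket, report the highest finite bound.
  if (buckets[i].le === Infinity) return buckets[buckets.length - 2].le;

  const lowerBound = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const countInBucket = buckets[i].count - countBelow;

  // Interpolate linearly inside the bucket.
  return lowerBound +
    (buckets[i].le - lowerBound) * ((rank - countBelow) / countInBucket);
}

const buckets = [
  { le: 0.1, count: 50 },
  { le: 0.5, count: 90 },
  { le: 1, count: 100 },
  { le: Infinity, count: 100 },
];
console.log(estimateQuantile(0.5, buckets));  // 0.1
console.log(estimateQuantile(0.95, buckets)); // 0.75
```

The estimate can never be more precise than the bucket boundaries, which is why bucket layout should bracket the latencies you care about (for example, an alert threshold of 1 second works best when 1 is a bucket boundary).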

The USE Pattern

The USE pattern focuses on resources:

Utilization: Percentage of time the resource is busy
Saturation: Amount of extra work the resource cannot service yet, often seen as queue length
Errors: Count of error events
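
A small numeric sketch may help separate the three USE signals. It assumes a hypothetical worker pool sampled over a fixed window; the function and field names are illustrative only.

```javascript
// Summarize one sampling window for a hypothetical worker pool.
function useSnapshot({ busyMs, windowMs, queuedJobs, failedJobs }) {
  return {
    utilization: busyMs / windowMs, // fraction of the window spent busy
    saturation: queuedJobs,         // work waiting because the pool is full
    errors: failedJobs,             // error events seen in the window
  };
}

console.log(useSnapshot({ busyMs: 4500, windowMs: 5000, queuedJobs: 12, failedJobs: 1 }));
// { utilization: 0.9, saturation: 12, errors: 1 }
```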

Setting Up Alerts for Application Monitoring

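Create alert rules for application issues. These rules assume each series carries a service label; with the scrape configuration shown earlier, group by job instead: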

groups:
- name: application-alerts
  rules:
  - alert: HighErrorRate
    expr: sum(rate(http_requests_total{status_code=~"5.."}[5m])) by (service) / sum(rate(http_requests_total[5m])) by (service) > 0.05
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High error rate on {{ $labels.service }}"
      description: "Service {{ $labels.service }} has error rate above 5% (current value: {{ $value }})"

  - alert: SlowResponses
    expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (service, le)) > 1
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Slow responses on {{ $labels.service }}"
      description: "Service {{ $labels.service }} has 95th percentile latency above 1s (current value: {{ $value }}s)"
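
The for: 5m clause means the expression must stay true on every evaluation for five minutes before the alert fires; until then the alert is pending. The sketch below models that behavior in plain JavaScript. The class name is hypothetical and the logic is a simplification of what Prometheus does internally.

```javascript
// Simplified model of an alert rule with a `for` duration.
class AlertState {
  constructor(forSeconds) {
    this.forSeconds = forSeconds;
    this.activeSince = null; // time the condition first became true
  }

  // Call once per rule evaluation with the current time (in seconds)
  // and whether the alert expression returned a result.
  evaluate(nowSeconds, conditionTrue) {
    if (!conditionTrue) {
      this.activeSince = null; // any false evaluation resets the timer
      return 'inactive';
    }
    if (this.activeSince === null) this.activeSince = nowSeconds;
    return nowSeconds - this.activeSince >= this.forSeconds ? 'firing' : 'pending';
  }
}

const highErrorRate = new AlertState(300);
console.log(highErrorRate.evaluate(0, true));    // pending
console.log(highErrorRate.evaluate(240, true));  // pending
console.log(highErrorRate.evaluate(300, true));  // firing
console.log(highErrorRate.evaluate(360, false)); // inactive
console.log(highErrorRate.evaluate(420, true));  // pending
```

A longer for duration trades slower notification for fewer alerts caused by short spikes.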

Real-World Case Study: E-commerce Application Monitoring

Let's consider an e-commerce application with these components:

Frontend service
Product catalog service
Cart service
Payment service
Order service

Key metrics to monitor:

Business Metrics:

Product page views
Add-to-cart actions
Checkout starts
Completed purchases
Cart abandonment rate
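
Rates such as cart abandonment are usually derived from counters rather than exported directly. The following sketch assumes one common definition, the share of carts that did not end in a completed purchase; definitions vary between teams, and the counter names are hypothetical.

```javascript
// Fraction of carts that did not end in a purchase during a period.
// Both inputs are increases of hypothetical counters over that period.
function cartAbandonmentRate(cartsCreated, purchasesCompleted) {
  if (cartsCreated === 0) return 0; // no carts, nothing abandoned
  return 1 - purchasesCompleted / cartsCreated;
}

console.log(cartAbandonmentRate(200, 50)); // 0.75
```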

Technical Metrics:

API endpoint latency
Database query time
Error rates by service
Dependency health

Example dashboard structure:

Overview Dashboard: High-level health of all services
User Journey Dashboard: Conversion funnel metrics
Service-Specific Dashboards: Detailed metrics for each service
Infrastructure Dashboard: Underlying resource utilization

Troubleshooting Common Issues

When troubleshooting application problems with Prometheus:

High Latency:
- Check for slow database queries
- Look for resource saturation
- Examine external dependency performance

Error Spikes:
- Check recent deployments
- Look for dependency failures
- Examine logs alongside metrics

Memory Leaks:
- Track memory usage over time
- Monitor garbage collection metrics
- Look for increasing resource usage without corresponding traffic increase
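
One way to tell steady memory growth from normal fluctuation is to fit a line through recent samples and look at its slope, which is similar in spirit to PromQL's deriv() function. A minimal least-squares sketch follows; the function name and sample shape are hypothetical.

```javascript
// Least-squares slope of value over time.
// samples: [{ t: secondsSinceStart, value: bytes }, ...] with at least
// two distinct timestamps. Returns bytes per second.
function slope(samples) {
  const n = samples.length;
  const meanT = samples.reduce((s, p) => s + p.t, 0) / n;
  const meanV = samples.reduce((s, p) => s + p.value, 0) / n;
  let numerator = 0;
  let denominator = 0;
  for (const p of samples) {
    numerator += (p.t - meanT) * (p.value - meanV);
    denominator += (p.t - meanT) ** 2;
  }
  return numerator / denominator;
}

// Memory grows by 1,000 bytes every 60 seconds.
const samples = [
  { t: 0, value: 100000 },
  { t: 60, value: 101000 },
  { t: 120, value: 102000 },
  { t: 180, value: 103000 },
];
console.log(slope(samples)); // roughly 16.67 bytes per second
```

A slope that stays positive while request rate is flat is a hint worth investigating, not proof of a leak; caches warming up can look the same for a while.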

Summary

Application monitoring with Prometheus provides critical insights into your application's health, performance, and behavior. By properly instrumenting your code, configuring Prometheus, and setting up useful dashboards and alerts, you can:

Detect and resolve issues quickly
Optimize application performance
Make data-driven decisions about improvements
Understand how users interact with your application

Remember that effective monitoring is an ongoing process that evolves with your application, and the metrics you collect should always align with your business and technical objectives.

Exercises

Instrument a simple application in your preferred language to expose Prometheus metrics
Configure Prometheus to scrape your application
Create a Grafana dashboard showing the RED metrics for your application
Set up an alert for high error rates
Add a custom business metric relevant to your application domain
