Ubuntu Cloud Monitoring

Introduction

Monitoring is a critical aspect of maintaining healthy and efficient cloud environments. For Ubuntu-based cloud deployments, proper monitoring ensures optimal performance, enhances security, and helps prevent downtime. This guide explores the fundamentals of Ubuntu cloud monitoring, essential tools, implementation strategies, and best practices to help you build a robust monitoring system for your cloud infrastructure.

Why Monitor Your Ubuntu Cloud Environment?

Cloud monitoring provides visibility into your infrastructure and applications, allowing you to:

  • Detect and resolve issues before they impact users
  • Optimize resource utilization and reduce costs
  • Ensure compliance with service level agreements (SLAs)
  • Identify security threats and vulnerabilities
  • Make data-driven decisions about scaling and improvements

Key Monitoring Components for Ubuntu Cloud

System-Level Metrics

At the foundation of cloud monitoring are system-level metrics that provide insight into the health and performance of your Ubuntu instances:

  • CPU: usage percentage, load average, context switches. Indicates processing capacity and potential bottlenecks.
  • Memory: free/used RAM, swap usage, page faults. Helps identify memory leaks and capacity issues.
  • Disk: I/O operations, free space, read/write latency. Prevents storage-related failures.
  • Network: bandwidth usage, packet loss, connection count. Monitors connectivity and data transfer efficiency.
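
A few standard commands are enough to spot-check these metrics directly on an Ubuntu host before any monitoring agents are in place (all of these ship with a default install):

bash
# Quick spot checks of system-level metrics on an Ubuntu host
uptime           # load averages
free -h          # RAM and swap usage
df -h /          # free space on the root filesystem
vmstat 1 5       # CPU, memory, and context-switch activity sampled for 5 seconds
ip -s link       # per-interface packet and error counters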

Application-Level Monitoring

Beyond system metrics, monitoring the applications running on your Ubuntu cloud instances is essential:

  • Response times: How long your applications take to process requests
  • Error rates: Frequency and types of errors occurring in your applications
  • Request rates: Volume of traffic your applications are handling
  • Saturation: How "full" your application service is

Log Management

Logs provide detailed information about events occurring within your Ubuntu cloud environment:

  • System logs (/var/log/syslog, /var/log/auth.log, etc.)
  • Application logs
  • Service-specific logs (web servers, databases, etc.)
  • Cloud provider logs
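
On a systemd-based Ubuntu system, these logs can be inspected with journalctl and the usual file tools, for example:

bash
# Follow the system journal in real time
sudo journalctl -f

# Show only entries from a specific unit for the last hour (ssh as an example)
sudo journalctl -u ssh.service --since "1 hour ago"

# Tail a classic log file directly
sudo tail -f /var/log/auth.log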

Essential Ubuntu Cloud Monitoring Tools

1. Prometheus

Prometheus is a powerful open-source monitoring system with a dimensional data model and flexible query language.

Installation on Ubuntu

bash
# Update package information
sudo apt update

# Install dependencies
sudo apt install -y apt-transport-https software-properties-common

# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.42.0/prometheus-2.42.0.linux-amd64.tar.gz

# Extract the archive
tar xvf prometheus-2.42.0.linux-amd64.tar.gz

# Move to /opt directory
sudo mv prometheus-2.42.0.linux-amd64 /opt/prometheus

# Create a Prometheus user
sudo useradd -rs /bin/false prometheus

# Create directories for configuration and data
sudo mkdir -p /etc/prometheus /var/lib/prometheus

# Set ownership
sudo chown -R prometheus:prometheus /opt/prometheus /etc/prometheus /var/lib/prometheus

Basic Configuration

Create a configuration file at /etc/prometheus/prometheus.yml:

yaml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
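
The Prometheus archive also ships with promtool, which can validate this file before you start the service:

bash
# Validate the configuration (promtool is included in the extracted archive)
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml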

Create a systemd service

bash
sudo tee /etc/systemd/system/prometheus.service > /dev/null << EOF
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/opt/prometheus/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--storage.tsdb.path=/var/lib/prometheus/ \
--web.console.templates=/opt/prometheus/consoles \
--web.console.libraries=/opt/prometheus/console_libraries

[Install]
WantedBy=multi-user.target
EOF

Start and enable the service:

bash
sudo systemctl daemon-reload
sudo systemctl start prometheus
sudo systemctl enable prometheus
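
A quick way to confirm Prometheus came up cleanly:

bash
# Check the service and the built-in health endpoint
sudo systemctl status prometheus --no-pager
curl -s http://localhost:9090/-/healthy   # returns HTTP 200 with a short health message when Prometheus is up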

2. Node Exporter

Node Exporter collects system-level metrics from Ubuntu hosts.

bash
# Download Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz

# Extract the archive
tar xvf node_exporter-1.5.0.linux-amd64.tar.gz

# Move the binary
sudo mv node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin/

# Create a Node Exporter user
sudo useradd -rs /bin/false node_exporter

# Create a systemd service
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start and enable the service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
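
Node Exporter listens on port 9100 by default; you can confirm it is serving metrics with:

bash
# A healthy exporter returns plain-text metrics
curl -s http://localhost:9100/metrics | head -n 5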

3. Grafana

Grafana provides visualization for your monitoring data with dashboards and alerts.

bash
# Add Grafana APT repository
sudo apt-get install -y software-properties-common
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"

# Update packages and install Grafana
sudo apt-get update
sudo apt-get install -y grafana

# Start and enable Grafana service
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

After installation, access Grafana at http://your-server-ip:3000 (default credentials: admin/admin).

4. Netdata

Netdata provides real-time monitoring with minimal configuration:

bash
# Install dependencies
sudo apt-get install -y curl

# Download and run the installation script
bash <(curl -Ss https://my-netdata.io/kickstart.sh)

Access Netdata dashboard at http://your-server-ip:19999.

Setting Up a Complete Monitoring Stack

The following example demonstrates how to set up a monitoring stack for multiple Ubuntu cloud instances:

Step 1: Install Monitoring Agents

Deploy Node Exporter to all Ubuntu instances:

bash
#!/bin/bash
# deploy-node-exporter.sh

# Install Node Exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz
tar xvf node_exporter-1.5.0.linux-amd64.tar.gz
sudo mv node_exporter-1.5.0.linux-amd64/node_exporter /usr/local/bin/
sudo useradd -rs /bin/false node_exporter

# Create systemd service
sudo tee /etc/systemd/system/node_exporter.service > /dev/null << EOF
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start and enable service
sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter

# Open firewall port for Prometheus scraping
sudo ufw allow from prometheus-server-ip to any port 9100
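
One way to roll the script out to several instances is a simple SSH loop; the hostnames below are placeholders for your own inventory:

bash
#!/bin/bash
# Push and run deploy-node-exporter.sh on each instance (hostnames are examples)
for host in instance-1 instance-2 instance-3; do
  scp deploy-node-exporter.sh "ubuntu@${host}:/tmp/"
  ssh "ubuntu@${host}" 'sudo bash /tmp/deploy-node-exporter.sh'
done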

Step 2: Configure Prometheus

Update the Prometheus configuration to scrape metrics from all instances:

yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'ubuntu_nodes'
    static_configs:
      - targets:
          - 'instance-1:9100'
          - 'instance-2:9100'
          - 'instance-3:9100'
          # Add more instances as needed
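
After editing the file, validate it, restart Prometheus, and confirm the new targets are being scraped via the built-in targets API:

bash
/opt/prometheus/promtool check config /etc/prometheus/prometheus.yml
sudo systemctl restart prometheus

# Each target should report "health":"up" once the first scrape succeeds
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'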

Step 3: Configure Alerting

Create an alert rules file at /etc/prometheus/alert_rules.yml:

yaml
groups:
  - name: ubuntu_cloud_alerts
    rules:
      - alert: HighCPULoad
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU load (instance {{ $labels.instance }})"
          description: "CPU load is > 80% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage (instance {{ $labels.instance }})"
          description: "Memory usage is > 85% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"

      - alert: LowDiskSpace
        expr: node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 10
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space (instance {{ $labels.instance }})"
          description: "Free disk space is < 10% for 5 minutes\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
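
As with the main configuration, promtool can validate the rule file before a restart:

bash
/opt/prometheus/promtool check rules /etc/prometheus/alert_rules.yml
sudo systemctl restart prometheus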

Step 4: Set Up Alert Manager

Install and configure Alert Manager:

bash
# Download Alert Manager
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz

# Extract the archive
tar xvf alertmanager-0.25.0.linux-amd64.tar.gz

# Move to /opt directory
sudo mv alertmanager-0.25.0.linux-amd64 /opt/alertmanager

# Create user and set permissions
sudo useradd -rs /bin/false alertmanager
sudo chown -R alertmanager:alertmanager /opt/alertmanager

Create a configuration file at /opt/alertmanager/alertmanager.yml:

yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: '[email protected]'
  smtp_auth_username: 'your-username'
  smtp_auth_password: 'your-password'

route:
  group_by: ['alertname', 'instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email-notifications'

receivers:
  - name: 'email-notifications'
    email_configs:
      - to: '[email protected]'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'instance']

Create a systemd service for Alert Manager:

bash
sudo tee /etc/systemd/system/alertmanager.service > /dev/null << EOF
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
User=alertmanager
Group=alertmanager
Type=simple
ExecStart=/opt/alertmanager/alertmanager \
--config.file=/opt/alertmanager/alertmanager.yml \
--storage.path=/opt/alertmanager/data

[Install]
WantedBy=multi-user.target
EOF

Start and enable the Alert Manager service:

bash
sudo systemctl daemon-reload
sudo systemctl start alertmanager
sudo systemctl enable alertmanager
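
The Alert Manager archive also includes amtool, which can sanity-check the configuration; the health endpoint gives a quick liveness check:

bash
# Validate the Alert Manager configuration
/opt/alertmanager/amtool check-config /opt/alertmanager/alertmanager.yml

# Confirm the service is responding (default port 9093)
curl -s http://localhost:9093/-/healthy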

Step 5: Configure Grafana Dashboards

After setting up Grafana, add Prometheus as a data source:

  1. Navigate to Configuration > Data Sources
  2. Click "Add data source"
  3. Select "Prometheus"
  4. Set the URL to http://localhost:9090 (or your Prometheus server address)
  5. Click "Save & Test"

Import pre-built dashboards for Ubuntu monitoring:

  1. Go to Create > Import
  2. Enter dashboard ID 1860 (Node Exporter Full)
  3. Select your Prometheus data source
  4. Click "Import"

Monitoring Ubuntu Cloud Resources with Cloud-Native Tools

When your Ubuntu instances are running in public cloud environments, leverage cloud-native monitoring tools:

AWS CloudWatch Integration

For Ubuntu instances on AWS, install the CloudWatch agent:

bash
# Install CloudWatch agent
wget https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb
sudo dpkg -i amazon-cloudwatch-agent.deb

# Configure the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-config-wizard

Sample configuration file for /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json:

json
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "cwagent"
  },
  "metrics": {
    "metrics_collected": {
      "cpu": {
        "measurement": [
          "cpu_usage_idle",
          "cpu_usage_user",
          "cpu_usage_system"
        ],
        "resources": [
          "*"
        ],
        "totalcpu": true
      },
      "disk": {
        "measurement": [
          "used_percent",
          "inodes_free"
        ],
        "resources": [
          "/"
        ]
      },
      "diskio": {
        "measurement": [
          "io_time",
          "write_bytes",
          "read_bytes",
          "writes",
          "reads"
        ],
        "resources": [
          "*"
        ]
      },
      "mem": {
        "measurement": [
          "mem_used_percent"
        ]
      },
      "netstat": {
        "measurement": [
          "tcp_established",
          "tcp_time_wait"
        ]
      },
      "swap": {
        "measurement": [
          "swap_used_percent"
        ]
      }
    }
  },
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [
          {
            "file_path": "/var/log/syslog",
            "log_group_name": "syslog",
            "log_stream_name": "{instance_id}"
          },
          {
            "file_path": "/var/log/auth.log",
            "log_group_name": "auth.log",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}

Start the CloudWatch agent:

bash
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
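
To confirm the agent picked up the configuration and is running, the same control script exposes a status action:

bash
# Check the agent's status (should report "running" once started)
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status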

Azure Monitor Integration

For Ubuntu instances on Azure, install the Log Analytics agent:

bash
# Download and install the Log Analytics agent
wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh
chmod +x onboard_agent.sh
sudo ./onboard_agent.sh -w <YOUR_WORKSPACE_ID> -s <YOUR_WORKSPACE_KEY>

Google Cloud Monitoring

For Ubuntu on Google Cloud, install the Ops Agent:

bash
# Add the Cloud Monitoring agent repository
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install

Best Practices for Ubuntu Cloud Monitoring

1. Define Clear Monitoring Objectives

Before implementing monitoring, define what you need to monitor:

  • Identify critical services and applications
  • Determine key performance indicators (KPIs)
  • Set thresholds for alerts based on historical data
  • Document monitoring requirements

2. Implement the USE Method

The USE method helps identify performance issues by focusing on:

  • Utilization: Percentage of time the resource is busy
  • Saturation: Amount of work a resource has queued
  • Errors: Count of error events

Apply this method to CPU, memory, disk, and network resources.
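
With Node Exporter in place, each USE dimension maps onto a PromQL query. The examples below, run through Prometheus's HTTP query API, use standard node_exporter metrics; adjust them to the resources you care about:

bash
# Utilization: CPU busy percentage per instance
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Saturation: 1-minute load average (work queued for the CPUs)
curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=node_load1'

# Errors: network receive errors per second
curl -s http://localhost:9090/api/v1/query --data-urlencode 'query=rate(node_network_receive_errs_total[5m])'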

3. Implement the RED Method

For monitoring services, follow the RED method:

  • Rate: Requests per second
  • Errors: Failed requests per second
  • Duration: Distribution of request latencies
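
Using the http_request_duration_seconds histogram defined in the Node.js example later in this guide, the RED signals can be queried like this (a sketch; the metric name and status_code label must match what your application actually exposes):

bash
# Rate: requests per second over the last 5 minutes
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(http_request_duration_seconds_count[5m]))'

# Errors: failed (5xx) requests per second
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=sum(rate(http_request_duration_seconds_count{status_code=~"5.."}[5m]))'

# Duration: 95th percentile request latency
curl -s http://localhost:9090/api/v1/query --data-urlencode \
  'query=histogram_quantile(0.95, sum by(le) (rate(http_request_duration_seconds_bucket[5m])))'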

4. Establish a Monitoring Hierarchy

Create a hierarchical monitoring structure, working from the bottom up:

  • Infrastructure: cloud provider resources (instances, storage, networking)
  • System: Ubuntu host metrics (CPU, memory, disk, network)
  • Services: web servers, databases, and other supporting services
  • Application: request rates, errors, and latencies for your own code
  • Business: user-facing metrics such as sign-ups or transactions

5. Implement Alert Fatigue Prevention

To prevent alert fatigue:

  • Define alert severity levels (critical, warning, info)
  • Implement alert throttling and grouping
  • Use progressive alerting (escalate alerts over time)
  • Regularly review and tune alert thresholds

6. Implement Log Rotation

Configure log rotation to prevent disk space issues:

bash
sudo tee /etc/logrotate.d/custom-logs > /dev/null << EOF
/var/log/application/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data www-data
    sharedscripts
    postrotate
        systemctl reload application.service > /dev/null 2>/dev/null || true
    endscript
}
EOF
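
logrotate can dry-run the new policy, so you can verify it before the scheduled run picks it up:

bash
# -d is debug/dry-run mode: shows what would happen without rotating anything
sudo logrotate -d /etc/logrotate.d/custom-logs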

7. Implement Centralized Logging

Use tools like Elasticsearch, Logstash, and Kibana (ELK stack) for centralized logging:

bash
# Install Filebeat for log shipping
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-8.5.3-amd64.deb
sudo dpkg -i filebeat-8.5.3-amd64.deb

# Configure Filebeat
sudo tee /etc/filebeat/filebeat.yml > /dev/null << EOF
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
      - /var/log/syslog
      - /var/log/auth.log

output.elasticsearch:
  hosts: ["elasticsearch-host:9200"]

setup.kibana:
  host: "kibana-host:5601"
EOF

# Start and enable Filebeat
sudo systemctl enable filebeat
sudo systemctl start filebeat
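
Filebeat ships with built-in self-tests that are worth running after editing the configuration:

bash
# Verify the configuration parses and the Elasticsearch output is reachable
sudo filebeat test config
sudo filebeat test output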

Practical Example: Complete Monitoring Setup for a Web Application

Let's implement a comprehensive monitoring solution for a web application running on Ubuntu:

Step 1: System and Service Monitoring

Install Node Exporter on all web servers and database servers following the instructions provided earlier.

Step 2: Application-Specific Monitoring

For a Node.js application, use the prom-client Prometheus client library:

javascript
// app.js - Node.js application with Prometheus monitoring
const express = require('express');
const client = require('prom-client');
const app = express();

// Create a Registry to register metrics
const register = new client.Registry();

// Enable default metrics collection
const collectDefaultMetrics = client.collectDefaultMetrics;
collectDefaultMetrics({ register });

// Create custom metrics
const httpRequestDurationMicroseconds = new client.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

// Register the custom metrics
register.registerMetric(httpRequestDurationMicroseconds);

// Middleware to measure request duration
app.use((req, res, next) => {
  const end = httpRequestDurationMicroseconds.startTimer();
  res.on('finish', () => {
    end({ method: req.method, route: req.route?.path || req.path, status_code: res.statusCode });
  });
  next();
});

// Endpoint for Prometheus to scrape metrics
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
});

// Application routes
app.get('/', (req, res) => {
  res.send('Hello World!');
});

app.listen(3000, () => {
  console.log('Server running on port 3000');
});

Step 3: Database Monitoring

Install the PostgreSQL Exporter for database monitoring:

bash
# Download PostgreSQL Exporter
wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.11.1/postgres_exporter-0.11.1.linux-amd64.tar.gz

# Extract the archive
tar xvf postgres_exporter-0.11.1.linux-amd64.tar.gz

# Move binary to /usr/local/bin
sudo mv postgres_exporter-0.11.1.linux-amd64/postgres_exporter /usr/local/bin/

# Create dedicated PostgreSQL user for monitoring
sudo -u postgres psql -c "CREATE USER postgres_exporter WITH PASSWORD 'password';"
sudo -u postgres psql -c "GRANT pg_monitor TO postgres_exporter;"

# Create systemd service
sudo tee /etc/systemd/system/postgres_exporter.service > /dev/null << EOF
[Unit]
Description=PostgreSQL Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=postgres
Group=postgres
Type=simple
Environment="DATA_SOURCE_NAME=postgresql://postgres_exporter:password@localhost:5432/postgres?sslmode=disable"
ExecStart=/usr/local/bin/postgres_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start and enable service
sudo systemctl daemon-reload
sudo systemctl start postgres_exporter
sudo systemctl enable postgres_exporter
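
The exporter listens on port 9187 by default (matching the Prometheus target used in the next step); pg_up should report 1 once it can reach the database:

bash
# pg_up = 1 means the exporter can connect to PostgreSQL
curl -s http://localhost:9187/metrics | grep '^pg_up'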

Step 4: Update Prometheus Configuration

Update Prometheus configuration to scrape metrics from all components:

yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'web_servers'
    static_configs:
      - targets:
          - 'web-server-1:9100'   # Node Exporter
          - 'web-server-2:9100'

  - job_name: 'db_servers'
    static_configs:
      - targets:
          - 'db-server:9100'      # Node Exporter
          - 'db-server:9187'      # PostgreSQL Exporter

  - job_name: 'nodejs_app'
    static_configs:
      - targets:
          - 'web-server-1:3000'   # Node.js app metrics
          - 'web-server-2:3000'

Step 5: Create Grafana Dashboards

Create custom dashboards for different aspects of your application:

  1. System Overview Dashboard
  2. Web Application Performance Dashboard
  3. Database Performance Dashboard
  4. Business Metrics Dashboard

Troubleshooting Common Monitoring Issues

Problem: High CPU Usage on Monitored Nodes

Solution:

  1. Identify the process consuming CPU:
    bash
    sudo apt install htop
    htop --sort-key=PERCENT_CPU
  2. Check if it's related to the monitoring agent:
    bash
    ps aux | grep node_exporter
  3. Adjust scrape interval in Prometheus if needed.

Problem: Missing Data in Grafana

Solution:

  1. Check if Prometheus is scraping targets:
    bash
    curl http://localhost:9090/api/v1/targets
  2. Verify firewall rules:
    bash
    sudo ufw status
  3. Check for network connectivity issues:
    bash
    nc -zv target-host 9100

Problem: Disk Space Filling Up with Logs

Solution:

  1. Identify large log files:
    bash
    sudo du -h /var/log/ | sort -hr | head -10
  2. Configure log rotation as shown earlier
  3. Consider using logrotate's maxsize directive:
    maxsize 100M

Summary

Effective monitoring is essential for maintaining reliable and efficient Ubuntu cloud environments. By implementing the tools and practices covered in this guide, you can:

  • Gain visibility into system and application performance
  • Detect and resolve issues before they affect users
  • Optimize resource utilization and reduce costs
  • Ensure security and compliance

Remember that monitoring is not a one-time setup but an ongoing process that requires regular review and adjustment as your infrastructure evolves.

Exercises

  1. Set up a basic Prometheus and Node Exporter monitoring system on an Ubuntu VM.
  2. Create a custom Grafana dashboard showing system metrics.
  3. Configure alerts for high CPU, memory, and disk usage.
  4. Implement application-level monitoring for a web application of your choice.
  5. Set up centralized logging with the ELK stack.

