Resource Monitoring
Introduction
Resource monitoring is a critical aspect of operating system management that involves tracking, analyzing, and managing system resources to ensure optimal performance. As a beginner in programming and system administration, understanding how to monitor resources will help you identify bottlenecks, troubleshoot performance issues, and optimize your applications.
This guide will walk you through the fundamentals of resource monitoring, common tools, and practical techniques to keep your systems running smoothly.
Why Resource Monitoring Matters
Operating systems manage four primary resources:
- CPU (Central Processing Unit) - Processes instructions
- Memory (RAM) - Stores temporary data for quick access
- Disk/Storage - Persists data long-term
- Network - Facilitates data transfer between systems
Without proper monitoring, your system might experience:
- Sluggish performance
- Application crashes
- Unresponsive services
- Potential data loss
- Security vulnerabilities
Basic Resource Monitoring Concepts
Resource Utilization
Resource utilization refers to the percentage of a resource currently in use. High utilization often indicates a potential bottleneck.
Performance Metrics
Key metrics to monitor include:
- CPU: Load average, user time, system time, idle time
- Memory: Used/free RAM, swap usage, buffer/cache allocation
- Disk: I/O operations, read/write speeds, free space
- Network: Bandwidth usage, packets sent/received, latency
Thresholds and Alerts
Establishing baseline thresholds helps identify abnormal behavior:
- Baselines: Normal operating ranges for your system
- Thresholds: Trigger points for warnings or alerts
- Alerts: Notifications when thresholds are crossed
Resource Monitoring Tools
Command-line Tools
Linux/Unix Tools
1. top/htop
The top
command provides a dynamic real-time view of running processes:
$ top
Example output:
top - 14:23:36 up 7 days, 4:53, 1 user, load average: 0.84, 0.96, 0.91
Tasks: 267 total, 1 running, 266 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.9 us, 2.0 sy, 0.0 ni, 91.7 id, 0.3 wa, 0.0 hi, 0.1 si, 0.0 st
MiB Mem : 16384.0 total, 5824.1 free, 6240.3 used, 4319.6 buff/cache
MiB Swap: 8192.0 total, 8192.0 free, 0.0 used. 9125.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3423 user1 20 0 4196868 354056 112956 S 12.6 2.1 6:29.83 firefox
2137 user1 20 0 859492 195416 89560 S 6.0 1.2 18:36.35 gnome-shell
954 root 20 0 792960 90416 60448 S 2.3 0.5 7:53.21 Xorg
2. vmstat
The vmstat
command reports virtual memory statistics:
$ vmstat 5 3 # Reports every 5 seconds, 3 times
Example output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 5896916 348272 3925728 0 0 67 38 130 282 6 2 92 0 0
0 0 0 5896660 348272 3925728 0 0 0 0 126 264 1 1 99 0 0
0 0 0 5896660 348272 3925728 0 0 0 0 125 260 0 1 99 0 0
3. iostat
The iostat
command monitors CPU and disk I/O:
$ iostat -x 2
Example output:
Linux 5.15.0-58-generic (hostname) 03/18/2025 _x86_64_ (8 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.90 0.00 2.01 0.30 0.00 91.79
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 8.32 2.84 259.21 116.01 0.03 1.34 0.36 32.04 0.57 2.23 0.01 31.16 40.85 0.13 0.15
4. free
The free
command displays memory usage:
$ free -h
Example output:
total used free shared buff/cache available
Mem: 16G 6.1G 5.7G 294M 4.2G 8.9G
Swap: 8G 0B 8.0G
5. netstat/ss
The netstat
command shows network connections:
$ netstat -tuln
Example output:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 ::1:631 :::* LISTEN
udp 0 0 0.0.0.0:68 0.0.0.0:*
udp 0 0 0.0.0.0:631 0.0.0.0:*
Windows Tools
1. Task Manager
Windows Task Manager provides a GUI for monitoring resources.
Access it by:
- Pressing
Ctrl+Shift+Esc
- Right-clicking the taskbar and selecting "Task Manager"
2. Performance Monitor (perfmon)
> perfmon
3. Resource Monitor
> resmon
4. Windows PowerShell Commands
Get CPU information:
Get-WmiObject Win32_Processor | Select-Object LoadPercentage
Example output:
LoadPercentage
--------------
12
Get memory information:
Get-WmiObject Win32_OperatingSystem | Select-Object FreePhysicalMemory, TotalVisibleMemorySize
Example output:
FreePhysicalMemory TotalVisibleMemorySize
------------------ ----------------------
10485760 16777216
Monitoring Systems and Applications
Simple Monitoring Scripts
Here's a basic bash script to monitor system resources:
#!/bin/bash
echo "===== System Resource Monitor ====="
echo "Date: $(date)"
echo ""
echo "CPU Usage:"
top -bn1 | grep "Cpu(s)" | awk '{print $2 + $4 "%"}'
echo "Memory Usage:"
free -m | awk 'NR==2{printf "Used: %s MB (%.2f%%)
", $3, $3*100/$2}'
echo "Disk Usage:"
df -h | awk '$NF=="/"{printf "Used: %s (%.2f%%)
", $3, $5}'
echo "Network Stats:"
netstat -i | awk 'NR>2{print $1, $4, $8}'
Python Monitoring Script
A more advanced Python script for monitoring and logging resources:
import psutil
import time
import datetime
def monitor_resources(interval=5, duration=60):
"""
Monitor system resources for a specified duration.
Args:
interval: Time between measurements in seconds
duration: Total monitoring time in seconds
"""
measurements = []
intervals = int(duration / interval)
print(f"Monitoring system resources every {interval} seconds for {duration} seconds...")
for i in range(intervals):
timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
cpu_percent = psutil.cpu_percent(interval=1)
memory = psutil.virtual_memory()
disk = psutil.disk_usage('/')
network = psutil.net_io_counters()
print(f"
--- Measurement {i+1}/{intervals} at {timestamp} ---")
print(f"CPU Usage: {cpu_percent}%")
print(f"Memory: {memory.used/1024/1024:.2f} MB / {memory.total/1024/1024:.2f} MB ({memory.percent}%)")
print(f"Disk: {disk.used/1024/1024/1024:.2f} GB / {disk.total/1024/1024/1024:.2f} GB ({disk.percent}%)")
print(f"Network: Sent {network.bytes_sent/1024/1024:.2f} MB, Received {network.bytes_recv/1024/1024:.2f} MB")
measurements.append({
'timestamp': timestamp,
'cpu_percent': cpu_percent,
'memory_percent': memory.percent,
'disk_percent': disk.percent,
'network_sent': network.bytes_sent,
'network_recv': network.bytes_recv
})
if i < intervals - 1:
time.sleep(interval)
return measurements
if __name__ == "__main__":
# Monitor every 5 seconds for 1 minute
data = monitor_resources(interval=5, duration=60)
# You could save this data to a file or database for further analysis
print("
Monitoring complete. Summary of measurements:")
for m in data:
print(f"{m['timestamp']}: CPU {m['cpu_percent']}%, Mem {m['memory_percent']}%, Disk {m['disk_percent']}%")
To run this script, you'll need to install the psutil
library:
pip install psutil
Advanced Monitoring Techniques
Process Monitoring
Focusing on specific processes can help identify problematic applications:
# Monitor a specific process by name
$ ps aux | grep nginx
# Monitor a specific process by PID
$ top -p 1234
Log Analysis
System and application logs provide valuable insights:
# View system logs
$ journalctl -f
# View application logs (example: Apache)
$ tail -f /var/log/apache2/error.log
Load Testing
Simulate high load to identify system limitations:
# Install stress-ng tool
$ sudo apt install stress-ng
# Test CPU with 4 workers for 60 seconds
$ stress-ng --cpu 4 --timeout 60s
# Test memory with 2 workers using 1GB each
$ stress-ng --vm 2 --vm-bytes 1G --timeout 30s
Implementing Monitoring in Real-World Scenarios
Case Study: Web Server Monitoring
Let's say you're managing a web server running a popular website. Here's how to approach monitoring:
-
Set up baseline monitoring:
- CPU, memory, disk, and network usage
- Web server-specific metrics (requests/sec, response time)
-
Configure alerts:
- CPU usage > 80% for more than 5 minutes
- Less than 20% free memory
- Disk usage > 90%
- Response time > 500ms
-
Regular performance reviews:
- Weekly reports on resource usage trends
- Monthly capacity planning sessions
Practical Example: Database Server Optimization
Identifying a performance issue on a database server:
# Monitor overall system
$ top
# Notice high CPU usage by MySQL
$ ps aux | grep mysql
# Check MySQL performance
$ mysqladmin extended-status
# Identify slow queries
$ tail -f /var/log/mysql/slow-query.log
# Optimize the database
$ mysqltuner
Best Practices for Resource Monitoring
- Establish baselines - Know what "normal" looks like for your system
- Monitor proactively - Don't wait for problems to occur
- Use multiple tools - Different tools provide different insights
- Automate where possible - Set up scheduled checks and reports
- Document everything - Keep records of changes and performance metrics
- Plan for scaling - Use monitoring data to predict future resource needs
- Test monitoring systems - Ensure your monitoring tools themselves are reliable
Troubleshooting Common Resource Issues
High CPU Usage
Potential causes and solutions:
- Runaway processes - Identify with
top
and terminate if necessary - Too many processes - Review system load and consolidate services
- Inefficient code - Profile applications and optimize
- Malware/cryptominers - Scan system for unauthorized software
Memory Leaks
Detecting and addressing memory issues:
- Monitor process memory growth - Use
top
orps
withwatch
- Check for swap usage - High swap might indicate memory pressure
- Use specialized tools -
valgrind
for application memory debugging - Restart services - Sometimes necessary for services with known memory leaks
Disk I/O Bottlenecks
Improving disk performance:
- Identify high I/O processes - Use
iotop
oriostat
- Check disk health - Use SMART monitoring tools
- Consider SSD upgrade - For frequently accessed data
- Optimize database queries - Reduce unnecessary disk operations
Summary
Resource monitoring is an essential skill for anyone managing systems or developing applications. By understanding the tools and techniques covered in this guide, you'll be able to:
- Identify performance bottlenecks
- Troubleshoot resource-related issues
- Optimize system performance
- Plan for future resource needs
- Prevent outages and service disruptions
Remember that effective monitoring is an ongoing process, not a one-time task. Regular review of your monitoring data will help you maintain optimal system performance over time.
Exercises
-
Install and explore basic monitoring tools (
top
,htop
,vmstat
) on your system. Compare their outputs and features. -
Write a simple bash or Python script that reports CPU, memory, and disk usage every minute.
-
Set up a simulated high-load scenario using
stress-ng
and observe how your system responds. -
Create a monitoring dashboard using a tool like Grafana or Prometheus (for advanced users).
-
Practice analyzing system logs to identify performance issues.
Additional Resources
Here are some resources to further your knowledge:
- Linux System Administration documentation
- The
man
pages for the tools mentioned in this guide - Online tutorials on performance optimization
- Books on system administration and performance tuning
Remember, effective resource monitoring is both a science and an art. As you gain experience, you'll develop intuition for identifying and resolving performance issues efficiently.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)