Linux Performance Monitoring: A Complete Guide to Essential Tools

Introduction to Linux Performance Monitoring

Linux performance monitoring is not just about checking CPU load. It's a comprehensive analysis of the system: processor, memory, disks, network, and I/O. Understanding metrics helps prevent downtime, optimize resource costs, and quickly respond to anomalies.

In this guide, you'll master both basic utilities and advanced tools. We'll focus on practical scenarios: how to find a "hot" process, why a disk is slow, why the network is overloaded. All commands work on most distributions (Ubuntu, CentOS, Debian, Fedora).

Basic Utilities for Daily Use

`top` and `htop`: Interactive Process Monitoring

top is usually the first tool to reach for when something feels slow. Run it and study the screen:

top

Key lines:

%Cpu(s): CPU time breakdown into us (user processes), sy (kernel), id (idle), and wa (waiting for I/O).
KiB Mem (MiB Mem in newer versions of top): RAM usage: total, used, free, buff/cache.
KiB Swap (or MiB Swap): swap usage.

Sorting: press P (Shift+p) to sort by CPU or M (Shift+m) to sort by memory. To show individual threads instead of whole processes, add -H at startup: top -H.
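The percentages on the %Cpu(s) line come from cumulative counters on the first line of /proc/stat. As a rough sketch of that calculation (cpu_breakdown is a name made up for this example, and it only splits out the first five states), two samples of that line can be compared like this:

```shell
# cpu_breakdown BEFORE AFTER
# BEFORE and AFTER are two samples of the aggregate "cpu" line of /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal ...
# Guest counters (fields 10 and 11) are skipped because they are already
# included in user and nice.
cpu_breakdown() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { for (i = 2; i <= 9; i++) prev[i] = $i; next }
    {
      total = 0
      for (i = 2; i <= 9; i++) { d[i] = $i - prev[i]; total += d[i] }
      if (total <= 0) { print "no CPU time elapsed between samples"; exit 1 }
      printf "us=%.1f ni=%.1f sy=%.1f id=%.1f wa=%.1f\n",
             100 * d[2] / total, 100 * d[3] / total, 100 * d[4] / total,
             100 * d[5] / total, 100 * d[6] / total
    }'
}
```

Usage, assuming a one-second interval is long enough for your purpose: a=$(head -n 1 /proc/stat); sleep 1; b=$(head -n 1 /proc/stat); cpu_breakdown "$a" "$b"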

Tip: htop is a friendlier alternative with colors, a process tree, and convenient process management. Install it via sudo apt install htop (Debian/Ubuntu) or sudo dnf install htop (Fedora). On CentOS/RHEL it typically comes from the EPEL repository.

`vmstat`: Virtual Memory Statistics

vmstat prints a system summary every N seconds. The first line of output shows averages since boot; each following line covers one interval. Ideal for a quick "health check".

vmstat 2

Example output:

procs -----------memory---------- ---swap-- -----io------ -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 123456  78900 456789    0    0   100   200  123  456 30 10 55  5  0

Decoding:

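r: processes running or waiting for CPU time. A value that stays above the number of cores indicates a CPU shortage.
si/so: memory swapped in from disk and out to disk per second. Sustained non-zero values indicate insufficient RAM.
us/sy: time spent in user code and kernel code. A combined value that stays high (>80%) indicates heavy CPU load.
wa: time spent waiting for I/O. High wa (e.g., >20%) usually points to a storage bottleneck.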
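The rules of thumb above can be applied automatically. The following is a minimal sketch, not a tuned alerting rule: vmstat_flags is a hypothetical helper name, the thresholds are the ones quoted in the list, and the column positions assume the standard vmstat layout shown in the example output.

```shell
# vmstat_flags [CORES]: read vmstat output on stdin and print a warning for
# every sample that crosses one of the thresholds from the list above.
# CORES defaults to the number of online cores reported by nproc.
vmstat_flags() {
  local cores="${1:-$(nproc)}"
  awk -v cores="$cores" '
    $1 ~ /^[0-9]+$/ {              # data rows only; both header lines are skipped
      n++
      if ($1 > cores)       printf "sample %d: run queue %d > %d cores\n", n, $1, cores
      if ($7 > 0 || $8 > 0) printf "sample %d: swapping (si=%d so=%d)\n", n, $7, $8
      if ($16 > 20)         printf "sample %d: iowait %d%%\n", n, $16
    }'
}
```

Usage: vmstat 2 5 | vmstat_flags. Keep in mind that sample 1 is the since-boot average, so a warning on that row describes history rather than the current state.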

`iostat`: Disk and CPU Details

Install the sysstat package if you haven't already (sudo apt install sysstat or sudo dnf install sysstat). Command:

iostat -x 2

Key metrics for disks (Device):

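%util: percentage of time the device had at least one request in flight. Close to 100% means a single disk is saturated. SSDs, NVMe drives, and RAID arrays serve requests in parallel, so they can show high %util and still have headroom.
await: average time (in ms) to complete a request, including time spent waiting in the queue. Recent sysstat versions split it into r_await and w_await. Sustained high values (e.g., >10 ms for an SSD, >20 ms for an HDD) indicate a problem.
svctm: average service time per request. Compare with await. If await >> svctm, requests spend most of their time queued. This field is unreliable and has been removed from recent sysstat versions; where it is missing, look at the average queue size (aqu-sz, formerly avgqu-sz) instead.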

For CPU: %user, %system, %idle.
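Because the set of iostat columns differs between sysstat versions, a script should locate %util by its header rather than by position. A small sketch under that assumption (iostat_busy is a hypothetical name, and 80 is just a default threshold for illustration):

```shell
# iostat_busy [LIMIT]: read `iostat -x` output on stdin and print every device
# line whose %util is above LIMIT (default 80). The %util column is found
# from the "Device" header line, so extra or missing columns do not matter.
iostat_busy() {
  local limit="${1:-80}"
  awk -v limit="$limit" '
    $1 ~ /^Device/ {
      for (i = 1; i <= NF; i++) if ($i == "%util") col = i
      next
    }
    col && NF >= col && $col + 0 > limit { printf "%s %%util=%s\n", $1, $col }
  '
}
```

Usage: LC_ALL=C iostat -x 2 3 | iostat_busy 80. Setting LC_ALL=C keeps the decimal separator a dot. The first report covers the time since boot, so only the later reports reflect current load.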

`df` and `du`: Disk Space

Quick check of free space:

df -h

The -h flag prints sizes in human-readable units (K, M, G). Watch the Use% column. If it is above 90%, clean up logs or grow the volume.
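The 90% rule is easy to script. The sketch below is one possible way to do it: df_over is a made-up helper name, it expects the POSIX output format of df -P, and it assumes mount points contain no spaces.

```shell
# df_over [LIMIT]: read `df -P` output on stdin and print every filesystem
# whose usage is above LIMIT percent (default 90).
df_over() {
  local limit="${1:-90}"
  awk -v limit="$limit" '
    NR > 1 {                       # skip the header line
      use = $5
      sub(/%$/, "", use)           # "95%" -> "95"
      if (use + 0 > limit) printf "%s is %s%% full (%s)\n", $6, use, $1
    }'
}
```

Usage: df -P | df_over 90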

To find the largest "space eaters" in a specific folder:

du -sh /var/* | sort -rh | head -10

This shows the 10 largest entries (files and directories) directly under /var. Run it with sudo to include directories your user cannot read.

`ss` and `netstat`: Network Activity

ss is the modern replacement for netstat. Quick view of connections:

ss -tuln

Flags:

-t: TCP,
-u: UDP,
-l: listening,
-n: numeric (no name resolution).

For interface statistics:

ip -s link

Or for detailed network packet stats:

nstat

Advanced Tools for Deep Analysis

`sar`: Historical Data Collection

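sar (System Activity Reporter) reads metrics that sysstat records at a fixed interval (every 10 minutes by default). Data is stored in /var/log/sysstat/ on Debian/Ubuntu or /var/log/sa/ on RHEL/CentOS/Fedora, one file per day (e.g., sa14 for day 14 of the month).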

View today's data:

sar -u  # CPU
sar -r  # Memory
sar -b  # I/O
sar -n DEV  # Network interfaces

Example: sar -u 2 5 samples live CPU usage every 2 seconds, 5 times, instead of reading the history files.

Advantage: you can see what happened during a problem, even if you weren't at the terminal.
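To look at an earlier day, sar needs the path of that day's data file (its -f option), and the directory differs between distributions. A minimal sketch for locating it follows; find_sa_file is a hypothetical helper, and it assumes the classic saDD file naming rather than the saYYYYMMDD names some configurations use.

```shell
# find_sa_file DAY [DIR...]: print the first existing sar data file for the
# given day of the month. Without DIR arguments it checks the default
# locations used by Debian/Ubuntu and by RHEL-family systems.
find_sa_file() {
  local day dir
  day=$(printf '%02d' "$((10#$1))")   # 10# avoids octal parsing of "08", "09"
  shift
  [ "$#" -gt 0 ] || set -- /var/log/sysstat /var/log/sa
  for dir in "$@"; do
    if [ -f "$dir/sa$day" ]; then
      printf '%s\n' "$dir/sa$day"
      return 0
    fi
  done
  return 1
}
```

Usage: sar -u -s 09:00:00 -e 10:00:00 -f "$(find_sa_file 14)" prints CPU history for day 14 between 09:00 and 10:00.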

`nmon`: Interactive Monitoring of All Resources

Install nmon (sudo apt install nmon). Run:

nmon

Keys:

c — CPU,
m — memory,
d — disks,
n — network,
t — top processes,
q — exit.

nmon is useful for a quick overview and for session recording: nmon -f writes samples to a .nmon file, which can later be analyzed in Excel or via nmon2csv.

`glances`: Cross-Platform Monitoring

glances is a Python utility that combines many metrics in one interactive interface. Installation:

pip install glances
# or for the system:
sudo apt install glances

Run: glances. Supports colors, alerts (thresholds), export to JSON, InfluxDB, Elasticsearch.

Graphical and Web Solutions

For long-term monitoring and visualization, use combinations:

Prometheus + Grafana: collect metrics via exporters (node_exporter) and visualize them in Grafana dashboards.
Netdata: "out-of-the-box" monitoring with a web interface on port 19999. Install: bash <(curl -Ss https://my-netdata.io/kickstart.sh).
Zabbix/Nagios: for enterprise monitoring with alerts.

Practical Scenarios

Scenario 1: High CPU Load

Run top or htop.
Sort by %CPU. Find the process with the highest consumption.
If it's java, python, node — check the application logs.
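If it's kworker, ksoftirqd, or migration, the problem might be in the kernel, a driver, or interrupt (IRQ) handling.
Use perf top for profiling (on Debian/Ubuntu install linux-tools-common and linux-tools-$(uname -r); on RHEL/Fedora install perf).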

Scenario 2: Disk Fully Busy

iostat -x 2 — look at %util and await per disk.
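iotop (install via sudo apt install iotop; it needs root) shows which process is reading or writing.
If await is high but %util is low, the delay may be outside the local disk, for example in the network path of NFS or iSCSI storage.
Check the disk queue: grep -w <device> /proc/diskstats. The 12th field is the number of I/Os currently in progress.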
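In /proc/diskstats the device name is the 3rd field and the number of I/Os currently in progress is the 12th. A short sketch that extracts just that number (inflight_io is a name invented for this example):

```shell
# inflight_io DEVICE: read /proc/diskstats-formatted text on stdin and print
# the number of I/Os currently in progress for DEVICE (field 12).
# Returns non-zero if the device does not appear in the input.
inflight_io() {
  awk -v dev="$1" '$3 == dev { print $12; found = 1 } END { exit !found }'
}
```

Usage: inflight_io sda < /proc/diskstats. Run it a few times in a row; a value that stays well above zero means requests are piling up on that device.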

Scenario 3: Memory Shortage

free -h: look at the available column and at swap usage.
If swap is actively used (si/so in vmstat >0) — insufficient RAM.
ps aux --sort=-%mem | head -10 — top 10 by memory.
Check cache: grep -E "^(Cached|Buffers)" /proc/meminfo. A large cache is normal; the kernel uses otherwise idle RAM for it and releases it on demand.
If a process keeps "eating" memory, look for leaks (e.g., via valgrind for C/C++).
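The available figure from free comes from the MemAvailable line of /proc/meminfo, so the share of memory still available can be computed directly. A minimal sketch (mem_available_pct is a hypothetical name; MemAvailable is only present on reasonably recent kernels):

```shell
# mem_available_pct: read /proc/meminfo-formatted text on stdin and print
# MemAvailable as a percentage of MemTotal, with one decimal place.
mem_available_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) exit 1       # no MemTotal line in the input
      printf "%.1f\n", 100 * avail / total
    }'
}
```

Usage: mem_available_pct < /proc/meminfo. A result that stays in single digits, together with si/so activity in vmstat, is a strong hint that the machine needs more RAM or a smaller workload.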

Scenario 4: Network Overload

ip -s link: errors and dropped packets per interface.
ss -s: summary of sockets (e.g., a large timewait count).
nethogs (install via sudo apt install nethogs) shows traffic per process.
iftop — similar to top, but for network.

Automation and Alerting

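For regular data collection, enable the sysstat collector that feeds sar (it runs from a systemd timer or a cron job, depending on the distribution):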

# Enable data collection (if not running)
sudo systemctl enable sysstat
sudo systemctl start sysstat

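File /etc/default/sysstat (Debian/Ubuntu) or /etc/sysconfig/sysstat (RHEL/CentOS) contains collection parameters. On Debian/Ubuntu, collection only starts once ENABLED="true" is set there. Passing -S XALL to the collector records all optional metrics; depending on the sysstat version and distribution, the variable that holds it is SA1_OPTIONS or SADC_OPTIONS.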

For alerts, use:

monit — simple daemon that watches processes, disks, CPU.
nagios/zabbix — complex systems with web interfaces.
Bash/Python scripts that check metrics and send notifications (e.g., via mail or Telegram API).

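Example script that sends an alert when the 1-minute load average exceeds the number of cores:

#!/bin/bash
# Alert when the 1-minute load average is higher than the number of CPU cores.
LOAD=$(awk '{print $1}' /proc/loadavg)  # 1-minute load average
THRESHOLD=$(nproc)  # number of cores
# awk does the floating-point comparison, so bc does not need to be installed.
if awk -v load="$LOAD" -v max="$THRESHOLD" 'BEGIN { exit !(load > max) }'; then
  echo "High load: $LOAD" | mail -s "Alert: CPU load" admin@example.com
fi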

Interpreting Metrics and Prevention

Key Indicators

CPU: sustained %idle < 20% suggests overload. For busy web servers, 70-80% utilization can still be normal if the run queue is not growing.
Memory: available < 10% of total is an alarm. Watch swap: if it is active, that is a sign of insufficient RAM.
Disk: await > 10 ms for SSD or > 20 ms for HDD is a problem. %util that stays > 80% means the disk is close to saturation.
Network: rising drop/errs counters indicate overload or a driver or hardware error.

Prevention

Regularly check logs (/var/log/syslog or /var/log/messages, journalctl, dmesg).
Set up monitoring with thresholds (e.g., CPU > 90% for 5 minutes).
Limit processes via cgroups (systemd slice, docker limits).
Update kernel and drivers — sometimes problems are fixed in new versions.
For I/O-intensive tasks, use ionice and nice.

Common Beginner Mistakes

Looking only at the process list in top without checking wa, and so missing I/O problems.
Treating the free column in free -m as "free memory" and ignoring reclaimable cache. Use the available column instead.
Ignoring si/so in vmstat: active swapping kills performance.
Not setting up alerts, and finding out about a problem only after the server has already crashed.

Conclusion

Monitoring is a continuous process. Start with basic utilities (top, vmstat, iostat), then add sar for history and glances/nmon for a comprehensive overview. For production environments, definitely set up graphical dashboards (Grafana) and alerts.

Remember: metrics without context are useless. Know your workload: requests per second, data volume, peak hours. Then anomalies will be visible immediately.

F.A.Q.

Which command displays real-time CPU usage?

How to check disk space usage?

What to do if a process consumes too much memory?

Can I monitor a remote server without installing additional software?

Hints

Install necessary utilities

Use `top` for a quick overview

Analyze overall statistics with `vmstat`

Check disk activity with `iostat`

Monitor the network with `ss` and `netstat`

Collect historical data with `sar`