Introduction / Why This Matters
Linux performance monitoring isn't about complex scripts; it's about quickly understanding what exactly is slowing down the system. Without that, a complaint like "the server is slow" leaves you guessing. You'll learn how to pinpoint in 60 seconds whether the CPU, memory, disk, or network is the source of the problem. This is a fundamental skill for administering any Linux server, from a home setup to production.
Requirements / Preparation
Before you begin, ensure:
- You have SSH access to the server with sudo privileges (some commands require root).
- Basic utilities are installed. We'll start by installing sysstat and htop (Step 1).
- You are in text mode (without a graphical shell) for a clean test. If you're using GNOME/KDE, some utilities (like htop) will work, but iostat and vmstat should be run in a terminal.
Step 1: Install Basic Monitoring Utilities
In most minimal Linux installations (especially in containers or on servers), convenient tools like htop are not available. The standard set (top, free, df) only provides a general overview. We need detailed data.
# For Ubuntu/Debian
sudo apt update
sudo apt install sysstat htop iftop iotop -y
# For RHEL/CentOS/AlmaLinux (htop and iftop usually come from the EPEL repository)
sudo yum install sysstat htop iftop iotop -y
# For Fedora
sudo dnf install sysstat htop iftop iotop -y
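If you manage several distributions, the choice between the three commands above can be scripted. The following is a minimal sketch, assuming that the presence of apt-get, dnf, or yum on the PATH is enough to identify the right package manager; the function name detect_pkg_manager is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first package manager found on this system.
# dnf is checked before yum because newer RHEL-family systems ship both.
detect_pkg_manager() {
    for pm in apt-get dnf yum; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

# Usage (commented out so nothing is installed by accident):
# pm=$(detect_pkg_manager) && sudo "$pm" install -y sysstat htop iftop iotop
```

The install line is left commented out on purpose: review it before running it on a production host.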
What we're installing:
- sysstat: a suite including iostat (disks), mpstat (CPU cores), sar (history).
- htop: an improved interactive process viewer.
- iftop: network traffic monitoring by connection.
- iotop: I/O activity monitoring per process (requires root).
Step 2: Assess Overall CPU and Process Load
The first thing to understand is not 'if it's slow' but what exactly is loading the system. Run:
htop
In htop, focus on the top section:
- CPU bars (CPU1, CPU2...): show the load on each core. If all bars are nearly full, the CPU is busy. In the default color scheme, green is user-space time and red is kernel time.
- Load average (1, 5, 15 min averages): the average number of processes running or waiting to run. On Linux it also counts processes in uninterruptible sleep, which are usually waiting for I/O. Rule of thumb: the load average should not significantly exceed the number of CPU cores. For example, on a 4-core server, values of 4.0, 3.5, 2.0 are normal, while 10.0, 8.0, 6.0 indicate critical overload.
- Process list: sort by %CPU (press F6 -> PERCENT_CPU). A process consistently 'hogging' 80-100% of one core is the likely culprit.
If htop is unavailable, use top:
top
Press 1 to show each core's load. Exit with q.
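The rule of thumb about load average and core count can be turned into a quick check. This is a minimal sketch, not a definitive test: the helper names load_per_core and load_verdict are hypothetical, and the cutoff of 1.0 per core is an assumption derived from the rule above, not a hard limit.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by a core count.
# $1 = load average (e.g. 3.5), $2 = number of CPU cores
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Hypothetical helper: classify a per-core load value.
# The 1.0 cutoff is an assumption based on the rule of thumb above.
load_verdict() {
    awk -v r="$1" 'BEGIN { if (r > 1.0) print "overloaded"; else print "ok" }'
}

# Usage on a live Linux system:
# read -r one five fifteen rest < /proc/loadavg
# ratio=$(load_per_core "$one" "$(nproc)")
# echo "1-min load per core: $ratio ($(load_verdict "$ratio"))"
```

Comparing the 1-minute and 15-minute values the same way shows whether the load is rising or already falling.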
Step 3: Analyze Memory and Swap Usage
Even if the CPU is free, the system can 'slow down' due to insufficient RAM and active swap usage.
In htop, check the Mem and Swp lines:
- Mem: shows total, used, buffers/cache.
- Swp: a non-zero 'used' value means part of memory has been pushed out to disk. If that value keeps growing, the system is actively using disk as memory, which is very slow.
For precise numbers:
free -h
Key columns:
- used: how much memory is occupied.
- available: the most important estimate of memory available for new processes without swapping.
- swap used: if greater than 0 and growing, there is a problem.
Symptom: The application runs but response is 'laggy'. Cause: constant swapping.
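The same numbers that free reports can be read from /proc/meminfo, which is convenient for scripts. The sketch below assumes a kernel that exposes the MemAvailable field (3.14 or newer); the function name mem_summary and its output format are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# the percentage of RAM still available and the swap in use (in kB).
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree = $2 }
        END {
            if (total > 0)
                printf "available_pct=%d swap_used_kb=%d\n", avail * 100 / total, stotal - sfree
        }'
}

# Usage on a live system:
# mem_summary < /proc/meminfo
```

Running it a few times in a row shows whether swap usage is stable or growing, which is the distinction that matters here.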
Step 4: Check Disk Subsystem Load
Disks (especially HDDs or overloaded SSDs) are a common bottleneck. Use iostat:
iostat -x 1
Key columns in the output (-x for extended):
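- %util: percentage of time the device was busy processing requests. Target: < 70-80%. 100% means the disk is fully loaded, although devices that serve requests in parallel (RAID arrays, modern SSDs) can still have headroom at that value.
- await: average time (in milliseconds) to complete I/O operations, including time spent in the queue. Target: for SSD < 1-5 ms, for HDD < 20-50 ms. High values (100+ ms) indicate a problem. Newer sysstat versions report it as separate r_await and w_await columns.
- svctm: average service time. The sysstat documentation marks it as unreliable and newer versions no longer print it, so rely on await instead.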
Example output:
Device    r/s     w/s      rkB/s    wkB/s     await  svctm  %util
sda       0.00    150.00   0.00     6144.00   5.20   1.20   18.00
nvme0n1   5.00    200.00   1024.00  40960.00  12.50  0.80   16.40
Here sda (likely an HDD) has an await of 5.2 ms, which is normal. The nvme0n1 value of 12.5 ms is above the SSD target given above and worth watching. If await were 100 ms with %util at 90%, the disk would be overwhelmed.
Tip: If iostat doesn't show the desired devices, specify them explicitly before the interval: iostat -x sda nvme0n1 1.
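Scanning iostat output by eye gets tedious on hosts with many disks. The following sketch filters the output for devices above a %util threshold. It assumes the header row starts with the word Device and contains a %util column; the function name busy_disks and the default threshold of 80 (taken from the target above) are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` output on stdin and print devices
# whose %util exceeds a threshold ($1, default 80).
busy_disks() {
    awk -v limit="${1:-80}" '
        NF == 0 { col = 0; next }        # a blank line ends a device table
        $1 ~ /^Device/ {                 # header row: locate the %util column
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 > limit { print $1, $col }
    '
}

# Usage on a live system (one-second samples, three reports;
# LC_ALL=C keeps the decimal separator a dot):
# LC_ALL=C iostat -x 1 3 | busy_disks 80
```

Keep in mind that the first report iostat prints covers the time since boot, so only the later reports reflect current load.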
Step 5: Examine Network Activity and Errors
The network can 'fail' due to channel overload, interface errors, or application issues.
Quick real-time traffic view:
sudo iftop -nP
- -n: don't resolve IPs to names (faster).
- -P: show ports.
Press 1, 2, or 3 to sort by the 2-second, 10-second, or 40-second average column; the top rows then show which connections are loading the channel.
More detailed interface statistics:
ip -s link show eth0 # or ens3, enp0s3, etc.
In the output, look for:
- rx errors / tx errors: number of receive/transmit errors. Non-zero values require checking the cable, switch, driver.
- rx dropped / tx dropped: packets dropped by the kernel due to lack of resources (buffers). Growth in these values under high load indicates congestion.
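Because growth matters more than the absolute value, it helps to sample the counters twice. The sketch below reads them from the per-interface statistics directory in sysfs. The function name nic_counters is hypothetical, and the second argument exists only so the sysfs root can be overridden.

```shell
#!/bin/sh
# Hypothetical helper: print error and drop counters for one interface.
# $1 = interface name, $2 = sysfs root (defaults to /sys/class/net).
nic_counters() {
    dir="${2:-/sys/class/net}/$1/statistics"
    for name in rx_errors tx_errors rx_dropped tx_dropped; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        fi
    done
}

# Usage on a live system: sample twice and compare to see growth
# nic_counters eth0; sleep 10; nic_counters eth0
```

If the two samples are identical under load, the non-zero values are probably left over from an earlier incident.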
Step 6: Collect Historical Data for In-Depth Analysis
If the issue is periodic (e.g., 'slows down every day at 14:00'), you need to look at history. The sysstat daemon handles this.
- Check if it's running:
sudo systemctl status sysstat
If the status is active, data is already being collected. If it is inactive or failed, enable it:
sudo systemctl enable --now sysstat
On Debian/Ubuntu, also set ENABLED="true" in /etc/default/sysstat.
- Viewing archives: data is stored in binary format in /var/log/sysstat/ (Debian/Ubuntu) or /var/log/sa/ (RHEL family). Read it with the sar command.
- CPU for yesterday:
sudo sar -u -f /var/log/sysstat/sa$(date -d yesterday +%d)
- Disk averages for today (default collection interval is 10 min):
sudo sar -d -f /var/log/sysstat/sa$(date +%d) | grep -E "DEV|Average"
- Memory for today (key columns: kbmemfree, kbmemused, %memused):
sudo sar -r -f /var/log/sysstat/sa$(date +%d)
To adjust the collection interval (e.g., every minute), change the schedule that runs sa1: /etc/cron.d/sysstat on cron-based setups, or the sysstat-collect.timer unit on systemd-based ones. The SA1_OPTIONS parameter in /etc/default/sysstat (Debian/Ubuntu) or /etc/sysconfig/sysstat (RHEL) sets the options passed to sa1, not the interval.
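For a problem that recurs at a known time, it is handy to build the archive path for a given day and then narrow sar to a time window. This is a minimal sketch assuming GNU date and the Debian/Ubuntu archive directory as the default; the function name sa_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the sysstat archive path for N days ago.
# $1 = days ago (0 = today)
# $2 = archive directory (assumed default: /var/log/sysstat;
#      pass /var/log/sa on RHEL-family systems)
sa_file() {
    printf '%s/sa%s\n' "${2:-/var/log/sysstat}" "$(date -d "$1 days ago" +%d)"
}

# Usage on a live system: CPU history for yesterday between 13:30 and 14:30
# sar -u -f "$(sa_file 1)" -s 13:30:00 -e 14:30:00
```

The -s and -e options limit the report to a start and end time, which keeps the output focused on the window around the slowdown.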
Verification
After completing the steps, you should:
- Identify the resource bottleneck: CPU (cores near 100%, high load average), Memory (available low, swap active), Disk (high await and %util), Network (errors/dropped, 100% utilization).
- Find the 'culprit': specific process (htop), operation type (iotop: many writes?), specific network connection (iftop).
- Obtain data for further action: e.g., 'Process java with PID 1234 consumes 300% CPU' or 'Disk /dev/nvme0n1 has await of 150 ms at %util 95%'.
If the issue is localized at the application level (e.g., a specific Java process), further diagnosis will depend on it (log analysis, profiling).
Potential Issues
- 'iostat: command not found': the sysstat package is not installed. See Step 1.
- 'Permission denied' when running iostat or sar: some commands require root. Use sudo or log in as root.
- Zero values in iostat: the disk might not be in use or the system uses virtual block devices (in containers). Check lsblk and df -h.
- iftop doesn't show the interface: specify it explicitly with sudo iftop -i eth0.
- No sar data for past days: the sysstat service wasn't running earlier. Data is collected only from the time the service was started.
- High await with low %util: may indicate issues with the disk controller, driver, or hardware failures. Check dmesg | grep -i error.
- top/htop shows 100% CPU but no process with high %CPU: this could be system interrupts (si) or processes in D state (uninterruptible sleep, usually waiting for I/O). In htop, press F2 -> Display options -> enable Show custom thread names and Detailed CPU time for viewing. For I/O-bound processes, use iotop.
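Processes in D state can also be listed directly, without going through the htop menus. The sketch below filters ps-style output for states beginning with D. It assumes the procps version of ps, and the function name d_state_procs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "STATE PID COMMAND" lines on stdin and print
# "PID COMMAND" for processes in uninterruptible sleep (state starts with D).
d_state_procs() {
    awk '$1 ~ /^D/ {
        sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # strip the state field
        print
    }'
}

# Usage on a live system (procps ps; the trailing = suppresses headers):
# ps -eo state=,pid=,comm= | d_state_procs
```

A process that appears here across several consecutive runs is usually the one stuck waiting on the disk or on a network filesystem.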