High Disk Usage in Linux (Disk Saturation)
What is Disk Saturation?
Disk saturation is a state in which the storage device cannot keep up with the volume of incoming read and write requests. Unlike simple high load, saturation means that the disk is operating at its limit, so any additional requests wait in the queue and latency grows.
Saturation indicators (device utilization, shown as %util by iostat):
- Up to 50% — normal operation
- 50-80% — increased load
- 80-100% — critical saturation
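The bands above can be turned into a small helper for scripts. This is a minimal sketch, assuming the percentages refer to the %util column of iostat; the function name `classify_util` is hypothetical, and treating a value that sits exactly on a boundary as the higher band is an assumption, since the list does not say.

```shell
# Map a utilization percentage to the bands listed above.
# Assumption: a value exactly on a boundary (50 or 80) counts as the higher band.
classify_util() {
  awk -v u="$1" 'BEGIN {
    if (u + 0 < 50)      print "normal"
    else if (u + 0 < 80) print "increased"
    else                 print "critical"
  }'
}
```

For example, `classify_util 63.4` prints `increased`.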
Symptoms of the Problem
When disk saturation occurs in Linux, the following signs can be observed:
- System slowdown — applications take a long time to start and respond to user actions
- High iowait — the CPU spends a lot of time waiting for I/O operations
- Hanging file operations — copying, writing, and reading files occur with delays
- Logging issues — entries in system logs may lag behind
Diagnosing Disk Saturation
1. Using iostat
Install the sysstat package and run:
iostat -x 1
Pay attention to the following metrics:
| Parameter | Description | Critical Value |
|---|---|---|
| %util | Percentage of elapsed time during which the device was busy servicing I/O requests | > 80% |
| r/s, w/s | Number of read/write operations per second | Depends on the type of disk |
| r_await, w_await | Average wait time for read/write operations | > 20 ms |
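To apply the thresholds from the table automatically, the iostat output can be filtered with awk. The sketch below is one possible approach, not part of sysstat: the function name `flag_saturated` is hypothetical, and it locates columns by their header names because the column order differs between sysstat versions. Columns that are missing from the output are treated as 0.

```shell
# Read `iostat -x` output on stdin and print devices that exceed
# the table's thresholds: %util > 80 or r_await/w_await > 20 ms.
flag_saturated() {
  awk -v util_max=80 -v await_max=20 '
    # The CPU summary precedes each device report: forget the old column map.
    $1 == "avg-cpu:" { split("", col); next }
    # Device header: remember the position of every column name.
    $1 ~ /^Device:?$/ {
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data line: only evaluated once a device header has been seen.
    ("%util" in col) && NF >= col["%util"] {
      util = $(col["%util"]) + 0
      ra = ("r_await" in col) ? $(col["r_await"]) + 0 : 0
      wa = ("w_await" in col) ? $(col["w_await"]) + 0 : 0
      if (util > util_max || ra > await_max || wa > await_max)
        printf "%s util=%.1f r_await=%.1f w_await=%.1f\n", $1, util, ra, wa
    }
  '
}
```

A possible invocation is `LC_ALL=C iostat -xy 1 3 | flag_saturated`. The `-y` flag omits the first report, which contains averages since boot, and `LC_ALL=C` keeps the decimal separator a dot.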
2. Checking iowait via top
top
The %Cpu(s) summary line near the top of the output shows the wa (iowait) value:
%Cpu(s): 5.2 us, 2.1 sy, 0.0 ni, 72.3 id, 18.4 wa, 0.0 hi, 2.0 si, 0.0 st
A wa value that stays above 20-30% usually points to an I/O bottleneck.
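For scripted checks, the wa value can be pulled out of that summary line. This is a small sketch with a hypothetical function name, `cpu_wa`; it assumes the `%Cpu(s):` line format shown above, which can differ between top versions and locales.

```shell
# Print the wa value from the %Cpu(s) summary line of top (read on stdin).
cpu_wa() {
  awk '/^%Cpu/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^wa,?$/) { print $(i - 1); exit }
  }'
}
```

It can be fed from batch mode, for example `LC_ALL=C top -bn1 | cpu_wa`.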
3. Monitoring Processes with iotop
sudo iotop
This command shows processes actively using the disk:
Total DISK READ: 0.00 B/s | Total DISK WRITE: 150.23 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
1247 be/4 root 0.00 B/s 150.23 M/s 0.00 % 0.00 % dd if=/dev/zero of=/test
4. Analyzing with vmstat
vmstat 1
Key columns:
- si (swap in) — data loaded from swap
- so (swap out) — data unloaded to swap
- wa — iowait
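The three columns can be extracted from vmstat output for logging or alerting. The following is a sketch under the assumption that the header row starts with the `r` and `b` columns, as in procps vmstat; the function name `vmstat_last` is hypothetical.

```shell
# Read vmstat output on stdin and print si, so and wa from the last sample.
vmstat_last() {
  awk '
    # Column-name row: record where each column sits.
    $1 == "r" && $2 == "b" {
      split("", col)
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    # Data rows start with a number; keep overwriting so the last one wins.
    ("wa" in col) && $1 ~ /^[0-9]+$/ {
      si = $(col["si"]); so = $(col["so"]); wa = $(col["wa"])
    }
    END { if ("wa" in col) printf "si=%s so=%s wa=%s\n", si, so, wa }
  '
}
```

With `vmstat 1 5 | vmstat_last` the reported values come from the fifth sample. The first sample of any vmstat run holds averages since boot, so taking the last one gives a current reading.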
Typical Causes of Saturation
1. Intensive Log Writing
# Checking the size of the log directory
du -sh /var/log/*
# Listing processes that hold files open under /var/log
sudo lsof +D /var/log
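When the log directory holds many entries, sorting the du output makes the largest ones stand out. This is a minimal sketch; the function name `largest_entries` and its defaults (directory /var/log, ten results) are illustrative choices rather than anything prescribed above.

```shell
# Print the N largest entries (size in KiB) directly under a directory.
# The body runs in a subshell so the helper variables stay local.
largest_entries() (
  dir=${1:-/var/log}
  count=${2:-10}
  du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$count"
)
```

For example, `sudo sh -c '. ./helpers.sh; largest_entries /var/log 5'` would list the five largest entries, assuming the function was saved in a file named helpers.sh.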
2. Working with Large Databases
PostgreSQL, MySQL, and MongoDB databases actively use the disk when:
- Executing heavy queries
- Performing VACUUM/indexing operations
- Writing transaction logs
3. Insufficient RAM
When RAM is lacking, the system actively uses swap:
# Checking swap usage
swapon --show
free -h
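The same information is available in /proc/meminfo, which is easier to parse in scripts than the output of free. The sketch below computes the share of swap in use; the function name `swap_used_pct` is hypothetical, and the optional file argument exists only so the function can be tried against a saved copy of meminfo.

```shell
# Print the percentage of swap in use, based on SwapTotal and SwapFree.
# Reads /proc/meminfo unless another file is given as the first argument.
swap_used_pct() {
  awk '
    $1 == "SwapTotal:" { total = $2 }
    $1 == "SwapFree:"  { free = $2 }
    END {
      if (total > 0) printf "%.1f\n", (total - free) * 100 / total
      else print "0.0"   # no swap configured
    }
  ' "${1:-/proc/meminfo}"
}
```

A value that keeps growing while vmstat shows non-zero si and so is a stronger sign of memory pressure than a high but stable percentage.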
4. Filesystem Fragmentation
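# Checking fragmentation (for ext4): the summary line reports the percentage of non-contiguous files
# -n opens the filesystem read-only; results on a mounted filesystem may be unreliable
sudo fsck -n /dev/sda1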
Ways to Resolve the Problem
Short-term Solutions
- Stopping Resource-Intensive Processes
# Find processes in uninterruptible sleep (state D), which usually means waiting for I/O
ps aux | awk '$8 ~ /D/ {print $0}'
- Cleaning Temporary Files
# Cleaning package cache
sudo apt clean
# Cleaning old logs
sudo journalctl --vacuum-time=7d
- Clearing Swap (if there is a lot of free memory)
sudo swapoff -a && sudo swapon -a
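Turning swap off forces everything stored there back into RAM, so the "lot of free memory" condition is worth checking before running the command. The sketch below is one way to do that, assuming a kernel that exposes MemAvailable in /proc/meminfo (3.14 or later); the function name `can_drop_swap` is hypothetical.

```shell
# Succeed (exit status 0) only if available memory exceeds the swap in use.
# Reads /proc/meminfo unless another file is given as the first argument.
can_drop_swap() {
  awk '
    $1 == "MemAvailable:" { avail = $2 }
    $1 == "SwapTotal:"    { total = $2 }
    $1 == "SwapFree:"     { free = $2 }
    END { exit !(avail > total - free) }
  ' "${1:-/proc/meminfo}"
}
```

It can guard the command directly: `can_drop_swap && sudo swapoff -a && sudo swapon -a`. The comparison leaves no safety margin, so a stricter threshold may be appropriate on busy systems.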
Long-term Solutions
- Switching to SSD
Replacing HDD with SSD significantly improves I/O performance:
# Checking disk type
lsblk -d -o NAME,TYPE,ROTA
# ROTA=1: HDD, ROTA=0: SSD
- Configuring I/O Scheduler
For SSDs, it is recommended to use none or mq-deadline:
# Viewing the current scheduler
cat /sys/block/sda/queue/scheduler
# Setting the scheduler
echo "mq-deadline" | sudo tee /sys/block/sda/queue/scheduler
- Optimizing Applications
- Configure buffering and caching in applications
- Use asynchronous write operations
- Stagger read and write operations over time
- Monitoring and Alerting
Set up a monitoring system (Prometheus, Zabbix, Nagios) to track:
- Disk utilization (%util)
- I/O wait time (await)
- iowait value
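For the scheduler step above, the sysfs file lists every available scheduler on one line and marks the active one with square brackets, for example `[mq-deadline] kyber bfq none`. A script that needs only the active name can extract it as sketched here; the function name `active_scheduler` is hypothetical.

```shell
# Print the scheduler name enclosed in square brackets (input on stdin).
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}
```

Usage: `active_scheduler < /sys/block/sda/queue/scheduler`. Comparing the result before and after writing to the file confirms that the change took effect. A value written this way does not survive a reboot.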
Prevention
- Regularly clean logs and temporary files
- Monitor free disk space
- Set up log rotation
- Use monitoring for early problem detection
- Plan storage capacity with a buffer of 20-30%
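Early detection works better when an alert fires on sustained load rather than on a single spike. The sketch below illustrates the idea independently of any monitoring product: it reads one metric value per line and succeeds once a given number of consecutive values exceed a limit. The function name `sustained_above` and the example numbers are assumptions for illustration.

```shell
# Exit 0 if at least $2 consecutive input values are greater than $1.
sustained_above() {
  awk -v limit="$1" -v need="$2" '
    # Extend the current run on a breach, reset it otherwise.
    { run = ($1 + 0 > limit + 0) ? run + 1 : 0 }
    run >= need + 0 { hit = 1; exit }
    END { exit !hit }
  '
}
```

For instance, `printf '%s\n' 50 85 90 95 | sustained_above 80 3` succeeds because the last three samples all exceed 80, while a single spike between normal readings does not trigger it.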
Conclusion
Disk saturation is a serious issue that affects the performance of the entire system. Timely diagnostics using iostat, iotop, and vmstat allow for quick identification of the cause of high load. A comprehensive approach, including both immediate measures and long-term optimization (switching to SSD, configuring the scheduler, optimizing applications), will help resolve the problem and prevent its recurrence.