smartctl Basics: How to Monitor Disk Health in Linux

Introduction

The smartctl utility is part of the smartmontools package. It reads SMART (Self-Monitoring, Analysis and Reporting Technology) data from hard disk drives (HDD) and solid-state drives (SSD). Monitoring disk health helps you spot signs of failure early and reduces the risk of data loss. In this guide, you will learn how to install smartctl, perform basic checks, and interpret the results to keep your system reliable.

Requirements / Preparation

Before you begin, ensure you have:

Access to a Linux terminal with superuser privileges (sudo).
An active internet connection to install packages (if smartmontools is not installed).
Basic knowledge of Linux commands (lsblk, sudo).

Note: The smartctl utility works with disks that support SMART technology. Most modern HDDs and SSDs have this support.

Step 1: Installing smartmontools

If the smartmontools package is not already installed on your system, install it via your package manager.

For Debian/Ubuntu-based distributions:

sudo apt update
sudo apt install smartmontools

For CentOS/RHEL 7:

sudo yum install smartmontools

For CentOS 8+, Fedora, RHEL 8+:

sudo dnf install smartmontools

After installation, verify the command is available:

smartctl --version
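
On some systems smartctl is installed in a directory such as /usr/sbin that may be missing from a regular user's PATH, so the check above can fail even though the package is present. The following is a minimal sketch of a lookup helper; the function name find_tool and the list of fallback directories are assumptions made for illustration, not part of smartmontools.

```shell
#!/bin/sh
# find_tool NAME: print the full path of NAME, or return 1 if it is not found.
# Looks in PATH first, then in a few system directories that are sometimes
# missing from a regular user's PATH (the directory list is an assumption).
find_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        command -v "$1"
        return 0
    fi
    for dir in /usr/sbin /sbin /usr/local/sbin; do
        if [ -x "$dir/$1" ]; then
            printf '%s\n' "$dir/$1"
            return 0
        fi
    done
    return 1
}

if path=$(find_tool smartctl); then
    echo "smartctl found at: $path"
else
    echo "smartctl not found; install the smartmontools package"
fi
```

If the helper reports a path outside your PATH, call smartctl by that full path or run it through sudo.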

Step 2: Identifying Disk Devices

To work with smartctl, you need to know the path to the disk device (e.g., /dev/sda). Use the lsblk command to list all block devices:

lsblk

Example output:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 238.5G  0 disk 
├─sda1   8:1    0   512M  0 part /boot/efi
├─sda2   8:2    0   128G  0 part /
└─sda3   8:3    0   110G  0 part /home
sdb      8:16   0   1.8T  0 disk 
└─sdb1   8:17   0   1,8T  0 part /data

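In this case, the whole-disk devices are /dev/sda and /dev/sdb (the rows with TYPE disk). Select the disk you wish to check and pass the disk device (for example, /dev/sda), not a partition, to smartctl.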
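
On machines with many partitions it can help to list only the whole disks. The sketch below filters lsblk output down to entries of type disk; whole_disks is a hypothetical helper name, and the sample input stands in for real lsblk output.

```shell
#!/bin/sh
# whole_disks: read "NAME TYPE" pairs on stdin and print /dev/NAME for each
# entry whose type is "disk", skipping partitions, loop devices and so on.
whole_disks() {
    awk '$2 == "disk" { print "/dev/" $1 }'
}

# Real use (-d = no partitions, -n = no header, -o = choose columns):
#   lsblk -d -n -o NAME,TYPE | whole_disks

# Demonstration on sample data resembling the listing above.
printf '%s\n' 'sda disk' 'sdb disk' 'sr0 rom' | whole_disks
```

As an alternative, smartctl --scan prints the devices that smartctl itself detects.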

Step 3: Running a Short SMART Test

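Start with the drive's overall health status. The -H option does not run a test; it reads the health verdict the drive already maintains and returns immediately. An actual short self-test is started with smartctl -t short /dev/sdX and typically takes 1-2 minutes. To check the health status: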

sudo smartctl -H /dev/sdX

Replace /dev/sdX with your disk (e.g., /dev/sda).

Example output:

smartctl 7.2 2021-10-10 r5145 [x86_64-linux-5.4.0-91-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

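If the result is PASSED, the drive has not predicted its own failure. This alone does not guarantee the disk is healthy, so review the attributes as well. If the result is FAILED, back up your data and examine the disk further.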
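
For scripts, the exit status of smartctl is often easier to check than the printed text. It is a bitmask described in the EXIT STATUS section of the smartctl man page. The sketch below decodes it; explain_status is a hypothetical helper name, and the messages are shortened paraphrases of the man page wording.

```shell
#!/bin/sh
# explain_status CODE: print one line for each bit set in a smartctl exit
# status. Bit meanings follow the EXIT STATUS section of the smartctl man page.
explain_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no problems reported"
        return 0
    fi
    [ $((code & 1)) -ne 0 ]   && echo "command line did not parse"
    [ $((code & 2)) -ne 0 ]   && echo "device open failed"
    [ $((code & 4)) -ne 0 ]   && echo "a SMART command failed or a checksum error occurred"
    [ $((code & 8)) -ne 0 ]   && echo "SMART status check returned DISK FAILING"
    [ $((code & 16)) -ne 0 ]  && echo "prefail attributes are at or below threshold"
    [ $((code & 32)) -ne 0 ]  && echo "attributes were at or below threshold in the past"
    [ $((code & 64)) -ne 0 ]  && echo "device error log contains errors"
    [ $((code & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Real use:
#   sudo smartctl -H /dev/sdX; explain_status $?

# Demonstration with a sample status value (8 + 64).
explain_status 72
```

A status of 0 together with PASSED is the expected result for a drive with no recorded problems.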

Step 4: Viewing Detailed SMART Information

For a detailed analysis, use the -a option:

sudo smartctl -a /dev/sdX

This will output all SMART attributes, error logs, and disk information. Pay attention to these sections:

SMART overall-health self-assessment test result: the overall verdict.
SMART Attributes: the attribute table. Critical attributes include:
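- Reallocated_Sector_Ct: count of reallocated sectors. A non-zero raw value means the drive has already remapped damaged sectors.
- Current_Pending_Sector: sectors waiting to be reallocated. Any non-zero raw value deserves attention, and a growing one is a sign of problems.
- UDMA_CRC_Error_Count: CRC errors on the interface link (PATA/SATA). A rising count usually points to a cable or connection issue rather than the disk itself.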
- Temperature_Celsius: disk temperature.
SMART Error Log: the error log.

Example attribute output:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12345
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       35

Here, the RAW_VALUE for Reallocated_Sector_Ct is 0, which is good.
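
The same table can be checked mechanically. The sketch below is one possible approach, assuming the ATA attribute layout shown above (ID in column 1, name in column 2, VALUE in column 4, THRESH in column 6, RAW_VALUE in column 10). The helper name check_attrs and the WATCH/FAILING labels are illustrative choices, not smartctl output.

```shell
#!/bin/sh
# check_attrs: read a SMART attribute table on stdin and print
# "NAME RAW STATUS" for each attribute row.
#   FAILING = normalized VALUE is at or below a non-zero THRESH
#   WATCH   = a sector or CRC counter has a non-zero raw value
#   ok      = neither of the above
check_attrs() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            name = $2; value = $4 + 0; thresh = $6 + 0; raw = $10 + 0
            status = "ok"
            if (name ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count)$/ && raw > 0)
                status = "WATCH"
            if (thresh > 0 && value <= thresh)
                status = "FAILING"
            print name, raw, status
        }'
}

# Real use (-A prints only the attribute table):
#   sudo smartctl -A /dev/sdX | check_attrs

# Demonstration on the sample rows shown above.
check_attrs <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12345
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       35
EOF
```

Raw values are vendor-specific, so treat the WATCH label as a prompt to look closer rather than a verdict.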

Step 5: Running an Extended SMART Test

For a thorough check, perform a long (extended) test. It scans the entire disk surface and can take from about 30 minutes to several hours, depending on disk size and speed.

sudo smartctl -t long /dev/sdX

After starting, you will see a message indicating the test has begun. To check progress or results, use:

sudo smartctl -a /dev/sdX | grep -A1 "Self-test execution status"

Or wait for completion and then run smartctl -a again to view results in the SMART Self-test log section.

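⚠️ Important: The test runs inside the drive, so the command returns immediately while the test continues in the background. Do not interrupt it unnecessarily. If you must stop it, use smartctl -X /dev/sdX.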
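
Progress can also be extracted as a number for use in a polling loop. The sketch below assumes the "NN% of test remaining" line that smartctl prints for ATA drives while a self-test is running; other drive types may report progress differently. The helper name test_remaining is hypothetical.

```shell
#!/bin/sh
# test_remaining: read smartctl output on stdin and print the percentage of
# the running self-test that is still remaining. Prints nothing when no
# "NN% of test remaining" line is present (no test in progress).
test_remaining() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)% of test remaining.*/\1/p'
}

# Possible polling loop (-c prints the capabilities section, which includes
# the self-test execution status on ATA drives):
#   while pct=$(sudo smartctl -c /dev/sdX | test_remaining); [ -n "$pct" ]; do
#       echo "$pct% remaining"; sleep 60
#   done

# Demonstration on sample text in the assumed format.
test_remaining <<'EOF'
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
EOF
```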

Step 6: Analyzing Results and Interpretation

After running tests (short or long), analyze the output:

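Overall health status: look for the line SMART overall-health self-assessment test result. PASSED means the drive has not predicted its own failure. FAILED means you should back up your data and check attributes and logs immediately.
Check critical attributes from the table. The normalized VALUE is what is compared with THRESH: an attribute of type Pre-fail whose VALUE has fallen to or below THRESH indicates imminent failure. RAW_VALUE is not compared with THRESH, but a growing raw count of reallocated or pending sectors still requires attention.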
Review the SMART Self-test log for details of completed tests. Errors in tests indicate problems.
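
Failed runs can be picked out of the self-test log with a simple filter. The sketch below assumes the ATA log layout, where each entry starts with "# N" and a failed run carries a status such as "Completed: read failure". The helper name failed_selftests and the sample log lines are illustrative.

```shell
#!/bin/sh
# failed_selftests: read smartctl output on stdin and print self-test log
# entries that ended in a failure. Successful runs read "Completed without
# error" (no colon), so matching "Completed: " selects only failed ones.
failed_selftests() {
    grep -E '^# *[0-9]+ ' | grep 'Completed: ' || true
}

# Real use (-l selftest prints only the self-test log):
#   sudo smartctl -l selftest /dev/sdX | failed_selftests

# Demonstration on sample log lines in the assumed layout.
failed_selftests <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     12345         -
# 2  Short offline       Completed: read failure       90%     12000         123456789
EOF
```

An empty result means no failed runs were found in the log. Aborted or interrupted tests are not counted as failures by this filter.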

If issues are detected, it is recommended to:

Immediately back up your data.
Plan to replace the disk if errors are increasing.
Check cables and connections (especially for UDMA_CRC_Error_Count).

Verification

After completing this guide, you should:

See PASSED as the overall health status reported by smartctl -H.
Have an understanding of your disk's status via SMART attributes.
If needed, have successfully run and completed a long test without errors.

To confirm, re-run smartctl -H /dev/sdX and ensure the status is PASSED. Also verify that the WHEN_FAILED column shows - for every attribute.

Potential Issues

"Permission denied" or "Unable to open device" Error

Cause: Insufficient permissions to access the device.
Solution: Use sudo before the smartctl command. For example: sudo smartctl -a /dev/sda.

"SMART support is: Unavailable" or "Unable to detect SMART" Message

Cause: The disk does not support SMART, or the driver does not provide access (e.g., for some RAID controllers or external disks).
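Solution: If the disk sits behind a RAID controller or a USB bridge, smartctl may need the -d option to select the device type (for example, -d sat for many USB enclosures or -d megaraid,N for some RAID controllers). Otherwise, use the utilities provided by the controller vendor, or connect the disk directly to a SATA port. Check the disk's documentation to confirm SMART support.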

Long Test Fails to Start or Completes with an Error

Cause: The disk is busy (e.g., mounted and actively in use), or there is a hardware problem.
Solution: Try running the test in single-user mode or with the disk unmounted (if possible). For a system disk, you may need to boot from a live USB. If the test consistently fails, the disk is likely faulty.

Non-Zero Raw Values for Critical Attributes (e.g., Reallocated_Sector_Ct > 0)

Cause: Disks accumulate bad sectors over time and reallocate them.
Solution: Monitor the trend. If the value is increasing, plan to replace the disk. Backing up data is essential.
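
One simple way to monitor the trend is to record the raw value at regular intervals and compare the two most recent readings. The sketch below is a minimal illustration; the helper names record_value and trend and the log file format are assumptions.

```shell
#!/bin/sh
# record_value FILE VALUE: append "DATE VALUE" to FILE.
record_value() {
    printf '%s %s\n' "$(date +%Y-%m-%d)" "$2" >> "$1"
}

# trend FILE: compare the last two recorded values and print
# "increasing", "stable", or "not enough data".
trend() {
    tail -n 2 "$1" | awk '
        { v[NR] = $2 + 0 }
        END {
            if (NR < 2)           print "not enough data"
            else if (v[2] > v[1]) print "increasing"
            else                  print "stable"
        }'
}

# Real use: fetch the current raw value, then record it, for example
#   raw=$(sudo smartctl -A /dev/sdX | awk '$2 == "Reallocated_Sector_Ct" { print $10 }')
#   record_value /var/log/realloc-sdX.log "$raw"

# Demonstration with a temporary file and sample values.
log=$(mktemp)
record_value "$log" 0
record_value "$log" 8
trend "$log"
rm -f "$log"
```

Running the recording step from a scheduled job (for example, daily) gives enough history to see whether the count is growing.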

smartctl Not Found After Installation

Cause: The package is installed, but the path is not in PATH, or the installation failed.
Solution: Reinstall the package or check that /usr/sbin is in your PATH. Typically, smartctl is located at /usr/sbin/smartctl.
