SSD Health Check in Linux: Complete SMART Guide

Introduction / Why This Is Needed

Checking the health of an SSD is an important safeguard against sudden data loss. Unlike HDDs, SSDs make no characteristic sounds as they fail, so the main way to assess their condition is to read the drive's built-in SMART (Self-Monitoring, Analysis and Reporting Technology) data.

In this guide, you will learn:

How to install and use the smartctl utility for SSD diagnostics.
How to read and interpret key SMART attributes specific to SSDs.
How to determine the remaining drive lifespan (TBW — Total Bytes Written).
How to identify early signs of wear and predict drive failure.

The procedure will take 10-15 minutes and does not require special knowledge, but will require superuser (sudo) privileges.

Requirements / Preparation

Before starting, ensure that:

You have access to a Linux terminal with sudo privileges.
The smartmontools package is installed (instructions below).
You know the name of the SSD drive you want to check (e.g., /dev/nvme0n1 or /dev/sda).

💡 Tip: If you have multiple drives, identify which one is an SSD by looking for nvme in the lsblk output (for NVMe SSDs) or by size/model. For SATA SSDs, the name is usually /dev/sdX.

Step 1: Installing smartmontools

The smartctl utility is part of the smartmontools package. Install it:

For Debian/Ubuntu and derivatives:

sudo apt update
sudo apt install smartmontools

For RHEL/CentOS/Fedora:

sudo yum install smartmontools  # CentOS 7
sudo dnf install smartmontools  # CentOS 8+, Fedora

For Arch Linux:

sudo pacman -S smartmontools

Verify the installation:

smartctl --version

Step 2: Identifying the SSD Drive

Run the command to view all block devices:

lsblk

Example output:

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 238.5G  0 disk
├─sda1        8:1    0   512M  0 part /boot/efi
├─sda2        8:2    0   128G  0 part /
└─sda3        8:3    0   110G  0 part /home
nvme0n1     259:0    0 476.9G  0 disk
├─nvme0n1p1 259:1    0   512M  0 part /boot
└─nvme0n1p2 259:2    0 476.4G  0 part /data

Here, nvme0n1 is an NVMe SSD, and sda is possibly a SATA SSD or HDD. Use the corresponding device name (e.g., /dev/nvme0n1) to check the SSD's health.
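
When the device name alone is ambiguous, the kernel's rotational flag can help tell an SSD from an HDD. The following is a minimal sketch, assuming your lsblk provides the ROTA and TRAN columns; classify_disks is a hypothetical helper name, and the result is only a heuristic because some USB bridges and RAID controllers report the flag incorrectly.

```shell
# classify_disks: read "NAME ROTA TRAN" rows from stdin and label each disk.
# ROTA=0 means non-rotational media (SSD), ROTA=1 means spinning media (HDD).
classify_disks() {
    awk '
        $2 == 0 && $3 == "nvme" { print $1, "NVMe SSD"; next }
        $2 == 0                 { print $1, "SSD"; next }
        $2 == 1                 { print $1, "HDD"; next }
    '
}

# Usage on a real system (-d: whole disks only, -n: no header,
# -e 7: skip loop devices):
#   lsblk -dn -e 7 -o NAME,ROTA,TRAN | classify_disks
```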

Step 3: Viewing General Health Information

Perform a quick check of the overall status:

sudo smartctl -H /dev/your_device

Example output for a healthy drive:

smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.0-67-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

If you see FAILED — immediately back up your data and plan to replace the drive.
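
For scripting, the verdict line can be extracted from the command output. This is a small sketch rather than a complete checker: health_verdict is a hypothetical helper that only looks for the overall-health line shown above, so output in any other format is reported as UNKNOWN.

```shell
# health_verdict: read `smartctl -H` output on stdin and print
# PASSED, FAILED, or UNKNOWN based on the overall-health line.
health_verdict() {
    result=$(grep -i 'overall-health self-assessment test result' \
             | sed 's/.*result:[[:space:]]*//')
    case "$result" in
        PASSED*) echo PASSED ;;
        FAILED*) echo FAILED ;;
        *)       echo UNKNOWN ;;
    esac
}

# Usage on a real system:
#   sudo smartctl -H /dev/your_device | health_verdict
```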

Step 4: Analyzing the Full SMART Dump

For a detailed analysis, run:

sudo smartctl -a /dev/your_device

This will output all SMART attributes. For SSDs, pay special attention to the following:

Key Attributes for SSDs (SATA):

ID 5: Reallocated_Sector_Ct — number of reallocated sectors. Non-zero values indicate wear.
ID 9: Power_On_Hours — operating time in hours.
ID 12: Power_Cycle_Count — number of power cycles.
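ID 173 or 177: Wear_Leveling_Count (177 on Samsung; the name behind 173 varies by vendor) — NAND wear indicator. The normalized value starts at 100 on a new drive and counts down toward 0 as rated erase cycles are consumed.
ID 233: Media_Wearout_Indicator (Intel) — the same kind of countdown from 100. Treat values below roughly 10-20 as a signal to plan a replacement.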
ID 179: Used_Rsvd_Blk_Cnt_Tot (Samsung) — number of used reserve blocks.
ID 181: Program_Fail_Cnt_Total — cell programming failures.
ID 182: Erase_Fail_Count_Total — cell erase failures.
ID 183: Runtime_Bad_Block — bad blocks detected during operation.
ID 187: Uncorrectable_Error_Cnt — uncorrectable errors.
ID 188: Command_Timeout — command timeouts.
ID 190: Airflow_Temperature_Cel — temperature (not relevant for some SSDs).
ID 194: Temperature_Celsius — sensor temperature.
ID 195: Hardware_ECC_Recovered — corrected ECC errors.
ID 196: Reallocated_Event_Count — reallocation events.
ID 197: Current_Pending_Sector — sectors pending reallocation (critical!).
ID 198: Offline_Uncorrectable — uncorrectable sectors during offline test.
ID 199: UDMA_CRC_Error_Count — cable errors (for SATA).

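For NVMe SSDs (run `smartctl -a`; add `-d nvme` if the device type is not detected automatically):

SMART/Health Information section: Critical Warning, Temperature, Available Spare, Available Spare Threshold, Percentage Used.
Critical Warning — a bit field; any value other than 0x00 means the controller has flagged a problem.
Available Spare — percentage of reserve blocks remaining. A value at or below Available Spare Threshold means the drive is near wear-out.
Percentage Used — the vendor's estimate of consumed endurance; 0% is new, and the value can exceed 100%.
Data Units Written — total amount of data written, in units of 1000 × 512 bytes (512,000 bytes); smartctl also prints the converted size in brackets.
Media and Data Integrity Errors — count of unrecovered data integrity errors.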
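
The NVMe fields above can be compared automatically. The sketch below assumes the "Name: value%" layout that smartctl prints in its SMART/Health Information section; nvme_wear_summary is a hypothetical helper name, and a missing Percentage Used field is shown as 0.

```shell
# nvme_wear_summary: read `smartctl -a` output for an NVMe drive on stdin
# and report spare capacity versus its threshold plus the wear estimate.
nvme_wear_summary() {
    awk -F: '
        # Strip blanks and the trailing % so the value is a plain number.
        function num(s) { gsub(/[ \t%]/, "", s); return s + 0 }
        $1 == "Available Spare"           { spare = num($2); have++ }
        $1 == "Available Spare Threshold" { thresh = num($2); have++ }
        $1 == "Percentage Used"           { used = num($2) }
        END {
            if (have < 2) { print "UNKNOWN"; exit }
            state = (spare > thresh) ? "OK" : "CRITICAL"
            printf "%s spare=%d%% threshold=%d%% used=%d%%\n", state, spare, thresh, used
        }
    '
}

# Usage on a real system:
#   sudo smartctl -a /dev/nvme0 | nvme_wear_summary
```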

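How to Read Values:

For each attribute, look at these columns, listed in the order smartctl prints them:

VALUE — current normalized value (usually starts at 100 and decreases with wear).
WORST — lowest normalized value recorded during operation.
THRESH — failure threshold. If VALUE ≤ THRESH, the attribute is in a critical state. A THRESH of 0 means the vendor defined no threshold.
RAW_VALUE — the vendor-specific raw counter in the last column, such as a number of sectors or hours.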

Example of a healthy attribute:

ID 179 Used_Rsvd_Blk_Cnt_Tot 0x0032 100 100 0 - 0

Here VALUE=100, THRESH=0 (no threshold), and the raw value is 0: everything is fine.

But if:

ID 5 Reallocated_Sector_Ct 0x0033 090 090 010 - 10

VALUE=90, THRESH=10, and the raw value shows 10 reallocated sectors. This is still within the normal range, but monitor it for changes.

⚠️ Important: The normalized value of wear attributes (especially Media_Wearout_Indicator) decreases over time (e.g., from 100 to 99), which is normal. It becomes critical when VALUE falls to or below THRESH.
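
The VALUE ≤ THRESH rule can be applied to the whole attribute table at once. This is a sketch under the assumption that the input is the standard ten-column SATA table from `smartctl -A`; failing_attrs is a hypothetical helper name, and attributes with a threshold of 0 are skipped because they have no failure threshold.

```shell
# failing_attrs: read `smartctl -A` output for a SATA drive on stdin and print
# every attribute whose normalized VALUE is at or below a non-zero THRESH.
# Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
failing_attrs() {
    awk '
        $1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 {
            printf "%s %s value=%d thresh=%d\n", $1, $2, $4, $6
        }
    '
}

# Usage on a real system (no output means no attribute has reached its threshold):
#   sudo smartctl -A /dev/your_device | failing_attrs
```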

Step 5: Running an Extended Test (Optional)

For a deep check, run a long test (may take 1-10 hours depending on the drive):

sudo smartctl -t long /dev/your_device

After completion (check progress with sudo smartctl -c /dev/your_device), run sudo smartctl -a /dev/your_device again and review the results in the SMART Self-test log section.
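
To read only the most recent test result, the self-test log can be filtered. The sketch below assumes the ATA-style log printed by `smartctl -l selftest`, where the newest entry is the line starting with "# 1"; NVMe drives use a different layout, and latest_selftest is a hypothetical helper name.

```shell
# latest_selftest: read `smartctl -l selftest` output (ATA layout) on stdin
# and print the newest log entry without its "# 1" index prefix.
latest_selftest() {
    awk '!seen && /^# *1 / { sub(/^# *1 +/, ""); print; seen = 1 }'
}

# Usage on a real system:
#   sudo smartctl -l selftest /dev/your_device | latest_selftest
```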


Checking the Result

Successful Result:

smartctl -H output shows PASSED.
No attributes where VALUE ≤ THRESH.
For NVMe: Available Spare > Available Spare Threshold.
Media Wearout Indicator (or similar) > 20 (the higher, the better).

Critical Result:

SMART overall-health self-assessment test result: FAILED.
Any attribute with VALUE ≤ THRESH (especially Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable).
For NVMe: Available Spare ≤ Available Spare Threshold.
Sharp drop in Media Wearout Indicator (e.g., from 90 to 50 within a month).

Actions for a critical result:

Immediately create a full data backup to another storage medium.
Replace the SSD. Do not continue using a drive with FAILED — failure can occur at any moment.

Possible Issues

Issue 1: `SMART command failed: scsi error unsolicited sense data`

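Cause: The controller or bridge in front of the disk did not accept the SMART command, or smartctl guessed the wrong device type. A mounted disk in active use is not normally the problem, since SMART queries work on live disks. Solution: Specify the device type explicitly, for example -d sat (or -d ata) for SATA: sudo smartctl -a -d sat /dev/your_device.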

Issue 2: `Unable to detect device type`

Cause: Incorrect device name or driver does not support SMART. Solution: Check the device name via lsblk. For NVMe SSDs, you may need to explicitly specify the type: sudo smartctl -a -d nvme /dev/nvme0.

Issue 3: `SMART support is: Unavailable`

Cause: The disk does not support SMART (rare for SSDs) or SMART is disabled in BIOS/UEFI. Solution: Check BIOS/UEFI settings and enable options like SATA Mode (AHCI) and SMART. For some virtual machines, SMART is unavailable.

Issue 4: `Read SMART Data Failed: Input/output error`

Cause: Hardware disk error or cable issue (for SATA). Solution: Check the SATA cable connection (if applicable), try a different port. If the error persists — the disk is faulty.

Issue 5: `Unknown USB bridge` or `ATA device is not ready`

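Cause: The SSD is connected via a USB adapter that does not pass through SMART commands. Solution: Try sudo smartctl -a -d sat /dev/your_device, which works with bridges that support SAT pass-through. If that fails, connect the SSD directly to the motherboard (SATA/NVMe); many USB adapters do not support SMART at all.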

Issue 6: Missing `Media Wearout Indicator` attribute

Cause: The smartctl utility did not recognize it, or the manufacturer uses a different ID or name. Solution: Look for attributes named Wear_Leveling_Count, Total_LBAs_Written, Data_Units_Written, or Percentage_Used. To search for all of them at once, use sudo smartctl -a /dev/your_device | grep -iE 'wear|percent|written'.

Issue 7: Low temperature (e.g., 20°C) — is this normal?

Cause: Some SSDs lack an accurate temperature sensor or show an averaged value. Solution: SSD temperature is usually not critical (operates up to 70-80°C). Focus on wear attributes and errors.

Issue 8: How to estimate TBW (Total Bytes Written) from SMART?

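Solution: Find the Data Units Written attribute (NVMe) or Total_LBAs_Written (SATA). One NVMe data unit is 1000 × 512 bytes (512,000 bytes). For SATA, multiply the raw value by the LBA size (usually 512 bytes, though some vendors count in larger units, so check the datasheet). Example for NVMe:

Data Units Written: 123,456,789 [63.2 TB]

smartctl already prints the converted value in brackets (123,456,789 × 512,000 bytes ≈ 63.2 TB). For SATA:

Total_LBAs_Written: 1234567890

Then TBW = 1234567890 * 512 / 1000^4 ≈ 0.63 TB (if the LBA size is 512 bytes).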
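
The two conversions can be wrapped in small helpers. This is a sketch with hypothetical function names; it assumes decimal terabytes (1000^4 bytes) and, for SATA, that the raw value really counts LBAs of the given size.

```shell
# nvme_tb_written UNITS: NVMe reports data units of 1000 * 512 bytes each.
nvme_tb_written() {
    awk -v units="$1" 'BEGIN { printf "%.2f\n", units * 512000 / 1e12 }'
}

# sata_tb_written LBAS [LBA_BYTES]: assumes the raw value counts LBAs.
# LBA_BYTES defaults to 512; pass another size if the vendor uses one.
sata_tb_written() {
    awk -v lbas="$1" -v size="${2:-512}" 'BEGIN { printf "%.2f\n", lbas * size / 1e12 }'
}
```

With the figures from the examples above, nvme_tb_written 123456789 prints 63.21 and sata_tb_written 1234567890 prints 0.63.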

Issue 9: `smartctl` does not see the disk in the list

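Solution: Ensure the disk is properly connected and detected by the system (lsblk); sudo smartctl --scan lists the devices that smartctl itself can see. For RAID arrays or hardware that hides disks (e.g., some enterprise SSDs behind a hardware RAID controller), SMART may be unavailable through the normal device node. In that case, use the utilities provided by your RAID controller manufacturer.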

Issue 10: How to monitor SSDs automatically?

Solution: Run smartctl -H regularly via cron (e.g., once daily) and alert on FAILED. Example line for root's crontab, which appends each result to a log file at 02:00 daily:

0 2 * * * /usr/sbin/smartctl -H /dev/nvme0 >> /var/log/smart.log 2>&1

Alternatively, use the smartd daemon from smartmontools, which polls drives in the background and can send email alerts.
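
A cron wrapper can also react to the exit status of smartctl, which is a bit mask documented in the smartctl man page. The sketch below decodes it; decode_smartctl_status is a hypothetical helper name and the messages are paraphrased, so treat it as a starting point rather than a complete monitoring script.

```shell
# decode_smartctl_status STATUS: explain the bit mask that smartctl
# returns as its exit status (one line per bit that is set).
decode_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    [ $((status & 1))   -ne 0 ] && echo "command line did not parse"
    [ $((status & 2))   -ne 0 ] && echo "device open failed"
    [ $((status & 4))   -ne 0 ] && echo "a SMART command failed or a checksum was bad"
    [ $((status & 8))   -ne 0 ] && echo "SMART status: DISK FAILING"
    [ $((status & 16))  -ne 0 ] && echo "prefail attribute at or below threshold"
    [ $((status & 32))  -ne 0 ] && echo "an attribute was at or below threshold in the past"
    [ $((status & 64))  -ne 0 ] && echo "error log contains errors"
    [ $((status & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Usage in a wrapper script run from cron:
#   /usr/sbin/smartctl -H /dev/nvme0 >/dev/null
#   decode_smartctl_status $?
```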

Conclusion

Regular SSD health checks via SMART are a simple and effective way to avoid sudden data loss. The process takes minutes but provides valuable information about the drive's remaining lifespan. Start by installing smartmontools and running smartctl -a for your drives. Remember: at any sign of FAILED or reallocated sectors — immediately back up your data and plan to replace the drive.

Important: SMART is not a guarantee, but it is the best available tool. Even a drive showing PASSED can fail suddenly, so always keep an up-to-date backup of important data.
