Introduction / Why This Is Needed
Checking the health of an SSD regularly helps prevent sudden data loss. Unlike HDDs, SSDs do not emit characteristic sounds when failing, so the main way to assess their condition is to analyze the drive's built-in SMART (Self-Monitoring, Analysis and Reporting Technology) data.
In this guide, you will learn:
- How to install and use the smartctl utility for SSD diagnostics.
- How to read and interpret key SMART attributes specific to SSDs.
- How to determine the remaining drive lifespan (TBW — Total Bytes Written).
- How to identify early signs of wear and predict drive failure.
The procedure will take 10-15 minutes and does not require special knowledge, but will require superuser (sudo) privileges.
Requirements / Preparation
Before starting, ensure that:
- You have access to a Linux terminal with sudo privileges.
- The smartmontools package is installed (instructions below).
- You know the name of the SSD drive you want to check (e.g., /dev/nvme0n1 or /dev/sda).
💡 Tip: If you have multiple drives, identify which one is an SSD by looking for nvme in the lsblk output (for NVMe SSDs) or by size/model. For SATA SSDs, the name is usually /dev/sdX.
Step 1: Installing smartmontools
The smartctl utility is part of the smartmontools package. Install it:
For Debian/Ubuntu and derivatives:
sudo apt update
sudo apt install smartmontools
For RHEL/CentOS/Fedora:
sudo yum install smartmontools # CentOS 7
sudo dnf install smartmontools # CentOS 8+, Fedora
For Arch Linux:
sudo pacman -S smartmontools
Verify the installation:
smartctl --version
Step 2: Identifying the SSD Drive
Run the command to view all block devices:
lsblk
Example output:
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 238,5G  0 disk
├─sda1        8:1    0   512M  0 part /boot/efi
├─sda2        8:2    0   128G  0 part /
└─sda3        8:3    0   110G  0 part /home
nvme0n1     259:0    0 476,9G  0 disk
├─nvme0n1p1 259:1    0   512M  0 part /boot
└─nvme0n1p2 259:2    0 476,4G  0 part /
Here, nvme0n1 is an NVMe SSD, and sda is possibly a SATA SSD or HDD. Use the corresponding device name (e.g., /dev/nvme0n1) to check the SSD's health.
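To narrow down whether a device such as sda is an SSD or an HDD, the kernel's rotational flag can serve as a hint. The sketch below assumes its input comes from lsblk -d -n -o NAME,ROTA, where ROTA is 0 for non-rotational devices. The function name classify_disks is a hypothetical helper, not part of any package. USB flash drives and virtual disks can also report 0, so treat the result as a hint rather than proof.

```shell
# classify_disks: hypothetical helper (not part of smartmontools or util-linux).
# Reads "NAME ROTA" pairs on stdin and labels each disk by its rotational flag.
classify_disks() {
    awk 'NF >= 2 {
        kind = ($2 == 0) ? "non-rotational" : "rotational"
        print "/dev/" $1, kind
    }'
}

# Typical use on a real system:
#   lsblk -d -n -o NAME,ROTA | classify_disks
```

A non-rotational result for /dev/sda would suggest a SATA SSD; the model name in the smartctl -i output can confirm it.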
Step 3: Viewing General Health Information
Perform a quick check of the overall status:
sudo smartctl -H /dev/your_device
Example output for a healthy drive:
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-5.15.0-67-generic] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
If you see FAILED — immediately back up your data and plan to replace the drive.
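For scripting, the verdict can be pulled out of the smartctl -H output. The following is a minimal sketch, assuming the result line looks like the example above (or like the SMART Health Status line that SCSI/SAS devices print). The function name smart_verdict is hypothetical.

```shell
# smart_verdict: hypothetical helper. Reads `smartctl -H` output on stdin and
# prints the text after the colon on the health line, or UNKNOWN if no such
# line is present.
smart_verdict() {
    awk -F': *' '
        /overall-health self-assessment test result/ || /^SMART Health Status/ {
            print $2; found = 1; exit
        }
        END { if (!found) print "UNKNOWN" }'
}

# Typical use on a real system:
#   sudo smartctl -H /dev/your_device | smart_verdict
```

When comparing the result in a script, a prefix match on FAILED is safer than an exact match, since smartctl may append punctuation to the word.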
Step 4: Analyzing the Full SMART Dump
For a detailed analysis, run:
sudo smartctl -a /dev/your_device
This will output all SMART attributes. For SSDs, pay special attention to the following:
Key Attributes for SSDs (SATA):
- ID 5 (Reallocated_Sector_Ct): number of reallocated sectors. Non-zero values indicate wear.
- ID 9 (Power_On_Hours): operating time in hours.
- ID 12 (Power_Cycle_Count): number of power cycles.
- ID 173 (Media_Wearout_Indicator or Wear_Leveling_Count, depending on the vendor): NAND wear percentage. 100 = new, 0 = limit. The failure threshold is usually 10-20%.
- ID 177 (Wear_Leveling_Count, alternative to 173): similar wear indicator.
- ID 179 (Used_Rsvd_Blk_Cnt_Tot, Samsung): number of used reserve blocks.
- ID 181 (Program_Fail_Cnt_Total): cell programming failures.
- ID 182 (Erase_Fail_Count_Total): cell erase failures.
- ID 183 (Runtime_Bad_Block): bad blocks detected during operation.
- ID 187 (Uncorrectable_Error_Cnt): uncorrectable errors.
- ID 188 (Command_Timeout): command timeouts.
- ID 190 (Airflow_Temperature_Cel): temperature (not relevant for some SSDs).
- ID 194 (Temperature_Celsius): sensor temperature.
- ID 195 (Hardware_ECC_Recovered): corrected ECC errors.
- ID 196 (Reallocated_Event_Count): reallocation events.
- ID 197 (Current_Pending_Sector): sectors pending reallocation (critical!).
- ID 198 (Offline_Uncorrectable): uncorrectable sectors during offline test.
- ID 199 (UDMA_CRC_Error_Count): cable errors (for SATA).
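The full dump is long, so it can help to filter it down to the attributes listed above. The sketch below assumes the usual ten-column attribute table printed by smartctl -A (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The function name show_key_attrs and its default ID list are illustrative choices.

```shell
# show_key_attrs: hypothetical helper. Reads the attribute table from
# `smartctl -A` on stdin and prints "ID NAME RAW_VALUE" for the wanted IDs.
# Optional argument: a space-separated list of attribute IDs.
show_key_attrs() {
    awk -v ids="${1:-5 9 177 187 197 198 199}" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) want[a[i]] = 1 }
        $1 ~ /^[0-9]+$/ && NF >= 10 && ($1 in want) {
            raw = $10
            for (i = 11; i <= NF; i++) raw = raw " " $i   # raw value may contain spaces
            print $1, $2, raw
        }'
}

# Typical use on a real system:
#   sudo smartctl -A /dev/sda | show_key_attrs "5 173 177 197 198"
```

Because attribute IDs differ between vendors, the ID list is best adjusted after a first look at the full table for your drive.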
For NVMe SSDs (use -a with -d nvme or simply smartctl -a):
- SMART/Health Information: Critical Warning, Temperature, Available Spare, Available Spare Threshold.
- Available Spare (spare reserve): percentage of reserve blocks remaining. A value below Available Spare Threshold means the drive is near wear-out.
- Data Units Written (the basis for Total Bytes Written, TBW): total amount of data written, counted in units of 512,000 bytes.
- Media and Data Integrity Errors: data integrity errors.
- Critical Warning: a bit field of active critical warnings; 0x00 means none are set.
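The spare-capacity comparison for NVMe drives can be automated. This is a minimal sketch, assuming the health section contains lines of the form "Available Spare: 100%" and "Available Spare Threshold: 10%". The function name nvme_spare_status and the OK/LOW labels are hypothetical.

```shell
# nvme_spare_status: hypothetical helper. Reads `smartctl -a` output for an
# NVMe drive on stdin and prints "OK" or "LOW" followed by the spare
# percentage and its threshold, or UNKNOWN if either line is missing.
nvme_spare_status() {
    awk -F': *' '
        $1 == "Available Spare"           { spare = $2 + 0;  have_spare = 1 }
        $1 == "Available Spare Threshold" { thresh = $2 + 0; have_thresh = 1 }
        END {
            if (!have_spare || !have_thresh) { print "UNKNOWN"; exit }
            status = (spare > thresh) ? "OK" : "LOW"
            print status, spare "%", thresh "%"
        }'
}

# Typical use on a real system:
#   sudo smartctl -a /dev/nvme0 | nvme_spare_status
```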
How to Read Values:
For each attribute, look at three columns:
- VALUE: current normalized value (usually 100 = perfect, decreases with wear).
- THRESH: threshold value. If VALUE ≤ THRESH, the attribute is in a critical state.
- WORST: worst value recorded during operation.
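Example of a healthy attribute (the three numbers after the flag are VALUE, WORST, and THRESH; the last number is the raw value):
ID 179 Used_Rsvd_Blk_Cnt_Tot 0x0032 100 100 000 - 0
Here VALUE=100 and THRESH=0 (no threshold), so everything is fine.
An attribute that deserves attention:
ID 5 Reallocated_Sector_Ct 0x0033 090 090 010 - 10
Here VALUE=90 and THRESH=10, with a raw count of 10 reallocated sectors. This is still within the normal range, but monitor it for changes.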
⚠️ Important: Some attributes (especially Media_Wearout_Indicator) decrease gradually over time (e.g., from 100 to 99), which is normal. The situation becomes critical when VALUE drops to or below THRESH.
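The VALUE ≤ THRESH rule can be checked mechanically across the whole table. The sketch below assumes the ten-column layout printed by smartctl -A, where VALUE is the fourth column and THRESH the sixth, and skips attributes whose threshold is 0. The function name attrs_at_threshold is hypothetical.

```shell
# attrs_at_threshold: hypothetical helper. Reads the attribute table from
# `smartctl -A` on stdin and prints every attribute whose normalized VALUE
# is at or below a non-zero THRESH. Prints nothing when all attributes pass.
attrs_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
        value = $4 + 0
        thresh = $6 + 0
        if (thresh > 0 && value <= thresh)
            print $1, $2, "VALUE=" value, "THRESH=" thresh
    }'
}

# Typical use on a real system:
#   sudo smartctl -A /dev/sda | attrs_at_threshold
```

Empty output means no attribute has reached its threshold; it does not replace checking raw counts such as reallocated or pending sectors.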
Step 5: Running an Extended Test (Optional)
For a deep check, run a long test (may take 1-10 hours depending on the drive):
sudo smartctl -t long /dev/your_device
While the test runs, check progress with sudo smartctl -c /dev/your_device. After it completes, run sudo smartctl -a /dev/your_device again and review the test results in the SMART Self-test log section.
Checking the Result
Successful Result:
- smartctl -H output shows PASSED.
- No attributes where VALUE ≤ THRESH.
- For NVMe: Available Spare > Available Spare Threshold.
- Media Wearout Indicator (or similar) > 20 (the higher, the better).
Critical Result:
- SMART overall-health self-assessment test result: FAILED.
- Any attribute with VALUE ≤ THRESH (especially Reallocated_Sector_Ct, Current_Pending_Sector, Offline_Uncorrectable).
- For NVMe: Available Spare ≤ Available Spare Threshold.
- Sharp drop in Media Wearout Indicator (e.g., from 90 to 50 within a month).
Actions for a critical result:
- Immediately create a full data backup to another storage medium.
- Replace the SSD. Do not continue using a drive that reports FAILED: failure can occur at any moment.
Possible Issues
Issue 1: SMART command failed: scsi error unsolicited sense data
Cause: The disk is busy (e.g., mounted and actively used) or does not support SMART in the current mode.
Solution: Ensure the disk is not in use, or try adding the -d ata flag for SATA: sudo smartctl -a -d ata /dev/device.
Issue 2: Unable to detect device type
Cause: Incorrect device name or driver does not support SMART.
Solution: Check the device name via lsblk. For NVMe SSDs, you may need to explicitly specify the type: sudo smartctl -a -d nvme /dev/nvme0.
Issue 3: SMART support is: Unavailable
Cause: The disk does not support SMART (rare for SSDs) or SMART is disabled in BIOS/UEFI.
Solution: Check BIOS/UEFI settings and enable options like SATA Mode (AHCI) and SMART. For some virtual machines, SMART is unavailable.
Issue 4: Read SMART Data Failed: Input/output error
Cause: Hardware disk error or cable issue (for SATA).
Solution: Check the SATA cable connection (if applicable) and try a different port. If the error persists, the disk is likely faulty.
Issue 5: Unknown USB bridge or ATA device is not ready
Cause: The SSD is connected via a USB adapter that does not pass through SMART commands.
Solution: Connect the SSD directly to the motherboard (SATA/NVMe). Many USB adapters do not support SMART.
Issue 6: Missing Media Wearout Indicator attribute
Cause: The smartctl utility did not recognize it, or the manufacturer uses a different ID.
Solution: Look for attributes named Wear_Leveling_Count, Total_LBAs_Written, Data_Units_Written, or Percentage_Used. Use sudo smartctl -a /dev/device | grep -i wear to search.
Issue 7: Low temperature (e.g., 20°C) — is this normal?
Cause: Some SSDs lack an accurate temperature sensor or show an averaged value.
Solution: SSD temperature is usually not critical (operates up to 70-80°C). Focus on wear attributes and errors.
Issue 8: How to estimate TBW (Total Bytes Written) from SMART?
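Solution: Find the Data Units Written attribute (NVMe) or Total_LBAs_Written (SATA). For NVMe, one data unit is 1000 × 512 bytes (512,000 bytes), and smartctl prints the converted total in brackets. Example for NVMe:
Data Units Written: 123,456,789 [63.2 TB]
The value in brackets is already given in TB. For SATA, multiply the LBA count by the sector size (usually 512 bytes; some vendors count in larger units, so check the drive's documentation):
Total_LBAs_Written: 1234567890
Then TBW = 1234567890 * 512 / 1000^4 ≈ 0.63 TB (if sector is 512 bytes).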
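The two conversions can be wrapped in small helpers. This is a sketch under stated assumptions: one NVMe data unit is 512,000 bytes, the SATA counter is in sectors of the size you pass in (512 bytes by default), and TB means 10^12 bytes. The function names lbas_to_tb and data_units_to_tb are hypothetical.

```shell
# lbas_to_tb: hypothetical helper. Converts a Total_LBAs_Written raw value to
# decimal terabytes. Usage: lbas_to_tb COUNT [SECTOR_SIZE_BYTES]
# Pass plain digits only (no thousands separators).
lbas_to_tb() {
    LC_ALL=C awk -v n="$1" -v s="${2:-512}" 'BEGIN { printf "%.2f\n", n * s / 1e12 }'
}

# data_units_to_tb: hypothetical helper. Converts an NVMe Data Units Written
# count to decimal terabytes, assuming 512,000 bytes per data unit.
data_units_to_tb() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 512000 / 1e12 }'
}

# Examples matching the figures above:
#   lbas_to_tb 1234567890        # 512-byte sectors
#   data_units_to_tb 123456789
```

Comparing the result with the TBW rating in the drive's datasheet gives a rough idea of how much of the rated endurance has been used.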
Issue 9: smartctl does not see the disk in the list
Solution: Ensure the disk is properly connected and detected by the system (lsblk). For RAID arrays or hardware that hides disks (e.g., some enterprise SSDs via HW RAID), SMART may be unavailable. Use the utilities provided by your RAID controller manufacturer.
Issue 10: How to monitor SSDs automatically?
Solution: Set up a regular smartctl -H run via cron (e.g., once daily) and send email reports on FAILED. The example crontab line below only appends the output to a log file, so the email step has to be added separately:
0 2 * * * /usr/sbin/smartctl -H /dev/nvme0 >> /var/log/smart.log 2>&1
Alternatively, use daemons like smartd from smartmontools.
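A cron script can also act on smartctl's exit status, which the smartctl man page documents as a bit mask. The sketch below decodes four of those bits: 2 (device open failed), 4 (a SMART command failed), 8 (the status check reported a failing disk), and 16 (a prefail attribute is at or below its threshold). The function name smart_exit_flags and the label strings are hypothetical.

```shell
# smart_exit_flags: hypothetical helper. Takes a smartctl exit status and
# prints labels for the bits it recognizes, or "none" if none of them is set.
smart_exit_flags() {
    code=$1
    flags=""
    if [ $((code & 2)) -ne 0 ];  then flags="$flags open-failed"; fi
    if [ $((code & 4)) -ne 0 ];  then flags="$flags command-failed"; fi
    if [ $((code & 8)) -ne 0 ];  then flags="$flags disk-failing"; fi
    if [ $((code & 16)) -ne 0 ]; then flags="$flags prefail-at-threshold"; fi
    if [ -z "$flags" ]; then flags=" none"; fi
    echo "${flags# }"
}

# Typical use in a monitoring script:
#   /usr/sbin/smartctl -H /dev/nvme0 > /dev/null
#   smart_exit_flags $?
```

A script could send a notification whenever the output contains disk-failing, instead of searching the log for the word FAILED.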
Conclusion
Regular SSD health checks via SMART are a simple and effective way to avoid sudden data loss. The process takes minutes but provides valuable information about the drive's remaining lifespan. Start by installing smartmontools and running smartctl -a for your drives. Remember: at any sign of FAILED or reallocated sectors — immediately back up your data and plan to replace the drive.
Important: SMART is not a guarantee, but it is the best available tool. Even a drive showing PASSED can fail suddenly, so always keep an up-to-date backup of important data.