Re: SMART Errors: What does it mean?

Linux for blind general discussion <blinux-list@xxxxxxxxxx> · Fri, 5 Apr 2019 08:23:25 -0500

Tim here.  It sounds like your drive is starting to error out and
possibly die.

The first thing I'd do is run a long self-test, which the drive
performs on its own in the background:

$ sudo /usr/sbin/smartctl --test=long /dev/sda

and then after it has finished running, issue

$ sudo /usr/sbin/smartctl -l selftest /dev/sda

to see the results of the test.  I have a pair of cron jobs set up to
run these two commands weekly: the test starts at midnight
Sunday-into-Monday, and the report runs at midnight
Monday-into-Tuesday (the test usually takes ~2hr on my machine, and I
tend to forget to check on it by hand).  My own drive started throwing
errors a while ago, so I've bought a replacement and just need to do
the backup/replace/reinstall/restore dance when I get the time next
week.
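
For anyone wanting a similar schedule, something along these lines
should work.  It is only a sketch, not a copy of a real crontab: it
assumes the entries go into root's crontab (so no sudo is needed),
that the drive is /dev/sda, and that cron mails job output to root.
The cron_lines helper is just a made-up name for this example.

```shell
#!/bin/sh
# Sketch of two weekly crontab entries for root (assumed setup).
DEVICE=/dev/sda
SMARTCTL=/usr/sbin/smartctl

# Cron fields: minute hour day-of-month month day-of-week
# (day-of-week 1 = Monday, 2 = Tuesday).
cron_lines() {
    # 00:00 Monday: start the long self-test.
    printf '0 0 * * 1 %s --test=long %s\n' "$SMARTCTL" "$DEVICE"
    # 00:00 Tuesday: print the self-test log.
    printf '0 0 * * 2 %s -l selftest %s\n' "$SMARTCTL" "$DEVICE"
}

# Print the entries for review; add them with "sudo crontab -e".
cron_lines
```

Leaving a full day between the two jobs is far more than the test
needs, but it keeps the schedule easy to remember.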

-tim

On April  5, 2019, Linux for blind general discussion wrote:
> This is an error report from
> smartctl -a /dev/sda
> Truncated, showing only intro and error section
> smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-0.bpo.1-rt-amd64] (local build)
> Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
> 
> === START OF INFORMATION SECTION ===
> Device Model:     HITACHI HUS724040ALE640
> Serial Number:    PAGRGBRS
> LU WWN Device Id: 5 000cca 22bca3623
> Firmware Version: MJAONS04
> User Capacity:    4,000,787,030,016 bytes [4.00 TB]
> Sector Sizes:     512 bytes logical, 4096 bytes physical
> Rotation Rate:    7200 rpm
> Form Factor:      3.5 inches
> Device is:        Not in smartctl database [for details use: -P showall]
> ATA Version is:   ATA8-ACS T13/1699-D revision 4
> SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
> Local Time is:    Fri Apr  5 02:43:04 2019 CDT
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> SMART Error Log Version: 1
> ATA Error Count: 65535 (device log contains only the most recent five errors)
> 	CR = Command Register [HEX]
> 	FR = Features Register [HEX]
> 	SC = Sector Count Register [HEX]
> 	SN = Sector Number Register [HEX]
> 	CL = Cylinder Low Register [HEX]
> 	CH = Cylinder High Register [HEX]
> 	DH = Device/Head Register [HEX]
> 	DC = Device Command Register [HEX]
> 	ER = Error register [HEX]
> 	ST = Status register [HEX]
> Powered_Up_Time is measured from power on, and printed as
> DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
> SS=sec, and sss=millisec. It "wraps" after 49.710 days.
> 
> Error 65535 occurred at disk power-on lifetime: 1335 hours (55 days + 15 hours)
>   When the command that caused the error occurred, the device was active or idle.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   84 51 90 d0 bb 03 0d  Error: ICRC, ABRT at LBA = 0x0d03bbd0 = 218348496
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
>   -- -- -- -- -- -- -- --  ----------------  --------------------
>   61 00 c0 80 b6 03 40 00      10:34:04.997  WRITE FPDMA QUEUED
>   61 00 b8 80 b1 03 40 00      10:34:04.997  WRITE FPDMA QUEUED
>   61 e0 b0 80 bb 03 40 00      10:34:04.997  WRITE FPDMA QUEUED
>   ef 10 02 00 00 00 a0 00      10:34:04.997  SET FEATURES [Enable SATA feature]
>   ec 00 00 00 00 00 a0 00      10:34:04.995  IDENTIFY DEVICE
> 
> 
> There are 5 more of these. What is this error telling me, exactly?
> I don't quite get it.
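
As for reading the entry itself, here is a rough sketch of how the
"registers were" row maps onto the summary smartctl prints next to
it.  The register values are copied from the quoted error.  The
layout is the 28-bit one (low four bits of DH, then CH, CL, SN), which
is an assumption that happens to reproduce the LBA shown.  In the
error register, bit 0x80 is ICRC (a CRC error on the interface) and
bit 0x04 is ABRT (the drive aborted the command).

```shell
#!/bin/sh
# Register values copied from the quoted error entry.
ER=0x84; SN=0xd0; CL=0xbb; CH=0x03; DH=0x0d

# 28-bit LBA: low nibble of DH is the top 4 bits, then CH, CL, SN.
lba=$(( (($DH & 0x0f) << 24) | ($CH << 16) | ($CL << 8) | $SN ))
echo "LBA = $lba"

# Error register flags: 0x80 = ICRC, 0x04 = ABRT.
flags=""
if [ $(( $ER & 0x80 )) -ne 0 ]; then flags="$flags ICRC"; fi
if [ $(( $ER & 0x04 )) -ne 0 ]; then flags="$flags ABRT"; fi
echo "Error flags:$flags"
```

This prints LBA = 218348496 and the ICRC and ABRT flags, matching the
"Error: ICRC, ABRT at LBA = ..." text in the report.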
> 
> _______________________________________________
> Blinux-list mailing list
> Blinux-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/blinux-list
