Re: smartd errors

On Tue, 2015-10-27 at 10:39 -0700, Paolo Galtieri wrote:
> Folks,
>    recently I have started seeing these messages in /var/log/messages
> Oct 27 10:26:13 jackstraw smartd[1177]: Device: /dev/sde [SAT], 16 Currently unreadable (pending) sectors
> Oct 27 10:26:13 jackstraw smartd[1177]: Device: /dev/sde [SAT], 16 Offline uncorrectable sectors

You might want to post the output of `smartctl -x /dev/sde` to the
group so people can look at the full SMART report.  Below is a drive
that passes the SMART overall-health self-assessment but also shows
over six years of use (53446 power-on hours).  It has several logged
errors (the device reports five, of which its log keeps the most
recent four).  They might not be too worrisome, but I have already
ordered a new drive since this one is as old as it is.  This drive
will move into my backup rotation to live out its final years.

All of the logged errors are interface CRC (ICRC) errors, and in my
case they might have been due to a bad cable.  A UDMA_CRC_Error_Count
like my 66 does not by itself suggest replacing the drive, since it
counts transfer errors on the link between drive and host rather than
media problems.  In that case you can move the drive to a new port and
cable to see if that stops the count from incrementing.
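To check whether a new port and cable helped, one option is to save the report right after the move and again after some days of normal use, then compare the raw UDMA_CRC_Error_Count values. Below is a minimal Python sketch of that comparison. It assumes the attribute table layout shown in the report below (ID, name, flags, VALUE, WORST, THRESH, FAIL, RAW_VALUE) and only reads the leading integer of the raw value; the function names are made up for illustration and are not part of smartmontools.

```python
import re
import subprocess

# One row of the "Vendor Specific SMART Attributes" table from `smartctl -x`:
#   ID  NAME  FLAGS  VALUE  WORST  THRESH  FAIL  RAW_VALUE [extra text]
# e.g. "199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    66"
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([POSRCK-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\d+)"
)

CRC_ATTR = "UDMA_CRC_Error_Count"


def parse_attributes(report: str) -> dict:
    """Map attribute name -> leading integer of its raw value.

    Trailing text such as "(Average 464)" or "(Min/Max 22/48)" is ignored,
    and lines from other sections of the report do not match the pattern.
    """
    attrs = {}
    for line in report.splitlines():
        m = ATTR_RE.match(line)
        if m:
            attrs[m.group(2)] = int(m.group(8))
    return attrs


def crc_delta(before: str, after: str) -> int:
    """How much the raw UDMA_CRC_Error_Count grew between two reports."""
    old = parse_attributes(before).get(CRC_ATTR, 0)
    new = parse_attributes(after).get(CRC_ATTR, 0)
    return new - old


def read_report(device: str) -> str:
    """Run `smartctl -x` on a device (needs root).

    smartctl's exit status is a bitmask that can be nonzero on a working
    drive (e.g. when the error log has entries), so it is not treated as
    a failure here.
    """
    result = subprocess.run(["smartctl", "-x", device],
                            capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    # Rows copied from the report below; "after" is a hypothetical later
    # snapshot taken once the drive is on a new port and cable.
    before = (
        "197 Current_Pending_Sector  -O---K   100   100   000    -    0\n"
        "199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    66\n"
    )
    after = before
    print(crc_delta(before, after))  # prints 0
```

A delta of 0 over a few days of normal use would be consistent with the old cable or port having been the problem; if the count keeps climbing, the cause probably lies elsewhere.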

10:57-doug@wombat-~>sudo smartctl -x /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.2.3-200.fc22.x86_64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke,

Model Family:     Hitachi Deskstar 7K1000.B
Device Model:     Hitachi HDT721010SLA360
Serial Number:    STF607MS01PHHK
LU WWN Device Id: 5 000cca 35ec0c514
Firmware Version: ST6OA31B
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Oct 27 10:57:14 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM level is:     128 (quiet), recommended: 128
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection
					was never started.
					Auto Offline Data Collection:
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(14090) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline
					Auto Offline data collection on/off support.
					Suspend Offline collection upon
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 235) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     PO-R--   099   099   016    -    3
  2 Throughput_Performance  P-S---   100   100   054    -    0
  3 Spin_Up_Time            POS---   128   128   024    -    431 (Average 464)
  4 Start_Stop_Count        -O--C-   100   100   000    -    129
  5 Reallocated_Sector_Ct   PO--CK   100   100   005    -    0
  7 Seek_Error_Rate         PO-R--   100   100   067    -    0
  8 Seek_Time_Performance   P-S---   100   100   020    -    0
  9 Power_On_Hours          -O--C-   093   093   000    -    53446
 10 Spin_Retry_Count        PO--C-   100   100   060    -    0
 12 Power_Cycle_Count       -O--CK   100   100   000    -    129
192 Power-Off_Retract_Count -O--CK   099   099   000    -    2347
193 Load_Cycle_Count        -O--C-   099   099   000    -    2347
194 Temperature_Celsius     -O----   171   171   000    -    35 (Min/Max 22/48)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
197 Current_Pending_Sector  -O---K   100   100   000    -    0
198 Offline_Uncorrectable   ---R--   100   100   000    -    0
199 UDMA_CRC_Error_Count    -O-R--   200   200   000    -    66
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  SATA NCQ Queued Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x20       GPL     R/O      1  Streaming performance log [OBS-8]
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 5 (device log contains only the most recent 4
	CR     = Command Register
	FEATR  = Features Register
	COUNT  = Count (was: Sector Count) Register
	LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
	LH     = LBA High (was: Cylinder High) Register    ]   LBA
	LM     = LBA Mid (was: Cylinder Low) Register      ] Register
	LL     = LBA Low (was: Sector Number) Register     ]
	DV     = Device (was: Device/Head) Register
	DC     = Device Control Register
	ER     = Error register
	ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 5 [0] occurred at disk power-on lifetime: 53052 hours (2210 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 02 70 00 00 40 9b e1 c4 e0 00  Error: ICRC, ABRT 624 sectors at LBA = 0x409be1c4 = 1083957700

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
  -- == -- == -- == == == -- -- -- -- --  ---------------  ------------
  25 00 00 04 80 00 00 40 9b df b4 e0 08 24d+15:49:30.500  READ DMA EXT
  25 00 00 05 00 00 00 40 9b da b4 e0 08 24d+15:49:30.500  READ DMA EXT
  25 00 00 05 00 00 00 40 9b d5 b4 e0 08 24d+15:49:30.500  READ DMA EXT
  25 00 00 05 00 00 00 40 9b d0 b4 e0 08 24d+15:49:30.500  READ DMA EXT
  25 00 00 05 00 00 00 40 9b cb b4 e0 08 24d+15:49:30.400  READ DMA EXT

Error 4 [3] occurred at disk power-on lifetime: 52716 hours (2196 days + 12 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 04 70 00 00 0b 01 96 f8 eb 00  Error: ICRC, ABRT 1136 sectors at LBA = 0x0b0196f8 = 184653560

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
  -- == -- == -- == == == -- -- -- -- --  ---------------  ------------
  25 00 00 05 00 00 00 0b 01 96 68 e0 08 10d+15:40:59.900  READ DMA EXT
  25 00 00 05 80 00 00 0b 01 90 e8 e0 08 10d+15:40:59.800  READ DMA EXT
  25 00 00 05 00 00 00 0b 01 8b e8 e0 08 10d+15:40:59.800  READ DMA EXT
  25 00 00 05 00 00 00 0b 01 86 e8 e0 08 10d+15:40:59.800  READ DMA EXT
  25 00 00 05 00 00 00 0b 01 81 e8 e0 08 10d+15:40:59.800  READ DMA EXT

Error 3 [2] occurred at disk power-on lifetime: 52549 hours (2189 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 04 20 00 00 55 2f 68 94 e5 00  Error: ICRC, ABRT 1056 sectors at LBA = 0x552f6894 = 1429170324

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
  -- == -- == -- == == == -- -- -- -- --  ---------------  ------------
  25 00 00 04 80 00 00 55 2f 68 34 e0 08  3d+16:17:57.000  READ DMA EXT
  25 00 00 05 00 00 00 55 2f 63 34 e0 08  3d+16:17:57.000  READ DMA EXT
  25 00 00 04 00 00 00 55 2f 5f 34 e0 08  3d+16:17:57.000  READ DMA EXT
  25 00 00 04 80 00 00 55 2f 5a b4 e0 08  3d+16:17:57.000  READ DMA EXT
  25 00 00 04 80 00 00 55 2f 56 34 e0 08  3d+16:17:56.900  READ DMA EXT

Error 2 [1] occurred at disk power-on lifetime: 29675 hours (1236 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- == -- == == == -- -- -- -- --
  84 -- 51 00 30 00 00 49 cc 77 94 49 00  Error: ICRC, ABRT at LBA = 0x49cc7794 = 1238136724

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time 
  -- == -- == -- == == == -- -- -- -- --  ---------------  ------------
  60 01 00 00 08 00 00 49 cc 76 c4 40 08 13d+15:48:33.604  READ FPDMA
  60 01 00 00 00 00 00 49 cc 75 c4 40 08 13d+15:48:33.604  READ FPDMA
  60 01 00 00 08 00 00 49 cc 74 c4 40 08 13d+15:48:33.604  READ FPDMA
  60 01 00 00 00 00 00 49 cc 73 c4 40 08 13d+15:48:33.604  READ FPDMA
  60 01 00 00 08 00 00 49 cc 72 c4 40 08 13d+15:48:33.604  READ FPDMA

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      5193   

SMART Selective self-test log data structure revision number 1
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute

SCT Status Version:                  3
SCT Version (vendor specific):       256 (0x0100)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    36 Celsius
Power Cycle Min/Max Temperature:     34/45 Celsius
Lifetime    Min/Max Temperature:     22/48 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        1 minute
Min/Max recommended Temperature:      0/60 Celsius
Min/Max Temperature Limit:           -40/70 Celsius
Temperature History Size (Index):    128 (83)

Index    Estimated Time   Temperature Celsius
  84    2015-10-27 08:50    36  *****************
  85    2015-10-27 08:51    35  ****************
  86    2015-10-27 08:52    35  ****************
  87    2015-10-27 08:53    36  *****************
  88    2015-10-27 08:54    35  ****************
  89    2015-10-27 08:55    35  ****************
  90    2015-10-27 08:56    36  *****************
  91    2015-10-27 08:57    35  ****************
 ...    ..(  9 skipped).    ..  ****************
 101    2015-10-27 09:07    35  ****************
 102    2015-10-27 09:08    36  *****************
 103    2015-10-27 09:09    36  *****************
 104    2015-10-27 09:10    35  ****************
 105    2015-10-27 09:11    36  *****************
 106    2015-10-27 09:12    36  *****************
 107    2015-10-27 09:13    35  ****************
 108    2015-10-27 09:14    36  *****************
 109    2015-10-27 09:15    35  ****************
 110    2015-10-27 09:16    35  ****************
 111    2015-10-27 09:17    36  *****************
 115    2015-10-27 09:21    36  *****************
 116    2015-10-27 09:22    35  ****************
 117    2015-10-27 09:23    36  *****************
 118    2015-10-27 09:24    36  *****************
 119    2015-10-27 09:25    35  ****************
 120    2015-10-27 09:26    36  *****************
 ...    ..( 15 skipped).    ..  *****************
   8    2015-10-27 09:42    36  *****************
   9    2015-10-27 09:43    35  ****************
  10    2015-10-27 09:44    36  *****************
  11    2015-10-27 09:45    36  *****************
  12    2015-10-27 09:46    35  ****************
  13    2015-10-27 09:47    36  *****************
  14    2015-10-27 09:48    36  *****************
  15    2015-10-27 09:49    35  ****************
  16    2015-10-27 09:50    36  *****************
  17    2015-10-27 09:51    36  *****************
  18    2015-10-27 09:52    35  ****************
  19    2015-10-27 09:53    36  *****************
 ...    ..(  6 skipped).    ..  *****************
  26    2015-10-27 10:00    36  *****************
  27    2015-10-27 10:01    35  ****************
  28    2015-10-27 10:02    36  *****************
 ...    ..(  9 skipped).    ..  *****************
  38    2015-10-27 10:12    36  *****************
  39    2015-10-27 10:13    35  ****************
  40    2015-10-27 10:14    36  *****************
 ...    ..( 12 skipped).    ..  *****************
  53    2015-10-27 10:27    36  *****************
  54    2015-10-27 10:28    35  ****************
  55    2015-10-27 10:29    36  *****************
  56    2015-10-27 10:30    36  *****************
  57    2015-10-27 10:31    35  ****************
  58    2015-10-27 10:32    36  *****************
 ...    ..(  3 skipped).    ..  *****************
  62    2015-10-27 10:36    36  *****************
  63    2015-10-27 10:37    35  ****************
  64    2015-10-27 10:38    36  *****************
  65    2015-10-27 10:39    36  *****************
  66    2015-10-27 10:40    35  ****************
  67    2015-10-27 10:41    36  *****************
  68    2015-10-27 10:42    36  *****************
  69    2015-10-27 10:43    35  ****************
  70    2015-10-27 10:44    36  *****************
 ...    ..( 12 skipped).    ..  *****************
  83    2015-10-27 10:57    36  *****************

SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled

Device Statistics (GP/SMART Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x0001  2            3  Command failed due to ICRC error
0x0009  2           29  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2           29  Device-to-host register FISes sent due to a
0x000b  2            0  CRC errors within host-to-device FIS
0x000d  2            0  Non-CRC errors within host-to-device FIS

Doug H.
