Re: SMART detects pending sectors; take offline?

Alexander Shenkin <al@xxxxxxxxxxx> · Thu, 4 Jan 2018 10:37:14 +0000

On 1/3/2018 4:02 PM, Phil Turmel wrote:
> On 01/03/2018 10:59 AM, Alexander Shenkin wrote:
>> On 1/3/2018 3:53 PM, Phil Turmel wrote:
>>> On 01/03/2018 08:50 AM, Alexander Shenkin wrote:
>>>> On 1/3/2018 1:26 PM, Brad Campbell wrote:
>>>>>
>>>>> Nope. Your pending is still at 8, so you've got bad sectors in an area
>>>>> of the drive that hasn't been dealt with. What is "interesting" is
>>>>> that your SMART test results don't list the LBA of the first failure.
>>>>> Disappointing behaviour on the part of the disk. They are within the
>>>>> 1st 10% of the drive however, so it wouldn't surprise me if they were
>>>>> in an unused portion of the RAID superblock area.
>>>>
>>>> Thanks Brad.  So, to theoretically get these sectors remapped so I don't
>>>> keep getting errors, I would have to somehow try to write to those
>>>> sectors.  That's tough given that the LBA's aren't reported as you
>>>> mention.  Perhaps my best course of action then is to:
>>>
>>> No, just use dd to read that device -- it'll bail out with read error
>>> when it hits the trouble spot, which will report the affected sector.
>>> Then you can rewrite it with the appropriate seek= value.  (Assuming it
>>> really is in an unused part of the member device.)

So, I got a read error as expected when running the following (the
physical sector size of sda is 4096 bytes):

dd if=/dev/sda of=/dev/null bs=4096

Is there some way to tell whether this sector is considered to be in 
use?  Not sure what the effect of rewriting it might be if it is...
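One rough way to narrow it down, as far as I understand it, is to compare
the failing sector against the partition boundaries and the md data offset.
The sketch below assumes every number is in 512-byte sectors, that the
partition start and size come from /sys/block/sda/sda1/start and
/sys/block/sda/sda1/size, and that the offset comes from the "Data Offset"
line of mdadm --examine. classify_sector is a hypothetical helper name, sda1
being the member partition is an assumption, and the numbers in the example
call are made up, not taken from this machine.

```shell
#!/bin/sh
# classify_sector SECTOR PART_START PART_SIZE DATA_OFFSET
# All arguments are in 512-byte sectors. SECTOR is an absolute position on
# the whole disk, as reported by the kernel. DATA_OFFSET is relative to the
# start of the partition.
classify_sector() {
    sector=$1; part_start=$2; part_size=$3; data_offset=$4
    part_end=$((part_start + part_size))
    rel=$((sector - part_start))

    if [ $((sector < part_start || sector >= part_end)) -eq 1 ]; then
        echo "outside partition"
    elif [ $((rel < data_offset)) -eq 1 ]; then
        echo "md metadata area"
    else
        echo "md data area"
    fi
}

# Hypothetical layout: partition starts at 2048, is 5860530176 sectors long,
# and the md data offset is 262144 sectors.
classify_sector 5857843312 2048 5860530176 262144
```

Landing in the md data area would only mean the array maps that sector; it
says nothing about whether the filesystem on top has anything stored there.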

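If it's safe, I'd run the following (seek= counts in bs-sized blocks, so
the kernel's 512-byte sector 5857843312 becomes 5857843312 / 8 = 732230414):

dd if=/dev/zero of=/dev/sda seek=732230414 count=1 bs=4096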

Perhaps the way to go is to write to it, and then run checkarray again?
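A note on the seek= arithmetic, since it is easy to get wrong: the kernel
reports "sector" in 512-byte units regardless of the drive's physical sector
size, while dd's seek= counts blocks of bs bytes. A minimal sketch of the
conversion, which only prints the resulting command rather than running it
(the variable names are mine):

```shell
#!/bin/sh
# Convert a kernel-reported sector (512-byte units) into a dd seek= value
# for a given block size, then print (not run) the resulting command.
sector=5857843312   # from "blk_update_request: I/O error, dev sda, sector ..."
bs=4096             # block size handed to dd
per_block=$((bs / 512))

# Warn if the sector does not sit on a block boundary; the division below
# then rounds down to the block that contains it.
if [ $((sector % per_block)) -ne 0 ]; then
    echo "sector $sector is not aligned to a $bs-byte block" >&2
fi
seek=$((sector / per_block))

# Echo only: writing zeros destroys whatever was stored in that block.
cmd="dd if=/dev/zero of=/dev/sda bs=$bs seek=$seek count=1"
echo "$cmd"
```

The result, 732230414, matches the "logical block 732230414" in the Buffer
I/O error line of the syslog, which is a useful cross-check.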

Thanks,
Allie

syslog here:

user@machinename:~$ cat /var/log/syslog | grep sda
Jan  4 08:23:30 machinename kernel: [1330854.323854] sd 0:0:0:0: [sda] tag#16 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan  4 08:23:30 machinename kernel: [1330854.323861] sd 0:0:0:0: [sda] tag#16 Sense Key : Medium Error [current] [descriptor]
Jan  4 08:23:30 machinename kernel: [1330854.323867] sd 0:0:0:0: [sda] tag#16 Add. Sense: Unrecovered read error - auto reallocate failed
Jan  4 08:23:30 machinename kernel: [1330854.323873] sd 0:0:0:0: [sda] tag#16 CDB: Read(16) 88 00 00 00 00 01 5d 27 98 08 00 00 01 00 00 00
Jan  4 08:23:30 machinename kernel: [1330854.323877] blk_update_request: I/O error, dev sda, sector 5857843312
Jan  4 08:23:33 machinename kernel: [1330858.108216] sd 0:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jan  4 08:23:33 machinename kernel: [1330858.108222] sd 0:0:0:0: [sda] tag#3 Sense Key : Medium Error [current] [descriptor]
Jan  4 08:23:33 machinename kernel: [1330858.108228] sd 0:0:0:0: [sda] tag#3 Add. Sense: Unrecovered read error - auto reallocate failed
Jan  4 08:23:33 machinename kernel: [1330858.108235] sd 0:0:0:0: [sda] tag#3 CDB: Read(16) 88 00 00 00 00 01 5d 27 98 70 00 00 00 08 00 00
Jan  4 08:23:33 machinename kernel: [1330858.108239] blk_update_request: I/O error, dev sda, sector 5857843312
Jan  4 08:23:33 machinename kernel: [1330858.108297] Buffer I/O error on dev sda, logical block 732230414, async page read
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 111 to 114
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], SMART Usage Attribute: 187 Reported_Uncorrect changed from 100 to 98
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 47 to 49
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 53 to 51
Jan  4 08:42:07 machinename smartd[2203]: Device: /dev/sda [SAT], ATA error count increased from 0 to 2
Jan  4 08:42:08 machinename smartd[2203]: Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors
Jan  4 08:42:08 machinename smartd[2203]: Device: /dev/sda [SAT], 8 Offline uncorrectable sectors
Jan  4 08:42:08 machinename smartd[2203]: Device: /dev/sda [SAT], ATA error count increased from 0 to 2
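
For what it's worth, the CDB lines in the log can be decoded to cross-check
the reported sector. A minimal sketch, assuming the standard READ(16) layout
(byte 0 opcode 0x88, byte 1 flags, bytes 2-9 big-endian LBA, bytes 10-13
transfer length in logical sectors); decode_read16 is a hypothetical helper
name:

```shell
#!/bin/sh
# Decode the starting LBA and transfer length from a READ(16) CDB given as
# a string of 16 space-separated hex bytes, as printed by the kernel.
decode_read16() {
    # Split the string into positional parameters: $1 = opcode ... $16.
    set -- $1
    lba=$((0x$3$4$5$6$7$8$9${10}))        # bytes 2-9, big-endian
    len=$((0x${11}${12}${13}${14}))       # bytes 10-13, big-endian
    echo "lba=$lba len=$len"
}

decode_read16 "88 00 00 00 00 01 5d 27 98 08 00 00 01 00 00 00"
decode_read16 "88 00 00 00 00 01 5d 27 98 70 00 00 00 08 00 00"
```

The first request decodes to a 256-sector read starting at 5857843208, a
range that contains the failing sector. The second is an 8-sector read
starting exactly at 5857843312, i.e. one 4096-byte block.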