Re: SMART detects pending sectors; take offline?

Alexander Shenkin <al@xxxxxxxxxxx> · Tue, 19 Dec 2017 10:35:57 +0000

On 12/18/2017 4:09 PM, Phil Turmel wrote:
Hi Alexander,

On 12/18/2017 10:51 AM, Alexander Shenkin wrote:
Hi all,

I'm getting back to this now that I'll have time, apologies for the
delay.  So, is the following correct in the case of a read error?

Not quite.

1) System tries to read an unreadable sector

2) The drive times out and reports the sector as unreadable, per its
timeout setting.

2a) In this case, mdadm sees the sector is unreadable and rewrites it
elsewhere on that drive.

No.  MD reconstructs the sector from redundancy (mirror or reverse
parity calc or reverse P+Q syndrome) and writes it back to the *same*
sector.  Since the drive firmware reported an error here, it knows to
verify the write as well.  If the verification fails, the drive firmware
will relocate the sector in the background, invisible to the upper
layers.  As far as MD is concerned, that sector address is fixed either
way.  Relocations are handled entirely within the drive.  MD does not
perform or track relocations.
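
For illustration only, the "reverse parity calc" mentioned above can be sketched for the single-parity case (RAID 5 style): the unreadable chunk is the XOR of the surviving data chunks and the parity chunk. This is not MD's code; the function names and the toy 4-byte chunks are hypothetical, and the P+Q (RAID 6) case is not covered.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(surviving_data, parity):
    """Rebuild the one unreadable chunk of a stripe from the rest.

    Assumes single parity: parity == XOR of all data chunks, so the
    missing chunk == XOR of the surviving chunks and the parity chunk.
    """
    return xor_blocks(list(surviving_data) + [parity])


# Toy stripe with three data chunks (hypothetical contents).
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Pretend chunk 1 returned a read error; rebuild it from the others.
rebuilt = reconstruct_missing([data[0], data[2]], parity)
print(rebuilt == data[1])
```

The rebuilt chunk is what would then be written back to the same sector address on the drive that reported the error.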

3) If linux hangcheck timer runs out before the drive timeout, then
linux aborts the read, logs an error, and mdadm isn't given a chance
to rewrite elsewhere based on checksums.

No.  The hangcheck timer issue described in your forwarded email is
unrelated.  And MD doesn't use checksums.

Each drive has a device driver timeout, as you note below, found at
/sys/block/*/device/timeout, that linux's ATA/SCSI stack uses to cut off
non-responsive controller cards and/or drives.  If that timer runs out
on a read before the drive reports the read error, the low level
*driver* reports a read error to the MD layer.  MD treats it the same as
any other read error, locating or recomputing the sector from redundancy
as above.  The difference in this case is that the physical drive isn't
talking to the controller (link reset in progress, typically) and the
corrective rewrite of the sector (to fix or relocate within the drive)
is refused, and that write error causes MD to kick out the drive.  And
the pending sector is also left unfixed.

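As a rough sketch, the per-device driver timeouts described above can be listed by reading /sys/block/*/device/timeout. The helper names and the `sys_block` parameter are hypothetical (the parameter only exists so the function can be pointed at a test directory), and the 180-second threshold is simply the value discussed in this thread, not a general recommendation.

```python
import glob
import os


def read_driver_timeouts(sys_block="/sys/block"):
    """Return {device_name: timeout_seconds} for devices with device/timeout."""
    timeouts = {}
    for path in glob.glob(os.path.join(sys_block, "*", "device", "timeout")):
        device = path.split(os.sep)[-3]  # .../<device>/device/timeout
        try:
            with open(path) as f:
                timeouts[device] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip it
    return timeouts


def devices_below(timeouts, minimum_secs):
    """Return the sorted device names whose driver timeout is below minimum_secs."""
    return sorted(dev for dev, secs in timeouts.items() if secs < minimum_secs)


if __name__ == "__main__":
    found = read_driver_timeouts()
    for dev, secs in sorted(found.items()):
        print(f"{dev}: {secs}s")
    print("below 180s:", devices_below(found, 180))
```

Any device listed as below the threshold would hit the driver timeout before a drive that takes that long to report its read error.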
Given all this, it seems to me that I should now set the hangcheck
timer to something greater than drive timeout (180 seconds).  Does
that sound right?  Otherwise, linux will kill the rewrite again, no?

In and of itself, waiting on I/O is not a hang.  So it should not be
applicable.

Ok, so, it's now my understanding that I would normally be ok, having 
set the driver timeout to 180 secs (thus giving time for the seagate 
drive to report the read error back up to the MD layer before 180 secs 
is up).  In my case, however, the kernel hangcheck timer is interrupting 
the process (md?) that is waiting on the sector read at 120 secs. 
Therefore, the writeback doesn't happen.

Thus, I should set the hangcheck timer to something > 120 (say, 180 secs;
should it be > 180 to let the driver time out first?).  Does this sound
correct?  Apologies if I'm repeating info from before - just trying to 
be sure about what I'm doing before I go ahead and do it.

If that's correct, I'll add the following line in /etc/sysctl.conf:

kernel.hung_task_timeout_secs = 180

I'll make sure the setting has taken, and then I'll run:

sudo /usr/share/mdadm/checkarray --idle --all
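
A minimal sketch of the "make sure the setting has taken" step, assuming the usual mapping of a sysctl key to a file under /proc/sys (dots become path separators). The helper name and the `proc_sys` parameter are hypothetical, and the file is only present on kernels built with hung-task detection.

```python
import os


def read_sysctl_int(name, proc_sys="/proc/sys"):
    """Read an integer sysctl, e.g. 'kernel.hung_task_timeout_secs'.

    Assumes the key maps to proc_sys/<key with dots replaced by '/'>.
    """
    path = os.path.join(proc_sys, *name.split("."))
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    key = "kernel.hung_task_timeout_secs"
    wanted = 180  # value proposed for /etc/sysctl.conf in this message
    try:
        current = read_sysctl_int(key)
    except FileNotFoundError:
        print(f"{key} is not available on this kernel")
    else:
        status = "ok" if current == wanted else "differs"
        print(f"{key} = {current} (wanted {wanted}): {status}")
```

A line added to /etc/sysctl.conf is not applied until the file is reloaded (for example with `sudo sysctl -p`) or the machine reboots, so the check is only meaningful after one of those.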

Thanks,
Allie
