Re: SMART detects pending sectors; take offline?

Alexander Shenkin <al@xxxxxxxxxxx> · Thu, 12 Oct 2017 10:50:29 +0100

On 10/11/2017 6:10 PM, Phil Turmel wrote:
> On 10/11/2017 06:31 AM, Alexander Shenkin wrote:
>> On 10/10/2017 1:55 PM, Phil Turmel wrote:
>>
>>> Which means the pending sector found by a smartctl background scan is
>>> likely in a non-array data area.  And if not, the next scrub will fix
>>> it.  You can run checkarray yourself if you don't want to wait.
>>
>> Thanks Phil.  I ran checkarray --all --idle, and it completed fine, with
>> no Rebuild messages as far as I could see (looked in dmesg &
>> /var/log/syslog, see below).
>>
>> [4444093.042246] md: data-check of RAID array md0
>> [4444093.042252] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> [4444093.042254] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
>> [4444093.042262] md: using 128k window, over a total of 1950656k.
>> [4444093.192032] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
>> [4444106.854418] md: md0: data-check done.
>> [4444106.863292] md: data-check of RAID array md2
>> [4444106.863295] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
>> [4444106.863298] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
>> [4444106.863304] md: using 128k window, over a total of 2920188928k.
>> [4475376.852520] md: md2: data-check done.
>>
>> SMART still shows those 8 unreadable sectors.  dmesg has a bunch of
>> related errors, copied below.
>
> Uh-oh.  Your kernel has a hangcheck timer that is shorter (120 seconds)
> than the URE timeout of your crappy Seagate drive (with the driver timeout
> at 180 seconds).  So the writeback that would fix the URE isn't happening.
>
> You'll need to set your hangcheck timer to 180 seconds, too.  I'm not
> sure how to do that.  (I've never seen this particular combination, but
> it would be another black mark on desktop drives in raid arrays.)
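A minimal Python sketch for querying and raising that limit follows. It assumes the "hangcheck timer" in question is the kernel's hung-task detector, whose limit is the `kernel.hung_task_timeout_secs` sysctl (exposed as /proc/sys/kernel/hung_task_timeout_secs; 120 seconds by default on many kernels, 0 means disabled). The `proc_root` parameter is a hypothetical hook so the functions can be exercised against a scratch directory.

```python
from pathlib import Path

# Assumption: the "hangcheck timer" is the hung-task detector, whose limit
# lives in this sysctl file (path relative to /proc).
HUNG_TASK_FILE = "sys/kernel/hung_task_timeout_secs"


def read_hung_task_timeout(proc_root: str = "/proc") -> int:
    """Return the hung-task timeout in seconds (0 means the check is disabled)."""
    return int((Path(proc_root) / HUNG_TASK_FILE).read_text().strip())


def set_hung_task_timeout(seconds: int, proc_root: str = "/proc") -> None:
    """Write a new timeout; on a live system this needs root and is lost at reboot."""
    if seconds < 0:
        raise ValueError("timeout must be non-negative")
    (Path(proc_root) / HUNG_TASK_FILE).write_text(f"{seconds}\n")
```

On a live system, `set_hung_task_timeout(180)` has to run as root and only lasts until reboot; a line such as `kernel.hung_task_timeout_secs = 180` in /etc/sysctl.conf or a file under /etc/sysctl.d/ is the usual way to make it persist.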

Thanks Phil... Googling around, I haven't found a way to change it 
either, but then again, I'm not really sure what to search for.

What about changing my default disk timeout to something less than 120 
secs?  Say, 100 secs instead of 180?
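The per-drive timeout discussed here is assumed to be the kernel's SCSI command timer, exposed as /sys/block/<dev>/device/timeout (in seconds). A minimal sketch for reading and setting it on one device follows; `sda` is only a placeholder name, `sys_root` is a hypothetical hook for testing, and the value resets at reboot unless something (a udev rule or boot script, say) reapplies it.

```python
from pathlib import Path


def device_timeout_path(dev: str, sys_root: str = "/sys") -> Path:
    """Path of the command timeout file for a block device such as 'sda'."""
    return Path(sys_root) / "block" / dev / "device" / "timeout"


def get_device_timeout(dev: str, sys_root: str = "/sys") -> int:
    """Current command timeout for the device, in seconds."""
    return int(device_timeout_path(dev, sys_root).read_text().strip())


def set_device_timeout(dev: str, seconds: int, sys_root: str = "/sys") -> None:
    """Set the kernel's command timeout for one drive; needs root on a live system."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    device_timeout_path(dev, sys_root).write_text(f"{seconds}\n")
```

For example, `set_device_timeout("sda", 100)` would apply the 100-second value proposed above. One caveat, assuming the drive has no SCT ERC support (or has it disabled): its internal retries on a bad sector can run past 100 seconds, and if the kernel gives up first it resets the drive rather than receiving the read error md needs to rewrite the sector, which is the mismatch the 180-second setting is meant to avoid.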

Seems like this issue should probably make it into the timeout wiki 
page, no?  Perhaps with instructions on how to query your system's
hangcheck timeout, so that you can make sure your drive timeouts are
set below it?
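The check proposed for the wiki (compare the hangcheck limit against every drive's timeout) could be sketched as below. Assumptions, all labeled: the "hangcheck timer" is taken to be `kernel.hung_task_timeout_secs` under /proc/sys/kernel, drive timeouts are read from /sys/block/*/device/timeout, a hung-task value of 0 is treated as "detector disabled", and `audit_timeouts`, `proc_root` and `sys_root` are hypothetical names introduced for illustration and testing.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_int(path: Path) -> int:
    """Read a procfs/sysfs file holding a single integer."""
    return int(path.read_text().strip())


def drive_timeouts(sys_root: str = "/sys") -> Dict[str, int]:
    """Map each block device exposing device/timeout (e.g. 'sda') to its seconds."""
    result: Dict[str, int] = {}
    for dev_dir in sorted(Path(sys_root, "block").iterdir()):
        timeout_file = dev_dir / "device" / "timeout"
        if timeout_file.is_file():  # md, loop, etc. have no command timeout
            result[dev_dir.name] = read_int(timeout_file)
    return result


def audit_timeouts(
    proc_root: str = "/proc", sys_root: str = "/sys"
) -> Tuple[int, Dict[str, int], List[str]]:
    """Return (hung-task limit, drive timeouts, drives not strictly below the limit)."""
    hung = read_int(Path(proc_root, "sys", "kernel", "hung_task_timeout_secs"))
    timeouts = drive_timeouts(sys_root)
    if hung == 0:  # hung-task detection disabled: nothing to compare against
        return hung, timeouts, []
    too_long = [dev for dev, secs in timeouts.items() if secs >= hung]
    return hung, timeouts, too_long
```

On a machine set up like the one in this thread (a 120-second hung-task limit, one drive at 180 seconds), that drive would be the one reported in the third element of the result.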

Thanks,
Allie
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html