Re: SMART detects pending sectors; take offline?

Hi all,

I'm getting back to this now that I have time; apologies for the delay. Is the following sequence correct in the case of a read error?

1) The system tries to read an unreadable sector.
2) The drive gives up and reports the sector as unreadable once its own timeout setting expires.
2a) In this case, mdadm sees that the sector is unreadable and rewrites it elsewhere on that drive.
3) If the Linux hangcheck timer runs out before the drive timeout, then Linux aborts the read, logs an error, and mdadm isn't given a chance to rewrite elsewhere based on checksums.

I'm not sure how the Linux I/O timeout fits in here, or how it differs from the hangcheck timer.

Given all this, it seems to me that I should now set the hangcheck timer to something greater than the drive timeout (180 seconds). Does that sound right? Otherwise, Linux will kill the rewrite again, no?

Thanks,
Allie

On 10/12/2017 4:52 PM, Edward Kuns wrote:
On Thu, Oct 12, 2017 at 10:16 AM, Edward Kuns <eddie.kuns@xxxxxxxxx> wrote:
All y'all referring to a whole separate kernel module, hangcheck-timer.ko?

Looking back at the original messages:

[4038193.380403] INFO: task md2_raid5:247 blocked for more than 120 seconds.
[4038193.380473]       Not tainted 4.4.0-81-generic #104~14.04.1-Ubuntu
[4038193.380526] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

it looks like you're dealing with this part of the kernel:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/hung_task.c

The timer is configurable with sysctl and defaults to 120 seconds.
You can check with this command:

$ sudo sysctl kernel.hung_task_timeout_secs
kernel.hung_task_timeout_secs = 120

You can adjust it temporarily (e.g. to make it longer):

$ sudo sysctl -w kernel.hung_task_timeout_secs=150

Or you can adjust it permanently by modifying your sysctl configuration.
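
For the permanent route, here is a minimal sketch. It assumes your distribution reads drop-in files from /etc/sysctl.d/ at boot; the file name 90-hung-task.conf is just a hypothetical example, and 150 is the same illustrative value as above. The script only writes the file to a temporary staging directory, so it is safe to run as an ordinary user.

```shell
#!/bin/sh
# Sketch only: stage a sysctl drop-in that makes the hung-task timeout persistent.
# Assumptions: /etc/sysctl.d/ is honoured on this system, and the file name
# below is a made-up example, not something the kernel or distro requires.
conf_name="90-hung-task.conf"
staging_dir="$(mktemp -d)"

# One "key = value" line per setting, same key name that sysctl prints.
printf 'kernel.hung_task_timeout_secs = 150\n' > "$staging_dir/$conf_name"

# Show what would be installed.
cat "$staging_dir/$conf_name"

# To install and apply it (as root), something like:
#   cp "$staging_dir/$conf_name" /etc/sysctl.d/ && sysctl --system
```

Copying the file into place and reloading is left as a manual step on purpose, so you can review the value first.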

It looks like by default it will only warn ten times and then stop
complaining.  That limit is also configurable via sysctl
(kernel.hung_task_warnings).
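
If you want to see both settings at once without sudo, here is a small sketch that reads them straight from /proc/sys. The helper name show_hung_task_setting is made up for illustration, and the fallback branch assumes that a kernel built without hung-task detection simply doesn't expose these files.

```shell
#!/bin/sh
# Sketch only: print one hung-task sysctl by reading /proc/sys directly.
# show_hung_task_setting is a hypothetical helper name, not an existing tool.
show_hung_task_setting() {
    name="$1"
    path="/proc/sys/kernel/$name"
    if [ -r "$path" ]; then
        # Same "key = value" layout that the sysctl command prints.
        printf 'kernel.%s = %s\n' "$name" "$(cat "$path")"
    else
        # Assumption: the file is missing when hung-task detection is not built in.
        printf 'kernel.%s is not available on this kernel\n' "$name"
    fi
}

show_hung_task_setting hung_task_timeout_secs   # seconds before a blocked task is reported
show_hung_task_setting hung_task_warnings       # how many more reports will be logged
```

Note that hung_task_warnings counts down as warnings are emitted, so a value below the default suggests some reports have already been logged.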

               Eddie
