On 14.07.2017 at 12:46, Gionatan Danti wrote:
On 14-07-2017 02:32, Reindl Harald wrote:
because you won't be that happy when the kernel kicks out a disk each
time a random SATA command times out - the 4 RAID10 disks in my
workstation are from 2011 and have shown such timeouts several times in
the past, while they are just fine
here you go:
http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/
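For context, the timeout discussed in the linked article is the kernel's per-device command timer, which Linux exposes in sysfs as /sys/block/<dev>/device/timeout (in seconds, 30 by default). A minimal sketch for listing it on a given machine; the function name and the sysfs_root parameter are made up for illustration and exist only so the code can be pointed at a test directory:

```python
from pathlib import Path


def command_timeouts(sysfs_root="/sys/block"):
    """Return {device name: kernel command timeout in seconds}.

    sysfs_root is a hypothetical parameter for testing; on a real
    Linux system the default is what you want.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        # not Linux, or sysfs not mounted
        return {}
    timeouts = {}
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        try:
            timeouts[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # devices without a device/timeout entry are skipped
            continue
    return timeouts


if __name__ == "__main__":
    for name, seconds in command_timeouts().items():
        print(f"{name}: {seconds}s")
```

This only reads the kernel side of the story. Whether the drive itself gives up on a bad sector before that timer expires depends on its error recovery settings, which this sketch does not touch.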
Hi, so a premature/preventive drive detachment is not a silver bullet,
and I buy it. However, I would at least expect this behavior to be
configurable. Maybe it is, and I am missing something?
dunno, maybe it is, but it wouldn't be wise: with RAID5, a rebuild
after a disk failure on a large array would become even more dangerous
than it already is
Anyway, what really surprises me is *not* that the drive was not detached,
but rather that corruption was allowed to make its way into real data. I
naively expect that when a WRITE_QUEUED or CACHE_FLUSH command aborts/fails
(which *will* cause data corruption if not properly handled), the I/O
layer has the following possibilities:
given that i have seen at least SD cards confirming successful writes
for hours without a single error in the syslog, maybe it was one of the
rare cases where the hardware lied. if that is the case, you have nearly
no chance at the software layer, except verifying each write with an
uncached read of the block, which would have an unacceptable impact on
performance
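A rough sketch of what such a verify-after-write could look like from userspace, assuming Linux and a regular file or block device. The function name write_and_verify is made up for illustration. posix_fadvise with POSIX_FADV_DONTNEED is only a hint to the kernel, and even when the page cache is dropped the drive may still answer the read from its own cache, so a match here is no proof that the data reached the medium:

```python
import os


def write_and_verify(path, offset, data):
    """Write data at offset, then re-read it after asking the kernel
    to drop its cached copy. Returns True if the bytes match."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            # short write: treat as failure rather than retrying
            return False
        # push the data down to the device before verifying
        os.fsync(fd)
        # advisory only: ask the kernel to evict the now-clean pages so
        # the read below is more likely to go back to the device
        os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
        return os.pread(fd, len(data), offset) == data
    finally:
        os.close(fd)
```

Every write then costs an fsync plus a second trip to the device, which is the performance impact mentioned above.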