On 2018-04-16 11:02, Wol's lists wrote:
On 16/04/18 12:43, Austin S. Hemmelgarn wrote:
On 2018-04-15 21:04, Chris Murphy wrote:
I just ran into this:
https://github.com/neilbrown/mdadm/pull/32/commits/af1ddca7d5311dfc9ed60a5eb6497db1296f1bec
This solution is inadequate; can it be made more generic? This isn't
an md-specific problem: it affects Btrfs and LVM as well, and in fact
raid0 and even non-raid setups.
There is no good reason to prevent deep recovery, which is what
happens with the default command timer of 30 seconds, with this class
of drive. Basically that value is going to cause data loss for the
single device and also raid0 case, where the reset happens before deep
recovery has a chance. And even if deep recovery fails to return user
data, what we need to see is the proper error message: read error UNC,
rather than a link reset message which just obfuscates the problem.
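For anyone wanting to check what their own disks are set to, here is a minimal sketch that reads the per-device command timer from sysfs. It assumes the usual Linux location, /sys/block/<dev>/device/timeout, with the value in seconds. The function names are made up for illustration, and the 30-second threshold is just the default discussed above.

```python
from __future__ import annotations

from pathlib import Path


def read_command_timers(sys_block: str = "/sys/block") -> dict[str, int]:
    """Return {device name: command timer in seconds} for each block device
    that exposes a device/timeout attribute under sys_block."""
    timers: dict[str, int] = {}
    root = Path(sys_block)
    if not root.is_dir():
        return timers
    for dev in sorted(root.iterdir()):
        timeout_file = dev / "device" / "timeout"
        try:
            timers[dev.name] = int(timeout_file.read_text().strip())
        except (OSError, ValueError):
            # No such attribute (e.g. loop devices) or unreadable value: skip.
            continue
    return timers


def devices_at_or_below(timers: dict[str, int], threshold: int = 30) -> list[str]:
    """List devices whose command timer is at or below threshold seconds."""
    return [name for name, secs in timers.items() if secs <= threshold]


if __name__ == "__main__":
    timers = read_command_timers()
    for name, secs in timers.items():
        print(f"{name}: {secs}s")
    print("at or below 30s:", devices_at_or_below(timers))
```

This only reports the values. Changing them means writing to the same file as root, and whether that belongs in a udev rule or in the kernel default is exactly the question in this thread.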
This has been discussed at least once here before (probably more
times, hard to be sure since it usually comes up as a side discussion
in an only marginally related thread).
Sorry, but where is "here"? This message is cross-posted to about three
lists at least ...
Oops, didn't see the extra lists listed. In this case, discussed
previously on the BTRFS ML.
Last I knew, the consensus here was
that it needs to be changed upstream in the kernel, not by adding a
udev rule because while the value is technically system policy, the
default policy is brain-dead for anything but the original disks it
was intended for (30 seconds works perfectly fine for actual SCSI
devices because they behave sanely in the face of media errors, but
it's horribly inadequate for ATA devices).
To re-iterate what I've said before on the subject:
imho (and it's probably going to be a pain to implement :-) there should
be a soft time-out and a hard time-out. The soft time-out should trigger
"drive is taking too long to respond" messages that end up in a log - so
that people who actually care can keep track of this sort of thing.
The hard timeout should be the current set-up, where the kernel just
gives up.
Agreed, although as pointed out by Roger in his reply to this, it kind
of already works this way in some cases.
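To make the soft/hard split concrete, here is a rough userspace sketch of the idea, not how the kernel does or would do it. The soft time-out only logs a "taking too long" warning and keeps waiting; the hard time-out gives up. The function name, the device label and the return strings are all invented for the example.

```python
import logging
import threading

log = logging.getLogger("timeouts")


def wait_with_timeouts(done: threading.Event, soft: float, hard: float,
                       name: str = "sda") -> str:
    """Wait for `done` using a soft and a hard time-out (both in seconds).

    Returns "completed" if done before the soft time-out, "completed_slow"
    if done between soft and hard, and "timed_out" if the hard time-out
    expires first.
    """
    if not 0 < soft <= hard:
        raise ValueError("need 0 < soft <= hard")
    if done.wait(soft):
        return "completed"
    # Soft time-out: only log, so people who care can track slow drives.
    log.warning("%s is taking too long to respond (> %.2fs)", name, soft)
    if done.wait(hard - soft):
        return "completed_slow"
    # Hard time-out: this is where the current behaviour (giving up) kicks in.
    log.error("%s did not respond within %.2fs, giving up", name, hard)
    return "timed_out"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    done = threading.Event()
    # Simulate a drive that answers after 0.2s, past the 0.1s soft time-out.
    threading.Timer(0.2, done.set).start()
    print(wait_with_timeouts(done, soft=0.1, hard=2.0))
```

The point of the sketch is that the warning and the abort are separate decisions, so the soft value can be tuned for logging without changing when a command is actually abandoned.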