On 10/07/2021 13:15, BW wrote:
> I don't know if the "failfast" patch was ever pushed into the kernel
> back in 2017, but if it was, does it change anything with regard to the
> SCTERC/kernel-driver timeout issue(s)?
> Link to a thread about the patch: https://lkml.org/lkml/2016/11/18/1
> And why doesn't mdadm just mark a drive as failed if no response has
> been received from an array member device within, say, 29 seconds (just
> under the kernel driver's default timeout of 30 seconds), e.g. because
> of a read/write issue? Then all those SCTERC/kernel-driver timeout
> issues would be solved, right?
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
I suggest you read up on what the timeout mismatch problem really is.
And why doesn't mdadm just mark a device as failed? The problem is that
it does EXACTLY THAT! And it is precisely that behaviour which will
destroy your parity raid if you are unlucky.
The whole point of the mismatch problem is that the kernel timeout MUST
be GREATER than the drive timeout. Modern desktop drives do NOT have a
configurable timeout, and with modern shingled drives that timeout can
be measured in TENS of MINUTES.
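
As a rough illustration of that rule, here is a minimal Python sketch of
the comparison. It assumes the kernel timeout is read from
/sys/block/<dev>/device/timeout (in seconds) and that the drive's ERC
limit is supplied by hand in units of 100 ms, which is how
"smartctl -l scterc /dev/sdX" reports it. The function names are
hypothetical, and treating a drive without a usable ERC limit as always
mismatched is a conservative assumption of this sketch, not something
mdadm or the kernel does.

```python
from pathlib import Path
from typing import Optional


def kernel_timeout_seconds(device: str, sysfs_root: str = "/sys/block") -> int:
    """Read the kernel's command timeout for a block device, in seconds.

    `device` is a bare name such as "sda"; `sysfs_root` is overridable
    only so the function can be exercised without real hardware.
    """
    path = Path(sysfs_root, device, "device", "timeout")
    return int(path.read_text().strip())


def has_timeout_mismatch(kernel_timeout_s: int,
                         drive_erc_deciseconds: Optional[int]) -> bool:
    """Return True if the kernel may give up before the drive does.

    `drive_erc_deciseconds` is the drive's SCT ERC limit in units of
    100 ms, or None if the drive has no ERC or it is disabled.
    """
    if drive_erc_deciseconds is None:
        # Assumption of this sketch: with no known upper bound on the
        # drive's recovery time, no kernel timeout can be called safe.
        return True
    drive_timeout_s = drive_erc_deciseconds / 10.0
    # The kernel timeout must be strictly GREATER than the drive timeout.
    return kernel_timeout_s <= drive_timeout_s
```

For example, has_timeout_mismatch(kernel_timeout_seconds("sda"), 70)
checks a drive whose ERC is set to 7.0 seconds against whatever the
kernel currently uses for sda.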
Cheers,
Wol