Re: Fault tolerance with badblocks

Nix <nix@xxxxxxxxxxxxx> · Tue, 09 May 2017 11:18:55 +0100

On 8 May 2017, Anthony Youngman verbalised:

> On 08/05/17 15:50, Nix wrote:
>> I wonder... scrubbing is not very useful with md, particularly with RAID
>> 6, because it does no writes unless something mismatches, and on failure
>> there is no attempt to determine which of the N disks is bad and rewrite
>> its contents from the other devices (nor, as I understand it, does it
>> clearly say which drive gave the error, so even failing it out and
>> resyncing it is hard).
>
> With redundant raid (and that doesn't include a two-disk, or even
> three-disk mirror), it SHOULD recalculate the failed block. If it
> doesn't bother even though it can, I'd call that a bug in scrub. What

It didn't, once upon a time (in 2010), and as far as I can tell from the
code it still doesn't.

> I thought happened was that it reads a stripe direct from disk, and if
> that failed it read the same stripe via the raid code, to get the raid
> error correction to fire, and then it rewrote the stripe.

There's *failed*, which does trigger a rewrite, and there's 'we got a
mismatch', which on RAID-6 arguably should trigger a rewrite but instead
just tells you there was a mismatch, but not where, nor even on what
disk.

> What would be a nice touch, is that if we have a massive timeout for
> non-SCT drives, if the scrub has to wait more than, say, 10 seconds
> for a read to succeed it then assumes the block is failing and
> rewrites it.

What tends to happen is that the drive gets reset, which from md's
perspective is the drive vanishing and reappearing again. I don't see
any sane way for md to interpret *that* as anything but a possibly
rather major failure that should be reacted to by failing the drive out.
I mean, all it knows is there was a timeout: for all it knows there are
electrical problems there or something. The drive doesn't say (and
doesn't get a chance to say, because we reset it rather than wait five
minutes for it to tell us what's up).

>              Actually, scrub that (groan... :-) - if the drive takes
> longer than 1/3 of the timeout to respond, then the scrub assumes it's
> dodgy and rewrites it.

It's hard to rewrite anything on a drive that's too busy failing a read
to do anything else.

-- 
NULL && (void)
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html