On 05/09/19 21:15, Chris Murphy wrote:
>> You think that most people using this will be monitoring for
>> dm-integrity reported errors? If all the errors are just rewritten
>> silently then it's likely the only sign of an issue will be a
>> performance impact, with no obvious sign as to where it's coming from.
>
> I very well might want a policy that says, send a notification if more
> than 10 errors of any nature are encountered within 1 minute or less.
> Maybe that drive gets scheduled for swap out sooner than later, but
> not urgently. But ejecting the drive, upon many errors, to act as the
> notification of a problem, I don't like that design. Those are
> actually two different problems, and I'm not being informed of the
> initial cause, only of the far more urgent "drive ejected" case.

My immediate reaction, on reading this, was "has dm-integrity been set
up on a disk with old data, and the drive not been initialised?" That
would, I presume, lead to exactly this scenario ... think of raid's
--assume-clean option when the drive isn't actually clean ... and the
OP made me think this could well be what is happening. In which case,
of course, it points to either an implementation error in dm-integrity
or a user mistake. Do we need more information about the setup that is
generating these errors?

Otherwise, we really do have a seriously corrupt disk, and while it
might be nice to have some option to force-override what's going on, it
is also not wise to continue without at least trying to diagnose the
cause of the corruption!

I think we need to know why dm-integrity is blowing up - if we don't
know that, then we shouldn't try to deal with it in the raid layer.

Cheers,
Wol
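
P.S. For illustration only - this is a sketch of the scenario I have in
mind, assuming the standalone integritysetup tool from cryptsetup, and
the device names are purely hypothetical:

    # --no-wipe skips initialising the data area after format, so any
    # sector that has not yet been written through the integrity
    # mapping still holds old data with invalid checksums and will
    # fail verification when it is read back:
    integritysetup format /dev/sdX --no-wipe
    integritysetup open /dev/sdX integ0

    # Roughly the md analogue: creating an array over members that are
    # not actually clean, and skipping the initial sync:
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --assume-clean /dev/sdX1 /dev/sdY1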