Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request

On Tue, Apr 20, 2021, 7:49 PM Song Liu <song@xxxxxxxxxx> wrote:
> On Tue, Apr 20, 2021 at 3:05 PM Paul Clements <paul.clements@xxxxxxxxxxx> wrote:
> >
> > This patch addresses a data corruption bug in raid1 arrays using bitmaps.
> > Without this fix, the bitmap bits for the failed I/O end up being cleared.
>
> I think this only happens when we re-add a faulty drive?

Yes, the bitmap bits get cleared when the disk is marked faulty or a write
error occurs. Then, when the disk is re-added, the bitmap-based resync is,
of course, not accurate: regions whose bits were wrongly cleared are never
resynced, and the mirror legs silently diverge.
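To make the failure mode concrete, here is a toy model (hypothetical illustration, not kernel code; all names are made up) of a write-intent bitmap: the bit for a chunk is set before the writes are issued and cleared only once every leg has the data. If the bit is also cleared when a leg's write fails, a later bitmap-based resync after --re-add skips the stale chunk:

```python
def do_write(bitmap, legs, chunk, data, failing_leg=None, buggy=False):
    """Write `data` to every mirror leg; optionally fail one leg."""
    bitmap.add(chunk)                 # set the bit before issuing the writes
    ok = True
    for i, leg in enumerate(legs):
        if i == failing_leg:
            ok = False                # transient transport error: data not written
        else:
            leg[chunk] = data
    if ok or buggy:
        bitmap.discard(chunk)         # buggy path clears the bit despite the failure
    return ok

def resync(bitmap, src, dst):
    """Bitmap-based resync after re-adding the failed leg."""
    for chunk in bitmap:
        dst[chunk] = src[chunk]

# Correct behaviour: the bit stays set, so resync repairs the re-added leg.
bitmap, legs = set(), [{}, {0: "old"}]
do_write(bitmap, legs, 0, "new", failing_leg=1)
resync(bitmap, legs[0], legs[1])
assert legs[1][0] == "new"

# Buggy behaviour: the bit is cleared on failure, resync misses the chunk.
bitmap, legs = set(), [{}, {0: "old"}]
do_write(bitmap, legs, 0, "new", failing_leg=1, buggy=True)
resync(bitmap, legs[0], legs[1])
assert legs[1][0] == "old"            # stale data survives: corruption
```

This is only a sketch of the semantics being discussed; the real md code tracks dirty bits per bitmap chunk and clears them from the endio path.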

Is there another way to deal with a transient, transport-based error,
other than this?

For instance, I'm using nbd as one of the mirror legs. In that case,
assuming the failures that lead to the device being marked faulty are
just transient transport/network issues, we want the resync to deal with
this correctly. It has worked this way for a long time; a fairly recent
commit (eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad) rearranged the code
(previously, all write failures were retried by flagging the request
with R1BIO_WriteError).

Does the patch present a problem in some other scenario?

Thanks,
Paul


