Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request

Song Liu <song@xxxxxxxxxx> · Wed, 21 Apr 2021 22:58:50 -0700

On Wed, Apr 21, 2021 at 10:38 AM Paul Clements
<paul.clements@xxxxxxxxxxx> wrote:
>
> On Tue, Apr 20, 2021, 7:49 PM Song Liu <song@xxxxxxxxxx> wrote:
> > On Tue, Apr 20, 2021 at 3:05 PM Paul Clements <paul.clements@xxxxxxxxxxx> wrote:
> > >
> > > This patch addresses a data corruption bug in raid1 arrays using bitmaps.
> > > Without this fix, the bitmap bits for the failed I/O end up being cleared.
> >
> > I think this only happens when we re-add a faulty drive?
>
> Yes, the bitmap gets cleared when the disk is marked faulty or a write
> error occurs. Then when the disk is re-added, the bitmap-based resync
> is, of course, not accurate.
>
> Is there another way to deal with a transient, transport-based error,
> other than this?
>
> For instance, I'm using nbd as one of the mirror legs. In that case,
> assuming the failures that lead to the device being marked faulty are
> just transport/network issues, then we want the resync to be able to
> correctly deal with this. It has worked this way for a long time.
> A fairly recent commit
> (eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad) re-arranged the code
> (previously all write failures were retried via flagging with
> R1BIO_WriteError).

So I guess we need a "Fixes: eeba6809d8d589" tag?

CC Yufen, who authored the above patch.

>
> Does the patch present a problem in some other scenario?

I don't think this presents any problem.

Applied to md-next (so no need to resend for the Fixes tag).

Thanks,
Song
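
For readers following the thread, the failure mode discussed above can be
illustrated with a toy model. This is only a sketch under stated assumptions,
not the md/raid1 code: every name below (toy_mirror, toy_write, toy_resync,
toy_mismatches_after_readd) is hypothetical, and the write-intent bitmap is
reduced to one flag per chunk. It shows why a failed write that is reported as
a success clears the bit, so a later bitmap-based resync skips the stale chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NCHUNKS 8

/* Hypothetical toy model of a two-leg mirror with a write-intent bitmap. */
struct toy_mirror {
    bool dirty[NCHUNKS];     /* one "bitmap bit" per chunk */
    int  primary[NCHUNKS];   /* healthy leg */
    int  secondary[NCHUNKS]; /* leg that may fail, e.g. behind a network transport */
};

/*
 * Write `value` to `chunk`. `secondary_ok` says whether the write to the
 * secondary leg succeeded. `report_failure` models whether a failed write
 * is reported as a failure when the write ends (assumed to correspond to
 * the patched behaviour) or silently treated as a success.
 */
static void toy_write(struct toy_mirror *m, int chunk, int value,
                      bool secondary_ok, bool report_failure)
{
    assert(chunk >= 0 && chunk < NCHUNKS);

    m->dirty[chunk] = true;          /* set the bit before issuing the write */
    m->primary[chunk] = value;
    if (secondary_ok)
        m->secondary[chunk] = value;

    /* The bit may only be cleared if the write is considered successful. */
    bool treated_as_success = secondary_ok || !report_failure;
    if (treated_as_success)
        m->dirty[chunk] = false;
}

/* Bitmap-based resync after re-add: copy only the chunks still marked dirty. */
static void toy_resync(struct toy_mirror *m)
{
    for (int i = 0; i < NCHUNKS; i++) {
        if (m->dirty[i]) {
            m->secondary[i] = m->primary[i];
            m->dirty[i] = false;
        }
    }
}

/* One failed write, then re-add and resync; returns the number of stale chunks. */
static int toy_mismatches_after_readd(bool report_failure)
{
    struct toy_mirror m;
    int mismatches = 0;

    memset(&m, 0, sizeof(m));
    toy_write(&m, 3, 42, false, report_failure); /* transient transport error */
    toy_resync(&m);                              /* leg is re-added */

    for (int i = 0; i < NCHUNKS; i++)
        if (m.primary[i] != m.secondary[i])
            mismatches++;
    return mismatches;
}
```

In this model, toy_mismatches_after_readd(false) leaves one chunk out of sync
after the resync, while toy_mismatches_after_readd(true) leaves none, because
the bit for the failed write stays set until the resync has copied the chunk.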