Re: [PATCH] md/raid1: release pending accounting for an I/O only after write-behind is also finished

David Jeffery <djeffery@xxxxxxxxxx> · Mon, 27 Jan 2020 14:55:52 -0500

On Mon, Jan 27, 2020 at 12:29 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Mon, Jan 27, 2020 at 7:26 AM David Jeffery <djeffery@xxxxxxxxxx> wrote:
> >
> > When using RAID1 and write-behind, md can deadlock when errors occur. With
> > write-behind, r1bio structs can be accounted by raid1 as queued but not
> > counted as pending. The pending count is dropped when the original bio is
> > returned complete but write-behind for the r1bio may still be active.
> >
> > This breaks the accounting used in some conditions to know when the raid1
> > md device has reached an idle state. It can result in calls to
> > freeze_array deadlocking. freeze_array will never complete from a negative
> > "unqueued" value being calculated due to a queued count larger than the
> > pending count.
> >
> > To properly account for write-behind, move the call to allow_barrier from
> > call_bio_endio to raid_end_bio_io. When using write-behind, md can call
> > call_bio_endio before all write-behind I/O is complete. Using
> > raid_end_bio_io for the point to call allow_barrier will release the
> > pending count at a point where all I/O for an r1bio, even write-behind, is
> > done.
> >
> > Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
> > ---
> >
> >  raid1.c |   13 +++++++------
> >  1 file changed, 7 insertions(+), 6 deletions(-)
> >
> >
> > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> > index 201fd8aec59a..0196a9d9f7e9 100644
> > --- a/drivers/md/raid1.c
> > +++ b/drivers/md/raid1.c
> > @@ -279,22 +279,17 @@ static void reschedule_retry(struct r1bio *r1_bio)
> >  static void call_bio_endio(struct r1bio *r1_bio)
> >  {
> >         struct bio *bio = r1_bio->master_bio;
> > -       struct r1conf *conf = r1_bio->mddev->private;
> >
> >         if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> >                 bio->bi_status = BLK_STS_IOERR;
> >
> >         bio_endio(bio);
> > -       /*
> > -        * Wake up any possible resync thread that waits for the device
> > -        * to go idle.
> > -        */
> > -       allow_barrier(conf, r1_bio->sector);
>
> raid1_end_write_request() also calls call_bio_endio(). Do we need to fix
> that?

This basically is the problem the patch is fixing.  We don't want
allow_barrier() being called when call_bio_endio() is called directly
from raid1_end_write_request().  Write-behind can still be active here
so it was dropping the pending accounting too early.  We only want it
called when all I/O for the r1bio is complete, which shifting the
allow_barrier() call to raid_end_bio_io() does.

> Do we need to backport this to stable?

Pros:
It's a small patch
Fixes an ugly deadlock which hangs md if it happens.

Cons:
Affects a feature most users don't use
The bug appears to have existed for years with little notice

I consider it rather borderline to being stable-worthy.

> Thanks,
> Song
>

Regards,
David Jeffery

> >  }
> >
> >  static void raid_end_bio_io(struct r1bio *r1_bio)
> >  {
> >         struct bio *bio = r1_bio->master_bio;
> > +       struct r1conf *conf = r1_bio->mddev->private;
> >
> >         /* if nobody has done the final endio yet, do it now */
> >         if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) {
> > @@ -305,6 +300,12 @@ static void raid_end_bio_io(struct r1bio *r1_bio)
> >
> >                 call_bio_endio(r1_bio);
> >         }
> > +       /*
> > +        * Wake up any possible resync thread that waits for the device
> > +        * to go idle.  All I/Os, even write-behind writes, are done.
> > +        */
> > +       allow_barrier(conf, r1_bio->sector);
> > +
> >         free_r1bio(r1_bio);
> >  }
> >
> >
>