On Mon, Jan 27, 2020 at 12:29 PM Song Liu <song@xxxxxxxxxx> wrote: > > On Mon, Jan 27, 2020 at 7:26 AM David Jeffery <djeffery@xxxxxxxxxx> wrote: > > > > When using RAID1 and write-behind, md can deadlock when errors occur. With > > write-behind, r1bio structs can be accounted by raid1 as queued but not > > counted as pending. The pending count is dropped when the original bio is > > returned complete but write-behind for the r1bio may still be active. > > > > This breaks the accounting used in some conditions to know when the raid1 > > md device has reached an idle state. It can result in calls to > > freeze_array deadlocking. freeze_array will never complete from a negative > > "unqueued" value being calculated due to a queued count larger than the > > pending count. > > > > To properly account for write-behind, move the call to allow_barrier from > > call_bio_endio to raid_end_bio_io. When using write-behind, md can call > > call_bio_endio before all write-behind I/O is complete. Using > > raid_end_bio_io for the point to call allow_barrier will release the > > pending count at a point where all I/O for an r1bio, even write-behind, is > > done. > > > > Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx> > > --- > > > > raid1.c | 13 +++++++------ > > 1 file changed, 7 insertions(+), 6 deletions(-) > > > > > > diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c > > index 201fd8aec59a..0196a9d9f7e9 100644 > > --- a/drivers/md/raid1.c > > +++ b/drivers/md/raid1.c > > @@ -279,22 +279,17 @@ static void reschedule_retry(struct r1bio *r1_bio) > > static void call_bio_endio(struct r1bio *r1_bio) > > { > > struct bio *bio = r1_bio->master_bio; > > - struct r1conf *conf = r1_bio->mddev->private; > > > > if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) > > bio->bi_status = BLK_STS_IOERR; > > > > bio_endio(bio); > > - /* > > - * Wake up any possible resync thread that waits for the device > > - * to go idle. > > - */ > > - allow_barrier(conf, r1_bio->sector); > > raid1_end_write_request() also calls call_bio_endio(). Do we need to fix > that? This basically is the problem the patch is fixing. We don't want allow_barrier() being called when call_bio_endio() is called directly from raid1_end_write_request(). Write-behind can still be active here so it was dropping the pending accounting too early. We only want it called when all I/O for the r1bio is complete, which shifting the allow_barrier() call to raid_end_bio_io() does. > Do we need to backport this to stable? Pros: It's a small patch Fixes an ugly deadlock which hangs md if it happens. Cons: Affects a feature most users don't use The bug appears to have existed for years with little notice I consider it rather borderline to being stable-worthy. > Thanks, > Song > Regards, David Jeffery > > } > > > > static void raid_end_bio_io(struct r1bio *r1_bio) > > { > > struct bio *bio = r1_bio->master_bio; > > + struct r1conf *conf = r1_bio->mddev->private; > > > > /* if nobody has done the final endio yet, do it now */ > > if (!test_and_set_bit(R1BIO_Returned, &r1_bio->state)) { > > @@ -305,6 +300,12 @@ static void raid_end_bio_io(struct r1bio *r1_bio) > > > > call_bio_endio(r1_bio); > > } > > + /* > > + * Wake up any possible resync thread that waits for the device > > + * to go idle. All I/Os, even write-behind writes, are done. > > + */ > > + allow_barrier(conf, r1_bio->sector); > > + > > free_r1bio(r1_bio); > > } > > > > >