Re: [PATCH] md/raid1: release pending accounting for an I/O only after write-behind is also finished

Nigel Croxon <ncroxon@xxxxxxxxxx> · Thu, 23 Apr 2020 08:38:01 -0400

On 1/28/20 2:08 PM, Song Liu wrote:
On Mon, Jan 27, 2020 at 11:56 AM David Jeffery <djeffery@xxxxxxxxxx> wrote:
On Mon, Jan 27, 2020 at 12:29 PM Song Liu <song@xxxxxxxxxx> wrote:
On Mon, Jan 27, 2020 at 7:26 AM David Jeffery <djeffery@xxxxxxxxxx> wrote:
When using RAID1 and write-behind, md can deadlock when errors occur. With
write-behind, r1bio structs can be accounted by raid1 as queued but not
counted as pending. The pending count is dropped when the original bio is
returned complete but write-behind for the r1bio may still be active.

This breaks the accounting used in some conditions to know when the raid1
md device has reached an idle state. It can result in calls to
freeze_array deadlocking. freeze_array will never complete from a negative
"unqueued" value being calculated due to a queued count larger than the
pending count.

To properly account for write-behind, move the call to allow_barrier from
call_bio_endio to raid_end_bio_io. When using write-behind, md can call
call_bio_endio before all write-behind I/O is complete. Using
raid_end_bio_io for the point to call allow_barrier will release the
pending count at a point where all I/O for an r1bio, even write-behind, is
done.

Signed-off-by: David Jeffery <djeffery@xxxxxxxxxx>
---

  raid1.c |   13 +++++++------
  1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 201fd8aec59a..0196a9d9f7e9 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -279,22 +279,17 @@ static void reschedule_retry(struct r1bio *r1_bio)
  static void call_bio_endio(struct r1bio *r1_bio)
  {
         struct bio *bio = r1_bio->master_bio;
-       struct r1conf *conf = r1_bio->mddev->private;

         if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
                 bio->bi_status = BLK_STS_IOERR;

         bio_endio(bio);
-       /*
-        * Wake up any possible resync thread that waits for the device
-        * to go idle.
-        */
-       allow_barrier(conf, r1_bio->sector);
raid1_end_write_request() also calls call_bio_endio(). Do we need to fix
that?
This basically is the problem the patch is fixing.  We don't want
allow_barrier() being called when call_bio_endio() is called directly
from raid1_end_write_request().  Write-behind can still be active here
so it was dropping the pending accounting too early.  We only want it
called when all I/O for the r1bio is complete, which shifting the
allow_barrier() call to raid_end_bio_io() does.
Thanks for the explanation. This looks good to me. I will process it
after the merge window.

I will also re-evaluate whether we need it for stable.

Thanks again,
Song

Hello Song,

Did you pull in this patch after the merge window?

I don't see it in your tree.

Thanks, Nigel