Hi Neil,

I see now that the deadlock problem has been addressed by your commits:

79bd99596b7305ab08109a8bf44a6a4511dbf1cd
    blk: improve order of bio handling in generic_make_request()
f5fe1b51905df7cfe4fdfd85c5fb7bc5b71a094f
    blk: Ensure users for current->bio_list can see the full list.

Thanks for fixing this! (A small userspace sketch of the reordering idea
behind these commits is at the end of this mail, below the quoted thread.)

Alex.

On Wed, Jul 20, 2016 at 1:19 AM NeilBrown <neilb@xxxxxxxx> wrote:
>
> On Tue, Jul 19 2016, Alexander Lyakas wrote:
>
> > Hello Neil,
> > Thank you for your response.
> >
> > On Fri, Jul 15, 2016 at 2:18 AM, NeilBrown <neilb@xxxxxxxx> wrote:
> >> On Thu, Jul 14 2016, Alexander Lyakas wrote:
> >>
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index 32e282f4c83c..c528102b80b6 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -1288,7 +1288,7 @@ read_again:
> >>>>                                 sectors_handled;
> >>>>                         goto read_again;
> >>>>                 } else
> >>>> -                       generic_make_request(read_bio);
> >>>> +                       bio_list_add_head(&current->bio_list, read_bio);
> >>>>                 return;
> >>>>         }
> >>>
> >>> Unfortunately, this patch doesn't work. It is super elegant, and seems
> >>> like it really should work. But the problem is that the "rdev", to
> >>> which we want to send the READ bio, might also be a remapping device
> >>> (dm-linear, for example). This device will create its own remapped bio
> >>> and will call generic_make_request(), which will stick the bio onto
> >>> current->bio_list TAIL :(:(:( So we are back at square one. This patch
> >>> would work if *all* the remapping drivers in the stack were doing
> >>> bio_list_add_head() instead of generic_make_request() :(:(:(
> >>>
> >>> It seems the real fix should be that generic_make_request() would use
> >>> bio_list_add_head(), but as pointed out in
> >>> http://www.spinics.net/lists/raid/msg52756.html, there are some
> >>> concerns about changing the order of remapped bios.
> >>>
> >>
> >> While those concerns are valid, they are about hypothetical performance
> >> issues rather than observed deadlock issues. So I wouldn't be too
> >> worried about them.
> > I am thinking of a hypothetical driver that splits, say, a 12Kb WRITE
> > into 3x4Kb WRITEs, and submits them in the proper order, hoping they
> > will get to the disk in the same order, and the disk will work
> > sequentially. But now we are deliberately hindering this. But I see
> > that people much smarter than me are in this discussion, so I will
> > leave it to them:)
>
> Sure, that is the concern and ideally we would keep things in order.
> But the elevator should re-order things in most cases so it shouldn't
> matter too much.
> For upstream, we obviously aim for the best possible. For stable backports,
> we sometimes accept non-ideal code when the change is less intrusive.
>
> If we do backport something to -stable, I would do it "properly" using
> something very similar to the upstream version. You don't seem to want
> that for your code, so I'm suggesting options that, while not 100% ideal,
> should suit your needs - and obviously you will test your use cases.
>
> >
> >> However I think you said that you didn't want to touch core code at all
> >> (maybe I misunderstood) so that wouldn't help you anyway.
> > Yes, this is correct. Recompiling the kernel is a bit of a pain for
> > us.
> > We were smart enough to configure the md_mod as a loadable module,
> > so at least now I can patch MD code easily:)
> >
> >>
> >> One option would be to punt the requests to raidXd:
> >>
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index 40b35be34f8d..f795e27b2124 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -1229,7 +1229,7 @@ read_again:
> >>                                 sectors_handled;
> >>                         goto read_again;
> >>                 } else
> >> -                       generic_make_request(read_bio);
> >> +                       reschedule_retry(r1_bio);
> >>                 return;
> >>         }
> >>
> >> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >> index 32e282f4c83c..eec38443075b 100644
> >> --- a/drivers/md/raid10.c
> >> +++ b/drivers/md/raid10.c
> >> @@ -1288,7 +1288,7 @@ read_again:
> >>                                 sectors_handled;
> >>                         goto read_again;
> >>                 } else
> >> -                       generic_make_request(read_bio);
> >> +                       reschedule_retry(r10_bio);
> >>                 return;
> >>         }
> > This is more or less what my rudimentary patch is doing, except it is
> > doing it only when we really need to wait for the barrier.
> >
> >>
> >>
> >> That might hurt performance, you would need to measure.
> >> The other approach would be to revert the patch that caused the problem.
> >> e.g.
> >> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >> index 40b35be34f8d..062bb86e5fd8 100644
> >> --- a/drivers/md/raid1.c
> >> +++ b/drivers/md/raid1.c
> >> @@ -884,7 +884,8 @@ static bool need_to_wait_for_sync(struct r1conf *conf, struct bio *bio)
> >>                         wait = false;
> >>                 else
> >>                         wait = true;
> >> -       }
> >> +       } else if (conf->barrier)
> >> +               wait = true;
> >>
> >>         return wait;
> >>  }
> >>
> >>
> > I am not sure how this patch helps. You added another condition, and
> > now READs will also wait in some cases. But still, if array_frozen is
> > set, everybody will wait unconditionally, which is the root cause of
> > the deadlock, I think.
>
> Maybe you're right. I was thinking that array_frozen only causes
> problems if there are read requests in the generic_make_request queue,
> and this change would keep them out. But I might have dropped a ball
> somewhere.
>
> >
> > I see that there will be no magic solution for this problem:(
>
> Not really.
>
>
> NeilBrown
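
P.S. For anyone else who hits this on an older kernel and wants to see what
the reordering actually buys us, below is a small userspace toy model I put
together. It is NOT the kernel code; every name in it (bl_add, bl_pop,
make_request, CHUNK, BOTTOM, the three-level stack) is made up for the
illustration. It just mimics, as far as I understand it, the dispatch loop
of generic_make_request() in block/blk-core.c before and after the first of
the two commits above.

/*
 * Toy userspace model (NOT the kernel code) of the generic_make_request()
 * dispatch loop, illustrating the reordering idea of the commits above.
 *
 * Level 0 stands for an md-like device, level 1 for a dm-linear-like
 * device, level 2 for the disk. A non-bottom level splits anything bigger
 * than CHUNK sectors into a remapped first chunk (sent one level down) and
 * a same-level remainder, roughly like the raid1/raid10 "read_again" loop.
 */
#include <stdio.h>
#include <stdlib.h>

#define BOTTOM 2
#define CHUNK  4

struct bio { int level, offset, sectors; struct bio *next; };
struct bio_list { struct bio *head, *tail; };

static void bl_add(struct bio_list *bl, struct bio *b)   /* append at tail */
{
    b->next = NULL;
    if (bl->tail) bl->tail->next = b; else bl->head = b;
    bl->tail = b;
}

static struct bio *bl_pop(struct bio_list *bl)           /* take from head */
{
    struct bio *b = bl->head;
    if (b && !(bl->head = b->next)) bl->tail = NULL;
    return b;
}

static void bl_merge(struct bio_list *dst, struct bio_list *src)
{
    struct bio *b;
    while ((b = bl_pop(src))) bl_add(dst, b);
}

static struct bio *mkbio(int level, int offset, int sectors)
{
    struct bio *b = calloc(1, sizeof(*b));
    b->level = level; b->offset = offset; b->sectors = sectors;
    return b;
}

/* One remapping level: split big bios, queue the pieces on the shared list. */
static void make_request(struct bio_list *pending, struct bio *b)
{
    if (b->level == BOTTOM) {
        printf("    disk completes offset=%-2d len=%d\n", b->offset, b->sectors);
    } else if (b->sectors > CHUNK) {    /* split: first chunk goes one level down */
        printf("    level %d splits offset=%-2d len=%d\n", b->level, b->offset, b->sectors);
        bl_add(pending, mkbio(b->level + 1, b->offset, CHUNK));
        bl_add(pending, mkbio(b->level, b->offset + CHUNK, b->sectors - CHUNK));
    } else {                            /* small enough: remap the whole bio */
        printf("    level %d remaps offset=%-2d len=%d\n", b->level, b->offset, b->sectors);
        bl_add(pending, mkbio(b->level + 1, b->offset, b->sectors));
    }
    free(b);
}

static void submit(struct bio *bio, int lower_first)
{
    struct bio_list onstack = { NULL, NULL };   /* plays current->bio_list */

    bl_add(&onstack, bio);
    while ((bio = bl_pop(&onstack))) {
        int level = bio->level;
        struct bio_list fresh = { NULL, NULL };

        make_request(&fresh, bio);
        if (!lower_first) {             /* old behaviour: plain tail append */
            bl_merge(&onstack, &fresh);
            continue;
        }
        /* fixed behaviour: newly generated lower-level bios first, then
         * same-level ones, then whatever was already pending */
        struct bio_list lower = { NULL, NULL }, same = { NULL, NULL };
        while ((bio = bl_pop(&fresh)))
            bl_add(bio->level > level ? &lower : &same, bio);
        bl_merge(&lower, &same);
        bl_merge(&lower, &onstack);
        onstack = lower;
    }
}

int main(void)
{
    printf("old order (everything appended at the tail):\n");
    submit(mkbio(0, 0, 12), 0);
    printf("fixed order (bios for lower devices dispatched first):\n");
    submit(mkbio(0, 0, 12), 1);
    return 0;
}

Saved as, say, toy_biolist.c and built with "gcc -Wall toy_biolist.c", the
first half of the output shows the old behaviour: level 0 gets to dispatch
its split remainder before the first chunk has reached the disk, which is
roughly the window in which raid1/raid10 could sit in wait_barrier()
forever. The second half shows the fixed ordering: each remapped chunk is
driven all the way down to the disk before the same-level remainder is
picked up. In the real patch the pending list is current->bio_list and the
sorting happens inside generic_make_request() itself.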