On Fri, May 11, 2018 at 08:47:53AM +0800, Ming Lei wrote:
> Hi Shaohua,
>
> On Fri, May 11, 2018 at 6:23 AM, Shaohua Li <shli@xxxxxxxxxx> wrote:
> > From: Shaohua Li <shli@xxxxxx>
> >
> > The recent flush request handling seems unnecessarily complicated. The
> > main issue is that in rdev_end_flush we can get either the rdev of the
> > bio or the flush_info, not both, or we need extra memory for the other.
> > With the extra memory, we need to reallocate the memory on disk
> > hotadd/remove. Actually the original patch forgets one case of
> > add_new_disk for memory allocation, and we get a kernel crash.
> >
> > The idea is to always take a reference on every rdev in md_flush_request
> > and drop the references after the bio finishes. This way,
> > rdev_end_flush doesn't need to know the rdev, so we don't need to
> > allocate extra memory.
> >
> > Cc: Xiao Ni <xni@xxxxxxxxxx>
> > Signed-off-by: Shaohua Li <shli@xxxxxx>
> > ---
> >  drivers/md/md.c | 89 ++++++++++++++-------------------------------------
> >  drivers/md/md.h |  3 +-
> >  2 files changed, 23 insertions(+), 69 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 0bb1e2f..d9474f8 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -435,16 +435,12 @@ static void rdev_end_flush(struct bio *bi)
> >  	struct bio *fbio = fi->fbio;
> >  	struct md_rdev *rdev;
> >
> > -	rcu_read_lock();
> > -	rdev_for_each_rcu(rdev, mddev)
> > -		if (fi->bios[rdev->raid_disk] == bi) {
> > -			fi->bios[rdev->raid_disk] = NULL;
> > +	if (atomic_dec_and_test(&fi->flush_pending)) {
> > +		rcu_read_lock();
> > +		rdev_for_each_rcu(rdev, mddev)
> >  			rdev_dec_pending(rdev, mddev);
> > -			break;
> > -		}
> > -	rcu_read_unlock();
> > +		rcu_read_unlock();
> >
> > -	if (atomic_dec_and_test(&fi->flush_pending)) {
> >  		if (fbio->bi_iter.bi_size == 0) {
> >  			/* an empty barrier - all done */
> >  			bio_endio(fbio);
> > @@ -465,14 +461,12 @@ void md_flush_request(struct mddev *mddev, struct bio *fbio)
> >  {
> >  	struct md_rdev *rdev;
> >  	struct flush_info *fi;
> > -	char *p = (char*)mddev->flush_info;
> >  	int index;
> >
> >  	atomic_inc(&mddev->flush_io);
> >
> >  	index = jhash((void*)fbio, sizeof(fbio), 0) % NR_FLUSHS;
> > -	fi = (struct flush_info *)(p + index * (sizeof(struct flush_info)
> > -		+ mddev->raid_disks * sizeof(struct bio*)));
> > +	fi = &mddev->flush_info[index];
> >
> >  	spin_lock_irq(&fi->flush_lock);
> >  	wait_event_lock_irq(fi->flush_queue,
>
> This way uses jhash to pick a flush_info: if two bios map to the same
> jhash value, extra (often unnecessary) waiting is introduced for the
> latter bio, since another flush_info slot may be available.

That's possible, but it doesn't really matter. Our goal is not to
completely avoid lock contention; reducing the contention is good enough.

Thanks,
Shaohua