On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/29 11:10, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/29 10:08, Xiao Ni wrote:
> >>> Hi Kuai
> >>>
> >>> There is a limitation of the memory in your test. But for most
> >>> situations, customers should not set this. Can this change introduce
> >>> a performance regression in other situations?
> >>
> >> Note that this limitation is just to trigger writeback as soon as
> >> possible in the test; real situations can certainly trigger dirty-page
> >> writeback asynchronously while continuing to produce new dirty pages.
> >
> > Hi
> >
> > I'm confused here. If we want to trigger writeback quickly, shouldn't
> > these two values be set to smaller numbers, rather than 0 and 60?
>
> 60 is not required, I'll remove this setting.
>
> 0 just means write back if there are any dirty pages.

Hi Kuai

Does 0 mean disabling writeback? I tried to find the doc that describes
what setting dirty_background_ratio to 0 means, but I didn't find it.
https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html doesn't
describe this, but it does say:

    Note: dirty_background_bytes is the counterpart of
    dirty_background_ratio. Only one of them may be specified at a time.
    When one sysctl is written it is immediately taken into account to
    evaluate the dirty memory limits and the other appears as 0 when read.

Maybe you can set dirty_background_ratio to 1 if you want to trigger
writeback ASAP.

> >>
> >> If a lot of bio is not plugged, then it's the same as before; if a lot
> >> of bio is plugged, note that before this patchset those bios would
> >> spend quite a long time in the plug, so I think performance should be
> >> better.
> >
> > Hmm, doesn't that depend on whether the IO is sequential? If it's a big
> > IO request, can it miss the merge opportunity?
>
> The bio will still be merged into the underlying disks' request (if the
> disk is rq based), and the underlying disk won't flush its plug until
> the number of requests exceeds a threshold.

Thanks for this.

Regards
Xiao

> Thanks,
> Kuai
>
> > Regards
> > Xiao
> >
> >>
> >> Thanks,
> >> Kuai
> >>>
> >>> Best Regards
> >>> Xiao
> >>>
> >>> On Wed, Apr 26, 2023 at 4:24 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>>>
> >>>> bio can be added to the plug indefinitely, and the following
> >>>> writeback test can trigger a huge amount of plugged bio:
> >>>>
> >>>> Test script:
> >>>> modprobe brd rd_nr=4 rd_size=10485760
> >>>> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> >>>> echo 0 > /proc/sys/vm/dirty_background_ratio
> >>>> echo 60 > /proc/sys/vm/dirty_ratio
> >>>> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
> >>>>
> >>>> Test result:
> >>>> Monitoring /sys/block/md0/inflight shows that inflight keeps
> >>>> increasing until fio finishes writing; after running for about
> >>>> 2 minutes:
> >>>>
> >>>> [root@fedora ~]# cat /sys/block/md0/inflight
> >>>>        0  4474191
> >>>>
> >>>> Fix the problem by limiting the number of plugged bios based on the
> >>>> number of copies for the original bio.
> >>>>
> >>>> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>>> ---
> >>>>  drivers/md/raid1-10.c | 9 ++++++++-
> >>>>  drivers/md/raid1.c    | 2 +-
> >>>>  drivers/md/raid10.c   | 2 +-
> >>>>  3 files changed, 10 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >>>> index 98d678b7df3f..35fb80aa37aa 100644
> >>>> --- a/drivers/md/raid1-10.c
> >>>> +++ b/drivers/md/raid1-10.c
> >>>> @@ -21,6 +21,7 @@
> >>>>  #define IO_MADE_GOOD ((struct bio *)2)
> >>>>
> >>>>  #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
> >>>> +#define MAX_PLUG_BIO 32
> >>>>
> >>>>  /* for managing resync I/O pages */
> >>>>  struct resync_pages {
> >>>> @@ -31,6 +32,7 @@ struct resync_pages {
> >>>>  struct raid1_plug_cb {
> >>>>  	struct blk_plug_cb	cb;
> >>>>  	struct bio_list		pending;
> >>>> +	unsigned int		count;
> >>>>  };
> >>>>
> >>>>  static void rbio_pool_free(void *rbio, void *data)
> >>>> @@ -127,7 +129,7 @@ static inline void md_submit_write(struct bio *bio)
> >>>>  }
> >>>>
> >>>>  static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>> -				      blk_plug_cb_fn unplug)
> >>>> +				      blk_plug_cb_fn unplug, int copies)
> >>>>  {
> >>>>  	struct raid1_plug_cb *plug = NULL;
> >>>>  	struct blk_plug_cb *cb;
> >>>> @@ -147,6 +149,11 @@ static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>>
> >>>>  	plug = container_of(cb, struct raid1_plug_cb, cb);
> >>>>  	bio_list_add(&plug->pending, bio);
> >>>> +	if (++plug->count / MAX_PLUG_BIO >= copies) {
> >>>> +		list_del(&cb->list);
> >>>> +		cb->callback(cb, false);
> >>>> +	}
> >>>> +
> >>>>
> >>>>  	return true;
> >>>>  }
> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >>>> index 639e09cecf01..c6066408a913 100644
> >>>> --- a/drivers/md/raid1.c
> >>>> +++ b/drivers/md/raid1.c
> >>>> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> >>>>  					      r1_bio->sector);
> >>>>  		/* flush_pending_writes() needs access to the rdev so...*/
> >>>>  		mbio->bi_bdev = (void *)rdev;
> >>>> -		if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
> >>>> +		if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
> >>>>  			spin_lock_irqsave(&conf->device_lock, flags);
> >>>>  			bio_list_add(&conf->pending_bio_list, mbio);
> >>>>  			spin_unlock_irqrestore(&conf->device_lock, flags);
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index bd9e655ca408..7135cfaf75db 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
> >>>>
> >>>>  	atomic_inc(&r10_bio->remaining);
> >>>>
> >>>> -	if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
> >>>> +	if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
> >>>>  		spin_lock_irqsave(&conf->device_lock, flags);
> >>>>  		bio_list_add(&conf->pending_bio_list, mbio);
> >>>>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> >>>> --
> >>>> 2.39.2