On Mon, May 29, 2023 at 11:18 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On 2023/05/29 11:10, Xiao Ni wrote:
> > On Mon, May 29, 2023 at 10:20 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/29 10:08, Xiao Ni wrote:
> >>> Hi Kuai
> >>>
> >>> There is a limitation of the memory in your test. But for most
> >>> situations, customers should not set this. Can this change introduce
> >>> a performance regression in other situations?
> >>
> >> Note that this limitation is just to trigger writeback as soon as
> >> possible in the test; real situations can certainly trigger dirty-page
> >> writeback asynchronously while continuing to produce new dirty pages.
> >
> > Hi
> >
> > I'm confused here. If we want to trigger writeback quickly, shouldn't
> > these two values be set to smaller numbers, rather than 0 and 60?
>
> 60 is not required, I'll remove this setting.
>
> 0 just means write back if there are any dirty pages.

Hi Kuai

Does 0 mean disabling writeback? I tried to find the doc that describes
what setting dirty_background_ratio to 0 means, but I didn't find it.
https://www.kernel.org/doc/html/next/admin-guide/sysctl/vm.html doesn't
describe this, but it does say:

    Note: dirty_background_bytes is the counterpart of
    dirty_background_ratio. Only one of them may be specified at a time.
    When one sysctl is written it is immediately taken into account to
    evaluate the dirty memory limits and the other appears as 0 when read.

Maybe you can set dirty_background_ratio to 1 if you want to trigger
writeback ASAP.

> >>
> >> If a lot of bio is not plugged, then it's the same as before; if a lot
> >> of bio is plugged, note that before this patchset those bios would
> >> spend quite a long time in the plug, so I think performance should be
> >> better.
> >
> > Hmm, doesn't that depend on whether the IO is sequential? If it's a big
> > IO request, can it miss the merge opportunity?
>
> The bio will still be merged into the underlying disks' request (if the
> disk is rq based), and the underlying disk won't flush its plug until
> the number of requests exceeds a threshold.

Thanks for this.

Regards
Xiao

> Thanks,
> Kuai
>
> > Regards
> > Xiao
> >
> >>
> >> Thanks,
> >> Kuai
> >>>
> >>> Best Regards
> >>> Xiao
> >>>
> >>> On Wed, Apr 26, 2023 at 4:24 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>>>
> >>>> bio can be added to the plug indefinitely, and the following
> >>>> writeback test can trigger a huge amount of plugged bio:
> >>>>
> >>>> Test script:
> >>>> modprobe brd rd_nr=4 rd_size=10485760
> >>>> mdadm -CR /dev/md0 -l10 -n4 /dev/ram[0123] --assume-clean
> >>>> echo 0 > /proc/sys/vm/dirty_background_ratio
> >>>> echo 60 > /proc/sys/vm/dirty_ratio
> >>>> fio -filename=/dev/md0 -ioengine=libaio -rw=write -bs=4k -numjobs=1 -iodepth=128 -name=test
> >>>>
> >>>> Test result:
> >>>> Monitoring /sys/block/md0/inflight shows that inflight keeps
> >>>> increasing until fio finishes writing; after running for about
> >>>> 2 minutes:
> >>>>
> >>>> [root@fedora ~]# cat /sys/block/md0/inflight
> >>>>        0  4474191
> >>>>
> >>>> Fix the problem by limiting the number of plugged bios based on the
> >>>> number of copies for the original bio.
> >>>>
> >>>> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>>> ---
> >>>>  drivers/md/raid1-10.c | 9 ++++++++-
> >>>>  drivers/md/raid1.c    | 2 +-
> >>>>  drivers/md/raid10.c   | 2 +-
> >>>>  3 files changed, 10 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/md/raid1-10.c b/drivers/md/raid1-10.c
> >>>> index 98d678b7df3f..35fb80aa37aa 100644
> >>>> --- a/drivers/md/raid1-10.c
> >>>> +++ b/drivers/md/raid1-10.c
> >>>> @@ -21,6 +21,7 @@
> >>>>  #define IO_MADE_GOOD ((struct bio *)2)
> >>>>
> >>>>  #define BIO_SPECIAL(bio) ((unsigned long)bio <= 2)
> >>>> +#define MAX_PLUG_BIO 32
> >>>>
> >>>>  /* for managing resync I/O pages */
> >>>>  struct resync_pages {
> >>>> @@ -31,6 +32,7 @@ struct resync_pages {
> >>>>  struct raid1_plug_cb {
> >>>>  	struct blk_plug_cb	cb;
> >>>>  	struct bio_list		pending;
> >>>> +	unsigned int		count;
> >>>>  };
> >>>>
> >>>>  static void rbio_pool_free(void *rbio, void *data)
> >>>> @@ -127,7 +129,7 @@ static inline void md_submit_write(struct bio *bio)
> >>>>  }
> >>>>
> >>>>  static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>> -				      blk_plug_cb_fn unplug)
> >>>> +				      blk_plug_cb_fn unplug, int copies)
> >>>>  {
> >>>>  	struct raid1_plug_cb *plug = NULL;
> >>>>  	struct blk_plug_cb *cb;
> >>>> @@ -147,6 +149,11 @@ static inline bool md_add_bio_to_plug(struct mddev *mddev, struct bio *bio,
> >>>>
> >>>>  	plug = container_of(cb, struct raid1_plug_cb, cb);
> >>>>  	bio_list_add(&plug->pending, bio);
> >>>> +	if (++plug->count / MAX_PLUG_BIO >= copies) {
> >>>> +		list_del(&cb->list);
> >>>> +		cb->callback(cb, false);
> >>>> +	}
> >>>> +
> >>>>
> >>>>  	return true;
> >>>>  }
> >>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> >>>> index 639e09cecf01..c6066408a913 100644
> >>>> --- a/drivers/md/raid1.c
> >>>> +++ b/drivers/md/raid1.c
> >>>> @@ -1562,7 +1562,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
> >>>>  					      r1_bio->sector);
> >>>>  		/* flush_pending_writes() needs access to the rdev so...*/
> >>>>  		mbio->bi_bdev = (void *)rdev;
> >>>> -		if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug)) {
> >>>> +		if (!md_add_bio_to_plug(mddev, mbio, raid1_unplug, disks)) {
> >>>>  			spin_lock_irqsave(&conf->device_lock, flags);
> >>>>  			bio_list_add(&conf->pending_bio_list, mbio);
> >>>>  			spin_unlock_irqrestore(&conf->device_lock, flags);
> >>>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> >>>> index bd9e655ca408..7135cfaf75db 100644
> >>>> --- a/drivers/md/raid10.c
> >>>> +++ b/drivers/md/raid10.c
> >>>> @@ -1306,7 +1306,7 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
> >>>>
> >>>>  	atomic_inc(&r10_bio->remaining);
> >>>>
> >>>> -	if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug)) {
> >>>> +	if (!md_add_bio_to_plug(mddev, mbio, raid10_unplug, conf->copies)) {
> >>>>  		spin_lock_irqsave(&conf->device_lock, flags);
> >>>>  		bio_list_add(&conf->pending_bio_list, mbio);
> >>>>  		spin_unlock_irqrestore(&conf->device_lock, flags);
> >>>> --
> >>>> 2.39.2