Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Improving large folio writeback performance

On Mon, Jan 20, 2025 at 2:42 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> > On Fri, Jan 17, 2025 at 3:53 AM Jan Kara <jack@xxxxxxx> wrote:
> > > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > > I think tweaking min_pause is the wrong way to do this. I think that is just a
> > > symptom. Can you run something like:
> > >
> > > while true; do
> > >         cat /sys/kernel/debug/bdi/<fuse-bdi>/stats
> > >         echo "---------"
> > >         sleep 1
> > > done >bdi-debug.txt
> > >
> > > while you are writing to the FUSE filesystem and share the output file?
> > > That should tell us a bit more about what's happening inside the writeback
> > > throttling. Also, do you configure min/max_ratio for the FUSE bdi somehow?
> > > You can check /sys/block/<fuse-bdi>/bdi/{min,max}_ratio. I suspect the
> > > problem is that the BDI dirty limit does not ramp up properly when we
> > > increase dirtied pages in large chunks.
> >
> > This is the debug info I see for FUSE large folio writes where bs=1M
> > and size=1G:
> >
> >
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:            896 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1071104 kB
> > BdiWritten:               4096 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1290240 kB
> > BdiWritten:               4992 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1517568 kB
> > BdiWritten:               5824 kB
> > BdiWriteBandwidth:       25692 kBps
> > b_dirty:                     0
> > b_io:                        1
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       7
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3596 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1747968 kB
> > BdiWritten:               6720 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:            896 kB
> > DirtyThresh:            359824 kB
> > BackgroundThresh:       179692 kB
> > BdiDirtied:            1949696 kB
> > BdiWritten:               7552 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3612 kB
> > DirtyThresh:            361300 kB
> > BackgroundThresh:       180428 kB
> > BdiDirtied:            2097152 kB
> > BdiWritten:               8128 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> >
> >
> > I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> > This is what I see on my system:
> >
> > cat /sys/class/bdi/0:52/min_ratio
> > 0
> > cat /sys/class/bdi/0:52/max_ratio
> > 1
>
> OK, we can see that BdiDirtyThresh stabilized more or less at 3.6MB.
> Checking the code, this shows we are hitting __wb_calc_thresh() logic:
>
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
>                 u64 wb_scale_thresh = 0;
>
>                 if (limit > dtc->dirty)
>                         wb_scale_thresh = (limit - dtc->dirty) / 100;
>                 wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh /
>         }
>
> so BdiDirtyThresh is set to DirtyThresh/100. This also shows the bdi never
> generates enough throughput to ramp up its share from this initial value.
>
> > > Actually, there's a patch queued in the mm tree that improves the ramping up
> > > of the bdi dirty limit for strictlimit bdis [1]. It would be nice if you could
> > > test whether it changes anything in the behavior you observe. Thanks!
> > >
> > >                                                                 Honza
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
> >
> > I still see the same results (~230 MiB/s throughput using fio) with
> > this patch applied, unfortunately. Here's the debug info I see with
> > this patch (same test scenario as above on FUSE large folio writes
> > where bs=1M and size=1G):
> >
> > BdiWriteback:                0 kB
> > BdiReclaimable:           2048 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359132 kB
> > BackgroundThresh:       179348 kB
> > BdiDirtied:              51200 kB
> > BdiWritten:                128 kB
> > BdiWriteBandwidth:      102400 kBps
> > b_dirty:                     1
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       5
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             331776 kB
> > BdiWritten:               1216 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             562176 kB
> > BdiWritten:               2176 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:                0 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:             792576 kB
> > BdiWritten:               3072 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
> > BdiWriteback:               64 kB
> > BdiReclaimable:              0 kB
> > BdiDirtyThresh:           3588 kB
> > DirtyThresh:            359144 kB
> > BackgroundThresh:       179352 kB
> > BdiDirtied:            1026048 kB
> > BdiWritten:               3904 kB
> > BdiWriteBandwidth:           0 kBps
> > b_dirty:                     0
> > b_io:                        0
> > b_more_io:                   0
> > b_dirty_time:                0
> > bdi_list:                    1
> > state:                       1
> > ---------
>
> Yeah, here the situation is really the same. As an experiment, can you
> try setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I don't
> expect you'll need to go past 10) and figure out when there's enough
> slack space for the writeback bandwidth to ramp up to full speed?
> Thanks!
>
>                                                                 Honza

When testing this locally, I'm seeing that max_ratio affects the
bandwidth much more than min_ratio does (e.g. the different min_ratio
settings give roughly the same bandwidth for a given max_ratio). I'm
also seeing fairly high variance across runs, which makes it hard to
pin down accurate numbers, but on average this is what I'm seeing (a
rough sketch of the sweep commands is below the numbers):

max_ratio=1 --- bandwidth= ~230 MiB/s
max_ratio=2 --- bandwidth= ~420 MiB/s
max_ratio=3 --- bandwidth= ~550 MiB/s
max_ratio=4 --- bandwidth= ~653 MiB/s
max_ratio=5 --- bandwidth= ~700 MiB/s
max_ratio=6 --- bandwidth= ~810 MiB/s
max_ratio=7 --- bandwidth= ~1040 MiB/s (though many subsequent runs
dropped to ~561 MiB/s)
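
For reference, the sweep was roughly the following (bdi 0:52 is the
FUSE bdi on my setup; the mount point and exact fio invocation below
are illustrative rather than a verbatim copy of what I ran):

for ratio in 1 2 3 4 5 6 7; do
        echo $ratio > /sys/class/bdi/0:52/max_ratio
        fio --name=fuse-write --directory=/mnt/fuse --rw=write \
            --bs=1M --size=1G
done

min_ratio was varied the same way via /sys/class/bdi/0:52/min_ratio.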


Thanks,
Joanne

> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR




