On Mon, Jan 20, 2025 at 2:42 PM Jan Kara <jack@xxxxxxx> wrote:
>
> On Fri 17-01-25 14:45:01, Joanne Koong wrote:
> > On Fri, Jan 17, 2025 at 3:53 AM Jan Kara <jack@xxxxxxx> wrote:
> > > On Thu 16-01-25 15:38:54, Joanne Koong wrote:
> > > I think tweaking min_pause is a wrong way to do this. I think that
> > > is just a symptom. Can you run something like:
> > >
> > > while true; do
> > >         cat /sys/kernel/debug/bdi/<fuse-bdi>/stats
> > >         echo "---------"
> > >         sleep 1
> > > done >bdi-debug.txt
> > >
> > > while you are writing to the FUSE filesystem and share the output
> > > file? That should tell us a bit more about what's happening inside
> > > the writeback throttling. Also, do you somehow configure
> > > min/max_ratio for the FUSE bdi? You can check in
> > > /sys/block/<fuse-bdi>/bdi/{min,max}_ratio. I suspect the problem is
> > > that the BDI dirty limit does not ramp up properly when we increase
> > > dirtied pages in large chunks.
> >
> > This is the debug info I see for FUSE large folio writes where bs=1M
> > and size=1G:
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1071104 kB
> > BdiWritten: 4096 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1290240 kB
> > BdiWritten: 4992 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1517568 kB
> > BdiWritten: 5824 kB
> > BdiWriteBandwidth: 25692 kBps
> > b_dirty: 0
> > b_io: 1
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 7
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3596 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1747968 kB
> > BdiWritten: 6720 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 896 kB
> > DirtyThresh: 359824 kB
> > BackgroundThresh: 179692 kB
> > BdiDirtied: 1949696 kB
> > BdiWritten: 7552 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3612 kB
> > DirtyThresh: 361300 kB
> > BackgroundThresh: 180428 kB
> > BdiDirtied: 2097152 kB
> > BdiWritten: 8128 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> >
> > I didn't do anything to configure/change the FUSE bdi min/max_ratio.
> > This is what I see on my system:
> >
> > cat /sys/class/bdi/0:52/min_ratio
> > 0
> > cat /sys/class/bdi/0:52/max_ratio
> > 1
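(As an aside, the bdi's share of the global limit can be computed
straight from that same stats file. A minimal sketch, assuming the FUSE
bdi is 0:52 as in the sysfs paths above:

        # Print BdiDirtyThresh as a percentage of DirtyThresh, once a second.
        while true; do
                awk '/^BdiDirtyThresh/ { bdi = $2 }
                     /^DirtyThresh/    { total = $2 }
                     END { printf "bdi share: %.2f%% (%d / %d kB)\n",
                                  bdi * 100 / total, bdi, total }' \
                        /sys/kernel/debug/bdi/0:52/stats
                sleep 1
        done

For the dumps above this prints roughly 1.00% -- 3596 / 359824 kB --
apart from the transient samples where BdiDirtyThresh drops to 896 kB.)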
>
> OK, we can see that BdiDirtyThresh stabilized more or less at 3.6MB.
> Checking the code, this shows we are hitting the __wb_calc_thresh()
> logic:
>
>         if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
>                 unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
>                 u64 wb_scale_thresh = 0;
>
>                 if (limit > dtc->dirty)
>                         wb_scale_thresh = (limit - dtc->dirty) / 100;
>                 wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
>         }
>
> so BdiDirtyThresh is set to DirtyThresh/100. This also shows the bdi
> never generates enough throughput to ramp up its share from this
> initial value.
>
> > > Actually, there's a patch queued in the mm tree that improves the
> > > ramping up of the bdi dirty limit for strictlimit bdis [1]. It
> > > would be nice if you could test whether it changes something in the
> > > behavior you observe. Thanks!
> > >
> > >                                                         Honza
> > >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-page-writeback-consolidate-wb_thresh-bumping-logic-into-__wb_calc_thresh.patch
> >
> > I still see the same results (~230 MiB/s throughput using fio) with
> > this patch applied, unfortunately. Here's the debug info I see with
> > this patch (same test scenario as above, FUSE large folio writes
> > where bs=1M and size=1G):
> >
> > BdiWriteback: 0 kB
> > BdiReclaimable: 2048 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359132 kB
> > BackgroundThresh: 179348 kB
> > BdiDirtied: 51200 kB
> > BdiWritten: 128 kB
> > BdiWriteBandwidth: 102400 kBps
> > b_dirty: 1
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 5
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 331776 kB
> > BdiWritten: 1216 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 562176 kB
> > BdiWritten: 2176 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 0 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 792576 kB
> > BdiWritten: 3072 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
> > BdiWriteback: 64 kB
> > BdiReclaimable: 0 kB
> > BdiDirtyThresh: 3588 kB
> > DirtyThresh: 359144 kB
> > BackgroundThresh: 179352 kB
> > BdiDirtied: 1026048 kB
> > BdiWritten: 3904 kB
> > BdiWriteBandwidth: 0 kBps
> > b_dirty: 0
> > b_io: 0
> > b_more_io: 0
> > b_dirty_time: 0
> > bdi_list: 1
> > state: 1
> > ---------
>
> Yeah, here the situation is really the same. As an experiment, can you
> try setting min_ratio for the FUSE bdi to 1, 2, 3, ..., 10 (I don't
> expect you should need to go past 10) and figure out when there's
> enough slack space for the writeback bandwidth to ramp up to full
> speed? Thanks!
>
>                                                         Honza
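The sweep itself can be scripted along these lines (a sketch only: the
bdi id 0:52 matches my system above, but /mnt/fuse and the exact fio
job line are placeholders for my local setup):

        #!/bin/sh
        # Sweep the FUSE bdi's max_ratio and measure sequential write
        # bandwidth at each setting (swap in min_ratio to sweep that knob).
        for ratio in 1 2 3 4 5 6 7; do
                echo "$ratio" > /sys/class/bdi/0:52/max_ratio
                # Same workload as before: 1M sequential writes, 1G total.
                fio --name=seqwrite --directory=/mnt/fuse --rw=write \
                        --bs=1M --size=1G --end_fsync=1 | grep 'WRITE: bw'
                rm -f /mnt/fuse/seqwrite*
        done

(For scale: max_ratio is a percentage of the global DirtyThresh, so
with DirtyThresh at ~360 MB each step gives the bdi roughly another
3.6 MB of dirty headroom.)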
When locally testing this, I'm seeing that max_ratio affects the
bandwidth more than min_ratio does (e.g. the different min_ratios have
roughly the same bandwidth per max_ratio). I'm also seeing somewhat
high variance across runs, which makes it hard to gauge what's
accurate, but on average this is what I'm seeing:

max_ratio=1 --- bandwidth= ~230 MiB/s
max_ratio=2 --- bandwidth= ~420 MiB/s
max_ratio=3 --- bandwidth= ~550 MiB/s
max_ratio=4 --- bandwidth= ~653 MiB/s
max_ratio=5 --- bandwidth= ~700 MiB/s
max_ratio=6 --- bandwidth= ~810 MiB/s
max_ratio=7 --- bandwidth= ~1040 MiB/s (though frequently ~561 MiB/s on
subsequent runs)

Thanks,
Joanne

> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR