On 1/23/25 7:23 AM, Joanne Koong wrote:
> On Fri, Dec 13, 2024 at 2:23 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>
>> This patchset adds support for folios larger than one page size in FUSE.
>>
>> This patchset is rebased on top of the (unmerged) patchset that removes temp
>> folios in writeback [1]. This patchset was tested by running it through fstests
>> on passthrough_hp.
>>
>> Please note that writes are still effectively one page size. Larger writes can
>> be enabled by setting the order on the fgp flag passed in to __filemap_get_folio()
>> but benchmarks show this significantly degrades performance. More investigation
>> needs to be done into this. As such, buffered writes will be optimized in a
>> future patchset.
>>
>> Benchmarks show roughly a ~45% improvement in read throughput.
>>
>> Benchmark setup:
>>
>> -- Set up server --
>> ./libfuse/build/example/passthrough_hp --bypass-rw=1 ~/libfuse
>> ~/mounts/fuse/ --nopassthrough
>> (using libfuse patched with https://github.com/libfuse/libfuse/pull/807)
>>
>> -- Run fio --
>> fio --name=read --ioengine=sync --rw=read --bs=1M --size=1G
>> --numjobs=2 --ramp_time=30 --group_reporting=1
>> --directory=mounts/fuse/
>>
>> Machine 1:
>> No large folios: ~4400 MiB/s
>> Large folios: ~7100 MiB/s
>>
>> Machine 2:
>> No large folios: ~3700 MiB/s
>> Large folios: ~6400 MiB/s
>>
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@xxxxxxxxx/
>>
>
> A couple of updates on this:
> * I'm going to remove the writeback patch (patch 11/12) in this series
> and resubmit, and leave large folios writeback to be done as a
> separate future patchset. Getting writeback to work with large folios
> has a dependency on [1], which unfortunately does not look like it'll
> be resolved anytime soon. If we cannot remove tmp pages, then we'll
> likely need to use a different data structure than the rb tree to
> account for large folios w/ tmp pages.
> I believe we can still enable large folios overall even without large
> folios writeback, as even with the inode->i_mapping set to a large
> folio order range, writeback will still only operate on 4k folios
> until fgf_set_order() is explicitly set in fuse_write_begin() for the
> __filemap_get_folio() call.
>
> * There's a discussion here [2] about perf degradation for writeback
> writes on large folios due to writeback throttling when balancing
> dirty pages. This is due to fuse enabling bdi strictlimit. More
> experimentation will be needed to figure out what a good folio order
> is, and whether it's possible to do something like remove the
> strictlimit for privileged servers.

FYI, the sysadmin can already disable strictlimit for FUSE through the
/sys/class/bdi/<bdi>/strict_limit knob [*].

[*] https://lore.kernel.org/all/20221119005215.3052436-1-shr@xxxxxxxxxxxx/

--
Thanks,
Jingbo
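For illustration, a minimal C sketch of how an admin tool might flip the strict_limit knob mentioned above for the bdi backing a FUSE mount. It assumes (not stated in the thread) that the FUSE bdi is named after the mount's device number as "<major>:<minor>" (fuseblk mounts may use a different name and are not handled), that the running kernel exposes the strict_limit knob from the series linked above, and that the caller is root. The helper names strict_limit_path(), write_knob() and disable_fuse_strictlimit() are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Build "/sys/class/bdi/<major>:<minor>/strict_limit" for a device number.
 * ASSUMPTION: the FUSE bdi is named "<major>:<minor>"; fuseblk mounts may
 * carry an extra suffix, which this sketch does not handle. */
int strict_limit_path(dev_t dev, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/class/bdi/%u:%u/strict_limit",
			 major(dev), minor(dev));
	/* Fail on encoding error or truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a value ("0" to disable, "1" to enable) to a sysfs knob.
 * sysfs reports errors on write, so check both fputs() and fclose(),
 * since stdio buffering defers the actual write until the flush. */
int write_knob(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	int ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Disable strictlimit for the bdi backing the FUSE mount at `mnt`
 * (hypothetical helper; requires root and a kernel with the knob). */
int disable_fuse_strictlimit(const char *mnt)
{
	struct stat st;
	char path[128];

	if (stat(mnt, &st) != 0)
		return -1;
	if (strict_limit_path(st.st_dev, path, sizeof(path)) != 0)
		return -1;
	return write_knob(path, "0\n");
}
```

With the benchmark setup quoted above, calling disable_fuse_strictlimit() on the expanded ~/mounts/fuse/ path would target that mount's bdi; writing "1\n" through write_knob() would turn strictlimit back on.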