On Wed, Jan 22, 2025 at 5:24 PM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
>
>
>
> On 1/23/25 7:23 AM, Joanne Koong wrote:
> > On Fri, Dec 13, 2024 at 2:23 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >>
> >> This patchset adds support for folios larger than one page size in FUSE.
> >>
> >> This patchset is rebased on top of the (unmerged) patchset that removes temp
> >> folios in writeback [1]. This patchset was tested by running it through fstests
> >> on passthrough_hp.
> >>
> >> Please note that writes are still effectively one page size. Larger writes can
> >> be enabled by setting the order on the fgp flag passed in to __filemap_get_folio()
> >> but benchmarks show this significantly degrades performance. More investigation
> >> needs to be done into this. As such, buffered writes will be optimized in a
> >> future patchset.
> >>
> >> Benchmarks show roughly a ~45% improvement in read throughput.
> >>
> >> Benchmark setup:
> >>
> >> -- Set up server --
> >> ./libfuse/build/example/passthrough_hp --bypass-rw=1 ~/libfuse
> >> ~/mounts/fuse/ --nopassthrough
> >> (using libfuse patched with https://github.com/libfuse/libfuse/pull/807)
> >>
> >> -- Run fio --
> >> fio --name=read --ioengine=sync --rw=read --bs=1M --size=1G
> >> --numjobs=2 --ramp_time=30 --group_reporting=1
> >> --directory=mounts/fuse/
> >>
> >> Machine 1:
> >> No large folios: ~4400 MiB/s
> >> Large folios: ~7100 MiB/s
> >>
> >> Machine 2:
> >> No large folios: ~3700 MiB/s
> >> Large folios: ~6400 MiB/s
> >>
> >>
> >> [1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@xxxxxxxxx/
> >>
> >
> > A couple of updates on this:
> > * I'm going to remove the writeback patch (patch 11/12) in this series
> > and resubmit, and leave large folios writeback to be done as a
> > separate future patchset. Getting writeback to work with large folios
> > has a dependency on [1], which unfortunately does not look like it'll
> > be resolved anytime soon. If we cannot remove tmp pages, then we'll
> > likely need to use a different data structure than the rb tree to
> > account for large folios w/ tmp pages. I believe we can still enable
> > large folios overall even without large folios writeback, as even with
> > the inode->i_mapping set to a large folio order range, writeback will
> > still only operate on 4k folios until fgf_set_order() is explicitly
> > set in fuse_write_begin() for the __filemap_get_folio() call.
> >
> > * There's a discussion here [2] about perf degradation for writeback
> > writes on large folios due to writeback throttling when balancing
> > dirty pages. This is due to fuse enabling bdi strictlimit. More
> > experimentation will be needed to figure out what a good folio order
> > is, and whether it's possible to do something like remove the
> > strictlimit for privileged servers.
>
> FYI the sysadmin can already disable strictlimit for FUSE through
> /sys/class/bdi/<bdi>/strict_limit knob[*].
>
> [*] https://lore.kernel.org/all/20221119005215.3052436-1-shr@xxxxxxxxxxxx/

Oh cool, thanks for pointing this out! AFAICT, this means the sysadmin
would have to do this individually for every fuse server that gets
run. I wonder if we should do something like
a) have fuse only enforce the strictlimit for unprivileged servers or
b) add a fuse sysctl that sysadmins can set more easily for removing
strictlimit for any server that gets run instead of having to do it
individually

Thanks,
Joanne

>
> --
> Thanks,
> Jingbo
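To make the per-server step concrete, here is a minimal userspace C sketch (not from the thread) of what a sysadmin's helper might do for one mount using the /sys/class/bdi/<bdi>/strict_limit knob. It assumes the FUSE bdi is named `<major>:<minor>` after the mount's device number, which is how regular (non-fuseblk) FUSE mounts are registered; fuseblk mounts get a suffix and are not handled. The helper names `fuse_strict_limit_path()` and `set_strict_limit()` are hypothetical, and writing the real sysfs file requires root.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>

/*
 * Hypothetical helper: build "/sys/class/bdi/<major>:<minor>/strict_limit"
 * for the filesystem containing `mountpoint`, assuming the bdi name is the
 * mount's st_dev (true for plain FUSE mounts, not fuseblk).
 * Returns 0 on success, -1 if stat() fails or `buf` is too small.
 */
static int fuse_strict_limit_path(const char *mountpoint, char *buf, size_t len)
{
    struct stat st;

    if (stat(mountpoint, &st) != 0)
        return -1;

    int n = snprintf(buf, len, "/sys/class/bdi/%u:%u/strict_limit",
                     major(st.st_dev), minor(st.st_dev));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Hypothetical helper: write "1" (enable) or "0" (disable) to a
 * strict_limit knob. Any nonzero `enable` is normalized to 1, since the
 * knob only accepts 0 or 1. Returns 0 on success, -1 on failure.
 */
static int set_strict_limit(const char *knob, int enable)
{
    FILE *f = fopen(knob, "w");

    if (!f)
        return -1;

    int ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;

    /* sysfs store errors may only surface when the buffer is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would repeat this for each mount, e.g. `fuse_strict_limit_path("/home/user/mounts/fuse", p, sizeof p)` followed by `set_strict_limit(p, 0)`, which is exactly the per-server bookkeeping that options a) and b) would avoid.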