Hi Joanne,

I have not checked the whole series yet, but I just spent some time on
testing, attempting to gather some statistics on the performance
improvement. At least we need:

@@ -2212,7 +2213,7 @@ static int fuse_write_begin(struct file *file, struct address_space *mapping,

 	WARN_ON(!fc->writeback_cache);

-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN | fgf_set_order(len),

Otherwise large folios are not enabled on the buffered write path.

Besides, when the above diff is applied, large folios are indeed
enabled, but they suffer a severe performance regression:

fio 1-job buffered write: 2GB/s BW w/o large folios, 200MB/s BW w/ large folios

I have not figured out the cause yet.

On 11/26/24 6:05 AM, Joanne Koong wrote:
> This patchset adds support for folios larger than one page size in FUSE.
>
> This patchset is rebased on top of the (unmerged) patchset that removes temp
> folios in writeback [1]. (There is also a version of this patchset that is
> independent from that change, but that version has two additional patches
> needed to account for temp folios and temp folio copying, which may require
> some debate to get the API right for as these two patches add generic
> (non-FUSE) helpers. For simplicity's sake for now, I sent out this patchset
> version rebased on top of the patchset that removes temp pages)
>
> This patchset was tested by running it through fstests on passthrough_hp.
>
> Benchmarks show roughly a ~45% improvement in read throughput.
>
> Benchmark setup:
>
> -- Set up server --
> ./libfuse/build/example/passthrough_hp --bypass-rw=1 ~/libfuse
> ~/mounts/fuse/ --nopassthrough
> (using libfuse patched with https://github.com/libfuse/libfuse/pull/807)
>
> -- Run fio --
> fio --name=read --ioengine=sync --rw=read --bs=1M --size=1G
> --numjobs=2 --ramp_time=30 --group_reporting=1
> --directory=mounts/fuse/
>
> Machine 1:
> No large folios: ~4400 MiB/s
> Large folios: ~7100 MiB/s
>
> Machine 2:
> No large folios: ~3700 MiB/s
> Large folios: ~6400 MiB/s
>
> Writes are still effectively one page size. Benchmarks showed that trying to get
> the largest folios possible from __filemap_get_folio() is an over-optimization
> and ends up being significantly more expensive. Fine-tuning for the optimal
> order size for the __filemap_get_folio() calls can be done in a future patchset.
>
> [1] https://lore.kernel.org/linux-fsdevel/20241107235614.3637221-1-joannelkoong@xxxxxxxxx/
>
> Changelog:
> v1: https://lore.kernel.org/linux-fsdevel/20241109001258.2216604-1-joannelkoong@xxxxxxxxx/
> v1 -> v2:
> * Change naming from "non-writeback write" to "writethrough write"
> * Fix deadlock for writethrough writes by calling fault_in_iov_iter_readable() first
>   before __filemap_get_folio() (Josef)
> * For readahead, retain original folio_size() for descs.length (Josef)
> * Use folio_zero_range() api in fuse_copy_folio() (Josef)
> * Add Josef's reviewed-bys
>
> Joanne Koong (12):
>   fuse: support copying large folios
>   fuse: support large folios for retrieves
>   fuse: refactor fuse_fill_write_pages()
>   fuse: support large folios for writethrough writes
>   fuse: support large folios for folio reads
>   fuse: support large folios for symlinks
>   fuse: support large folios for stores
>   fuse: support large folios for queued writes
>   fuse: support large folios for readahead
>   fuse: support large folios for direct io
>   fuse: support large folios for writeback
>   fuse: enable large folios
>
>  fs/fuse/dev.c  | 128 ++++++++++++++++++++++++-------------------------
>  fs/fuse/dir.c  |   8 ++--
>  fs/fuse/file.c | 126 +++++++++++++++++++++++++++++++-----------------
>  3 files changed, 149 insertions(+), 113 deletions(-)

-- 
Thanks,
Jingbo