Re: [PATCH v3 00/12] fuse: support large folios

Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> · Thu, 23 Jan 2025 09:24:21 +0800

On 1/23/25 7:23 AM, Joanne Koong wrote:
> On Fri, Dec 13, 2024 at 2:23 PM Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>
>> This patchset adds support for folios larger than one page size in FUSE.
>>
>> This patchset is rebased on top of the (unmerged) patchset that removes temp
>> folios in writeback [1]. This patchset was tested by running it through fstests
>> on passthrough_hp.
>>
>> Please note that writes are still effectively one page size. Larger writes can
>> be enabled by setting the order on the fgp flag passed in to __filemap_get_folio()
>> but benchmarks show this significantly degrades performance. More investigation
>> needs to be done into this. As such, buffered writes will be optimized in a
>> future patchset.
>>
>> Benchmarks show roughly a ~45% improvement in read throughput.
>>
>> Benchmark setup:
>>
>> -- Set up server --
>>  ./libfuse/build/example/passthrough_hp --bypass-rw=1 ~/libfuse
>> ~/mounts/fuse/ --nopassthrough
>> (using libfuse patched with https://github.com/libfuse/libfuse/pull/807)
>>
>> -- Run fio --
>>  fio --name=read --ioengine=sync --rw=read --bs=1M --size=1G
>> --numjobs=2 --ramp_time=30 --group_reporting=1
>> --directory=mounts/fuse/
>>
>> Machine 1:
>>     No large folios:     ~4400 MiB/s
>>     Large folios:        ~7100 MiB/s
>>
>> Machine 2:
>>     No large folios:     ~3700 MiB/s
>>     Large folios:        ~6400 MiB/s
>>
>>
>> [1] https://lore.kernel.org/linux-fsdevel/20241122232359.429647-1-joannelkoong@xxxxxxxxx/
>>
> 
> A couple of updates on this:
> * I'm going to remove the writeback patch (patch 11/12) in this series
> and resubmit, and leave large folios writeback to be done as a
> separate future patchset. Getting writeback to work with large folios
> has a dependency on [1], which unfortunately does not look like it'll
> be resolved anytime soon. If we cannot remove tmp pages, then we'll
> likely need to use a different data structure than the rb tree to
> account for large folios w/ tmp pages. I believe we can still enable
> large folios overall even without large folios writeback, as even with
> the inode->i_mapping set to a large folio order range, writeback will
> still only operate on 4k folios until fgf_set_order() is explicitly
> set in fuse_write_begin() for the __filemap_get_folio() call.
> 
> * There's a discussion here [2] about perf degradation for writeback
> writes on large folios due to writeback throttling when balancing
> dirty pages. This is due to fuse enabling bdi strictlimit. More
> experimentation will be needed to figure out what a good folio order
> is, and whether it's possible to do something like remove the
> strictlimit for privileged servers.

FYI, the sysadmin can already disable strictlimit for a FUSE mount
through the /sys/class/bdi/<bdi>/strict_limit knob [*].

[*] https://lore.kernel.org/all/20221119005215.3052436-1-shr@xxxxxxxxxxxx/
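
A minimal sketch of how that could look from a shell. The helper name
set_strict_limit is hypothetical, and it assumes the knob accepts 0
(off) or 1 (on). The commented example further assumes that <bdi> for a
plain FUSE mount is the major:minor printed by `mountpoint -d`; I have
not checked that against every mount type, so verify the directory
under /sys/class/bdi/ first.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool.
# set_strict_limit BDI_DIR VALUE
#   Writes VALUE (0 = strictlimit off, 1 = on) to BDI_DIR/strict_limit.
#   Returns 2 on a bad value, 1 if the knob is missing or not writable.
set_strict_limit() {
    bdi_dir=$1
    value=$2

    case "$value" in
        0|1) ;;
        *) echo "value must be 0 or 1" >&2; return 2 ;;
    esac

    knob="$bdi_dir/strict_limit"
    if [ ! -w "$knob" ]; then
        echo "no writable strict_limit knob at $knob" >&2
        return 1
    fi

    printf '%s\n' "$value" > "$knob"
}

# Example, run as root. Assumes the BDI directory is named after the
# mount's major:minor (check /sys/class/bdi/ before relying on this):
#   set_strict_limit "/sys/class/bdi/$(mountpoint -d ~/mounts/fuse)" 0
```

The BDI directory is passed in as an argument rather than derived
inside the helper, so the lookup step can be swapped out if the naming
assumption does not hold on a given setup.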

-- 
Thanks,
Jingbo