On Fri, Dec 06, 2024 at 09:41:25AM -0800, Joanne Koong wrote:
> On Fri, Dec 6, 2024 at 1:50 AM Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> > -       folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
> > +       folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN |
> > fgf_set_order(len),
> >
> > Otherwise the large folio is not enabled on the buffer write path.
> >
> >
> > Besides, when applying the above diff, the large folio is indeed enabled
> > but it suffers severe performance regression:
> >
> > fio 1 job buffer write:
> > 2GB/s BW w/o large folio, and 200MB/s BW w/ large folio
>
> This is the behavior I noticed as well when running some benchmarks on
> v1 [1]. I think it's because when we call into __filemap_get_folio(),
> we hit the FGP_CREAT path and if the order we set is too high, the
> internal call to filemap_alloc_folio() will repeatedly fail until it
> finds an order it's able to allocate (eg the do { ... } while (order--
> > min_order) loop).

But this is very different from what other filesystems have measured
when allocating large folios during writes.  eg:

https://lore.kernel.org/linux-fsdevel/20240527163616.1135968-1-hch@xxxxxx/

So we need to understand what's different about fuse.  My suspicion is
that it's disabling some other optimisation that is only done on order 0
folios, but that's just wild speculation.  Needs someone to dig into it
and look at profiles to see what's really going on.
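
For anyone following along, the fallback behaviour described in the quoted
text can be modelled in userspace roughly as below.  This is only a sketch
of the quoted do { ... } while (order-- > min_order) loop, not the kernel
code: struct fake_allocator, fake_alloc() and alloc_with_fallback() are
made-up names for illustration, and the assumption that an allocation
succeeds whenever the order is at or below some threshold is a
simplification.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the page allocator (not a kernel type). */
struct fake_allocator {
	unsigned int max_ok_order;	/* largest order that can be satisfied */
	unsigned int attempts;		/* number of allocation attempts made */
};

/* Assumed behaviour: succeed only for orders at or below max_ok_order. */
static bool fake_alloc(struct fake_allocator *a, unsigned int order)
{
	a->attempts++;
	return order <= a->max_ok_order;
}

/*
 * Try the requested order first, then step down one order at a time
 * until an allocation succeeds or min_order has been tried.
 * Returns the order obtained, or -1 if every attempt failed.
 */
static int alloc_with_fallback(struct fake_allocator *a,
			       unsigned int order, unsigned int min_order)
{
	if (order < min_order)
		order = min_order;

	do {
		if (fake_alloc(a, order))
			return (int)order;
	} while (order-- > min_order);

	return -1;
}
```

With a request of order 9, min_order 0 and only order 2 available, this
model makes 8 attempts before it succeeds.  Whether those failed attempts
are what actually costs fuse its bandwidth is the part that profiles would
have to show.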