On 1/14/25 21:29, Jeff Layton wrote:
> On Tue, 2025-01-14 at 11:12 -0800, Joanne Koong wrote:
>> On Tue, Jan 14, 2025 at 10:58 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>
>>> On Tue, 14 Jan 2025 at 19:08, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>>
>>>> - my understanding is that the majority of use cases do use splice (eg
>>>> iirc, libfuse does as well), in which case there's no point to this
>>>> patchset then
>>>
>>> If it turns out that non-splice writes are more performant, then
>>> libfuse can be fixed to use non-splice by default. It's not as clear
>>> cut though, since write through (which is also the default in libfuse,
>>> AFAIK) should not be affected by all this, since that never used tmp
>>> pages.
>>
>> My thinking was that spliced writes without tmp pages would be
>> fastest, then non-splice writes w/out tmp pages and spliced writes w/
>> would be roughly the same. But i'd need to benchmark and verify this
>> assumption.
>>
>
> A somewhat related question: is Bernd's io_uring patchset susceptible
> to the same problem as splice() in this situation? IOW, does the kernel
> inline pagecache pages into the io_uring buffers?

Right now it does a full copy, similar to non-splice /dev/fuse
read/write. I.e. it doesn't have zero copy yet either.

>
> If it doesn't have the same issue, then maybe we should think about
> using that to make a clean behavior break. Gate large folios and not
> using bounce pages behind io_uring.
>
> That would mean dealing with multiple IO paths, but that might still be
> simpler than trying to deal with multiple folio sizes in the writeback
> rbtree tracking.

My personal thinking regarding ZC was to hook into Ming's work. I didn't
look into the deep details, but from an interface point of view it
sounded nice, like:

- Application write
- fuse-client/kernel request/CQEs with write attempts
- fuse server prepares group SQE, group leader prepares the write
  buffer, other group members are consumers using their buffer part
  for the final destination
- release of leader buffer when other group members are done

Though, Pavel and Jens have concerns and a different suggestion, and at
least the example Pavel gave looks like splice:

https://lore.kernel.org/all/f3a83b6a-c4b9-4933-998d-ebd1d09e3405@xxxxxxxxx/

I think David is looking into a different ZC solution, but I don't have
details on that. Maybe fuse-io-uring and ublk splice approach should be
another LSFMM topic.


Thanks,
Bernd