On Tue, 2025-01-14 at 11:12 -0800, Joanne Koong wrote:
> On Tue, Jan 14, 2025 at 10:58 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Tue, 14 Jan 2025 at 19:08, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > > - my understanding is that the majority of use cases do use splice
> > > (e.g., IIRC libfuse does as well), in which case there's no point
> > > to this patchset then
> >
> > If it turns out that non-splice writes are more performant, then
> > libfuse can be fixed to use non-splice by default. It's not as clear
> > cut though, since write through (which is also the default in
> > libfuse, AFAIK) should not be affected by any of this, since that
> > never used tmp pages.
>
> My thinking was that spliced writes without tmp pages would be
> fastest, and that non-splice writes without tmp pages and spliced
> writes with tmp pages would be roughly the same. But I'd need to
> benchmark and verify this assumption.
>

A somewhat related question: is Bernd's io_uring patchset susceptible
to the same problem as splice() in this situation? IOW, does the
kernel inline pagecache pages into the io_uring buffers?

If it doesn't have the same issue, then maybe we should think about
using it to make a clean behavior break: gate large folios and the
removal of bounce pages behind io_uring. That would mean dealing with
multiple I/O paths, but that might still be simpler than trying to
deal with multiple folio sizes in the writeback rbtree tracking.

> > > - codewise, imo this gets messy (e.g., we would still need the rb
> > > tree and would now need to check writeback against both the folio
> > > writeback state and the rb tree)
> >
> > I'm thinking of something slightly different: remove the current tmp
> > page mess, but instead of duplicating a page ref on splice, fall
> > back to copying the cache page (see the user_pages case in
> > fuse_copy_page()). This should have very similar performance to what
> > we have today, but allows us to deal with page accesses the same way
> > for both regular and splice I/O on /dev/fuse.
>
> If we copy the cache page, do we not have the same issue with needing
> an rb tree to track writeback state, since writeback on the original
> folio would be immediately cleared?
>
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
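
To make the copy-instead-of-ref idea above concrete, here is a rough
sketch (the helper name is hypothetical, loosely modeled on the
user_pages handling in fuse_copy_page(); this is not actual
fs/fuse/dev.c code). On a spliced read of a pagecache page, a private
copy is allocated and that copy is what gets handed to the pipe, so
userspace never holds a reference to the live pagecache folio:

    #include <linux/err.h>
    #include <linux/gfp.h>
    #include <linux/highmem.h>

    /*
     * Hypothetical sketch, not the real fuse code: splice a private
     * copy of the pagecache page instead of taking an extra reference
     * on it. Since userspace never sees the live page, writeback on
     * the original folio can be ended as soon as the copy is made,
     * just as in the non-splice (user_pages) copy path.
     */
    static struct page *fuse_dev_copy_cache_page(struct page *cache_page)
    {
            struct page *copy = alloc_page(GFP_KERNEL);

            if (!copy)
                    return ERR_PTR(-ENOMEM);

            /* full-page copy; copy_highpage() kmaps both pages */
            copy_highpage(copy, cache_page);
            return copy;
    }

The cost is one extra page copy per spliced page, comparable to what
the tmp-page scheme pays today; whether this also lets the rb tree go
away is exactly the open question raised above.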