On Tue, 2025-01-14 at 11:12 -0800, Joanne Koong wrote:
> On Tue, Jan 14, 2025 at 10:58 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> >
> > On Tue, 14 Jan 2025 at 19:08, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
> >
> > > - my understanding is that the majority of use cases do use splice
> > > (e.g., IIRC libfuse does as well), in which case there's no point
> > > to this patchset then
> >
> > If it turns out that non-splice writes are more performant, then
> > libfuse can be fixed to use non-splice by default. It's not as clear
> > cut though, since write through (which is also the default in
> > libfuse, AFAIK) should not be affected by any of this, since that
> > never used tmp pages.
>
> My thinking was that spliced writes without tmp pages would be
> fastest, and that non-splice writes without tmp pages and spliced
> writes with tmp pages would be roughly the same. But I'd need to
> benchmark and verify this assumption.
>

A somewhat related question: is Bernd's io_uring patchset susceptible
to the same problem as splice() in this situation? IOW, does the
kernel inline pagecache pages into the io_uring buffers?

If it doesn't have the same issue, then maybe we should think about
using it to make a clean behavior break: gate large folios and the
removal of bounce pages behind io_uring. That would mean dealing with
multiple I/O paths, but that might still be simpler than trying to
deal with multiple folio sizes in the writeback rbtree tracking.

> > > - codewise, imo this gets messy (e.g., we would still need the rb
> > > tree and would now need to check writeback against both the folio
> > > writeback state and the rb tree)
> >
> > I'm thinking of something slightly different: remove the current tmp
> > page mess, but instead of duplicating a page ref on splice, fall
> > back to copying the cache page (see the user_pages case in
> > fuse_copy_page()). This should have very similar performance to what
> > we have today, but allows us to deal with page accesses the same way
> > for both regular and splice I/O on /dev/fuse.
>
> If we copy the cache page, do we not have the same issue with needing
> an rb tree to track writeback state, since writeback on the original
> folio would be immediately cleared?
>
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
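
To make the copy-instead-of-ref idea above concrete, here is a rough
sketch (the helper name is hypothetical, loosely modeled on the
user_pages handling in fuse_copy_page(); this is not actual
fs/fuse/dev.c code). On a spliced read of a pagecache page, a private
copy is allocated and that copy is what gets handed to the pipe, so
userspace never holds a reference to the live pagecache folio:

    #include <linux/err.h>
    #include <linux/gfp.h>
    #include <linux/highmem.h>

    /*
     * Hypothetical sketch, not the real fuse code: splice a private
     * copy of the pagecache page instead of taking an extra reference
     * on it. Since userspace never sees the live page, writeback on
     * the original folio can be ended as soon as the copy is made,
     * just as in the non-splice (user_pages) copy path.
     */
    static struct page *fuse_dev_copy_cache_page(struct page *cache_page)
    {
            struct page *copy = alloc_page(GFP_KERNEL);

            if (!copy)
                    return ERR_PTR(-ENOMEM);

            /* full-page copy; copy_highpage() kmaps both pages */
            copy_highpage(copy, cache_page);
            return copy;
    }

The cost is one extra page copy per spliced page, comparable to what
the tmp-page scheme pays today; whether this also lets the rb tree go
away is exactly the open question raised above.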