Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings

Bernd Schubert <bernd.schubert@xxxxxxxxxxx> · Tue, 14 Jan 2025 22:40:08 +0100

On 1/14/25 21:29, Jeff Layton wrote:
> On Tue, 2025-01-14 at 11:12 -0800, Joanne Koong wrote:
>> On Tue, Jan 14, 2025 at 10:58 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>>>
>>> On Tue, 14 Jan 2025 at 19:08, Joanne Koong <joannelkoong@xxxxxxxxx> wrote:
>>>
>>>> - my understanding is that the majority of use cases do use splice (eg
>>>> iirc, libfuse does as well), in which case there's no point to this
>>>> patchset then
>>>
>>> If it turns out that non-splice writes are more performant, then
>>> libfuse can be fixed to use non-splice by default.   It's not as clear
>>> cut though, since write through (which is also the default in libfuse,
>>> AFAIK) should not be affected by all this, since that never used tmp
>>> pages.
>>
>> My thinking was that spliced writes without tmp pages would be
>> fastest, then non-splice writes w/out tmp pages and spliced writes w/
>> would be roughly the same. But i'd need to benchmark and verify this
>> assumption.
>>
> 
> A somewhat related question: is Bernd's io_uring patchset susceptible
> to the same problem as splice() in this situation? IOW, does the kernel
> inline pagecache pages into the io_uring buffers?

Right now it does a full copy, similar to the non-splice /dev/fuse
read/write path, i.e. it doesn't have zero copy yet either.

> 
> If it doesn't have the same issue, then maybe we should think about
> using that to make a clean behavior break. Gate large folios and not
> using bounce pages behind io_uring.
> 
> That would mean dealing with multiple IO paths, but that might still be
> simpler than trying to deal with multiple folio sizes in the writeback
> rbtree tracking.

My personal thinking regarding ZC was to hook into Ming's work. I
didn't look into the details, but from an interface point of view it
sounded nice, roughly:

- Application write
- fuse-client/kernel request/CQEs with write attempts
- fuse server prepares group SQE, group leader prepares
  the write buffer, other group members are consumers
  using their buffer part for the final destination
- release of leader buffer when other group members
  are done
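
To make the buffer lifetime in the steps above concrete, here is a
minimal userspace sketch in plain C. It is only a model under my own
assumptions: it uses no io_uring or fuse API, and every name in it
(struct leader_buf, leader_prepare, member_consume) is hypothetical.
The memcpy in member_consume just stands in for the member's final
I/O; a real ZC path would reference the leader's pages instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not an io_uring or fuse structure: a buffer
 * owned by the group leader and consumed in parts by group members. */
struct leader_buf {
	char data[64];
	size_t len;
	int pending;	/* members that have not finished yet */
	int released;	/* set once the last member is done */
};

/* Group leader: prepare the write buffer for nr_members consumers. */
static void leader_prepare(struct leader_buf *lb, const char *src,
			   size_t len, int nr_members)
{
	assert(len <= sizeof(lb->data));
	memcpy(lb->data, src, len);
	lb->len = len;
	lb->pending = nr_members;
	lb->released = 0;
}

/* Group member: use its part [off, off + len) of the leader buffer
 * for the final destination, then signal completion. The leader
 * buffer is released only when the last member is done. */
static void member_consume(struct leader_buf *lb, size_t off,
			   size_t len, char *dst)
{
	assert(!lb->released && lb->pending > 0);
	assert(off <= lb->len && len <= lb->len - off);
	memcpy(dst, lb->data + off, len);	/* stand-in for the real I/O */
	if (--lb->pending == 0)
		lb->released = 1;
}
```

With two members each taking half of a six byte buffer, the buffer
stays valid after the first member_consume() and is marked released
after the second.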

Pavel and Jens have concerns, though, and a different suggestion, and
at least the example Pavel gave looks like splice:

https://lore.kernel.org/all/f3a83b6a-c4b9-4933-998d-ebd1d09e3405@xxxxxxxxx/

I think David is looking into a different ZC solution, but I
don't have details on that.
Maybe fuse-io-uring and the ublk splice approach should be another
LSFMM topic.

Thanks,
Bernd