Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings

On Tue, Jan 14, 2025 at 2:07 AM Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
>
> On Tue, 14 Jan 2025 at 10:55, Bernd Schubert <bernd.schubert@xxxxxxxxxxx> wrote:
> >
> >
> >
> > On 1/14/25 10:40, Miklos Szeredi wrote:
> > > On Tue, 14 Jan 2025 at 09:38, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > >
> > >> Maybe an explicit callback from the migration code to the filesystem
> > >> would work. I.e. move the complexity of dealing with migration for
> > >> problematic filesystems (netfs/fuse) to the filesystem itself.  I'm
> > >> not sure how this would actually look, as I'm unfamiliar with the
> > >> details of page migration, but I guess it shouldn't be too difficult
> > >> to implement for fuse at least.
> > >
> > > Thinking a bit...
> > >
> > > 1) reading pages
> > >
> > > Pages are allocated (PG_locked set, PG_uptodate cleared) and passed to
> > > ->readpages(), which may make the pages uptodate asynchronously.  If a
> > > page is unlocked but not set uptodate, then the caller is supposed to
> > > retry the reading, at least that's how I interpret
> > > filemap_get_pages().   This means that it's fine to migrate the page
> > > before it's actually filled with data, since the caller will retry.
> > >
> > > It also means that it would be sufficient to allocate the page itself
> > > just before filling it in, if there was a mechanism to keep track of
> > > these "not yet filled" pages.  But that's probably off topic.
> >
> > With /dev/fuse buffer copies should be easy - just allocate the page
> > on buffer copy, control is in libfuse.
>
> I think the issue is with generic page cache code, which currently
> relies on the PG_locked flag on the allocated but not yet filled page.
> If the generic code were able to keep track of "under construction"
> ranges without relying on an allocated page, then the filesystem
> could allocate the page just before copying the data, insert the page
> into the cache, and mark the relevant portion of the file uptodate.
>
> > With splice you really need
> > a page state.
>
> It's not possible to splice a not-uptodate page.
>
> > I wrote this before already - what is the advantage of a tmp page copy
> > over /dev/fuse buffer copy? I.e. I wonder if we need splice at all here.
>
> Splice seems a dead end, but we probably need to continue supporting
> it for a while for backward compatibility.
>

There was a previous discussion about splice and tmp pages here [1]. I
see the following issues with having splice default to using tmp pages
as a workaround:

- My understanding is that the majority of use cases do use splice (e.g.
  iirc, libfuse does as well), in which case there's no point to this
  patchset.
- Codewise, imo this gets messy (e.g. we would still need the rb tree
  and would now need to check writeback against both the folio
  writeback state and the rb tree).
- For the large folios work in [2], the implementation imo is pretty
  clean because it's rebased on top of this patchset, which removes the
  tmp pages and rb tree. If we still have tmp pages, then this gets very
  gnarly. I don't see a good way to handle large folios in the rb
  tree given this scenario:
    a) writeback on a large folio is issued
    b) we copy it to a tmp folio and clear writeback on it since it's
       being spliced; we add this writeback request to the rb tree
    c) the folio in the pagecache is evicted
    d) another write occurs on a larger range that encompasses the range
       of the writeback in a), or on a subset of it
Maybe this is doable with some other data structure instead of the rb
tree (e.g. an xarray with refcounts?), but it'd be ideal if we
could find a solution (my guess is this would have to come from the
mm layer?) that obviates tmp pages altogether.
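
To make the problem in d) concrete, here is a minimal userspace sketch
of the check a later write would need against the tracked in-flight
writebacks once the original folio is gone. This is not the fuse code:
struct wb_range, ranges_overlap() and count_conflicts() are hypothetical
names, byte ranges are half-open [start, end), and a flat array stands
in for whatever structure (rb tree, xarray, ...) would really be used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record for one in-flight writeback: bytes [start, end). */
struct wb_range {
	unsigned long start;
	unsigned long end;
};

/* True if [a_start, a_end) and [b_start, b_end) share at least one byte. */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * Count the tracked writebacks that a new write to [start, end) would
 * conflict with. A linear scan is used only to keep the sketch short.
 */
static size_t count_conflicts(const struct wb_range *tracked, size_t n,
			      unsigned long start, unsigned long end)
{
	size_t i, hits = 0;

	for (i = 0; i < n; i++)
		if (ranges_overlap(tracked[i].start, tracked[i].end,
				   start, end))
			hits++;
	return hits;
}
```

With a single tracked writeback covering [0, 16384) (say a 16K folio),
both a larger write to [0, 32768) and a smaller one to [4096, 8192)
conflict with it, while a write to [16384, 20480) does not. The point
is that the lookup has to be by range overlap rather than by a single
page index, which is where large folios make the rb tree awkward.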


Thanks,
Joanne

[1] https://lore.kernel.org/linux-fsdevel/CAJnrk1YwNw7C=EMfKQzN88Zq_2Qih5Te_bfkeaOf=tG+L3u9eA@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-fsdevel/20241213221818.322371-1-joannelkoong@xxxxxxxxx/

> Thanks,
> Miklos




