Re: [RFC PATCH v1 00/10] mm: Introduce and use folio_owner_ops

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 08.11.24 18:05, Jason Gunthorpe wrote:
On Fri, Nov 08, 2024 at 04:20:30PM +0000, Fuad Tabba wrote:
Some folios, such as hugetlb folios and zone device folios,
require special handling when the folio's reference count reaches
0, before being freed. Moreover, guest_memfd folios will likely
require special handling to notify it once a folio's reference
count reaches 0, to facilitate shared to private folio conversion
[*]. Currently, each usecase has a dedicated callback when the
folio refcount reaches 0 to that effect. Adding yet more
callbacks is not ideal.


Thanks for having a look!

Replying to clarify some things. Fuad, feel free to add additional information.

Honestly, I question this thesis. How complex would it be to have 'yet
more callbacks'? Is the challenge really that the mm can't detect when
guestmemfd is the owner of the page because the page will be
ZONE_NORMAL?

Fuad might have been a bit imprecise here: We don't want an ever growing list of checks+callbacks on the page freeing fast path.

This series replaces the two cases we have by a single generic one, which is nice independent of guest_memfd I think.


So the point of this is really to allow ZONE_NORMAL pages to have a
per-allocator callback?

To intercept the refcount going to zero independent of any zones or magic page types, without as little overhead in the common page freeing path.

It can be used to implement custom allocators, like factored out for hugetlb in this series. It's not necessarily limited to that, though. It can be used as a form of "asynchronous page ref freezing", where you get notified once all references are gone.

(I might have another use case with PageOffline, where we want to prevent virtio-mem ones of them from getting accidentally leaked into the buddy during memory offlining with speculative references -- virtio_mem_fake_offline_going_offline() contains the interesting bits. But I did not look into the dirty details yet, just some thought where we'd want to intercept the refcount going to 0.)


But this is also why I suggested to shift them to ZONE_DEVICE for
guestmemfd, because then you get these things for free from the pgmap.

With this series even hugetlb gets it for "free", and hugetlb is not quite the nail for the ZONE_DEVICE hammer IMHO :)

For things we can statically set aside early during boot and never really want to return to the buddy/another allocator, I would agree that static ZONE_DEVICE would have possible.

Whenever the buddy or other allocators are involved, and we might have granularity as a handful of pages (e.g., taken from the buddy), getting ZONE_DEVICE involved is not a good (or even feasible) approach.

After all, all we want is intercept the refcount going to 0.

--
Cheers,

David / dhildenb




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux