Hi Jason and David, On Fri, 8 Nov 2024 at 19:33, David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 08.11.24 18:05, Jason Gunthorpe wrote: > > On Fri, Nov 08, 2024 at 04:20:30PM +0000, Fuad Tabba wrote: > >> Some folios, such as hugetlb folios and zone device folios, > >> require special handling when the folio's reference count reaches > >> 0, before being freed. Moreover, guest_memfd folios will likely > >> require special handling to notify it once a folio's reference > >> count reaches 0, to facilitate shared to private folio conversion > >> [*]. Currently, each usecase has a dedicated callback when the > >> folio refcount reaches 0 to that effect. Adding yet more > >> callbacks is not ideal. > > > > Thanks for having a look! > > Replying to clarify some things. Fuad, feel free to add additional > information. Thanks for your comments Jason, and for clarifying my cover letter David. I think David has covered everything, and I'll make sure to clarify this in the cover letter when I respin. Cheers, /fuad > > > Honestly, I question this thesis. How complex would it be to have 'yet > > more callbacks'? Is the challenge really that the mm can't detect when > > guestmemfd is the owner of the page because the page will be > > ZONE_NORMAL? > > Fuad might have been a bit imprecise here: We don't want an ever growing > list of checks+callbacks on the page freeing fast path. > > This series replaces the two cases we have by a single generic one, > which is nice independent of guest_memfd I think. > > > > > So the point of this is really to allow ZONE_NORMAL pages to have a > > per-allocator callback? > > To intercept the refcount going to zero independent of any zones or > magic page types, without as little overhead in the common page freeing > path. > > It can be used to implement custom allocators, like factored out for > hugetlb in this series. It's not necessarily limited to that, though. It > can be used as a form of "asynchronous page ref freezing", where you get > notified once all references are gone. > > (I might have another use case with PageOffline, where we want to > prevent virtio-mem ones of them from getting accidentally leaked into > the buddy during memory offlining with speculative references -- > virtio_mem_fake_offline_going_offline() contains the interesting bits. > But I did not look into the dirty details yet, just some thought where > we'd want to intercept the refcount going to 0.) > > > > > But this is also why I suggested to shift them to ZONE_DEVICE for > > guestmemfd, because then you get these things for free from the pgmap. > > With this series even hugetlb gets it for "free", and hugetlb is not > quite the nail for the ZONE_DEVICE hammer IMHO :) > > For things we can statically set aside early during boot and never > really want to return to the buddy/another allocator, I would agree that > static ZONE_DEVICE would have possible. > > Whenever the buddy or other allocators are involved, and we might have > granularity as a handful of pages (e.g., taken from the buddy), getting > ZONE_DEVICE involved is not a good (or even feasible) approach. > > After all, all we want is intercept the refcount going to 0. > > -- > Cheers, > > David / dhildenb >