On Tue, Oct 29, 2024 at 9:39 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> Hi David,
>
> On Fri, Oct 25, 2024 at 11:07:27PM -0700, David Rientjes wrote:
> > On Wed, 16 Oct 2024, David Rientjes wrote:
> >
> > > ----->o-----
> > > My takeaway: based on the feedback that was provided in the discussion:
> > >
> > > - we need an allocator abstraction for persistent memory that can return
> > >   memory with various characteristics: 1GB or not, kernel direct map or
> > >   not, HVO or not, etc.
> > >
> > > - built on top of that, we need the ability to carve out very large
> > >   ranges of memory (cloud provider use case) with NUMA awareness on the
> > >   kernel command line
> > >
> >
> > Following up on this, I think this physical memory allocator would also be
> > possible to use as a backend for hugetlb. Hopefully this would be an
> > allocator that would be generally useful for multiple purposes, something
> > like a mm/phys_alloc.c.
>
> Can you elaborate on this? mm/page_alloc.c already allocates physical
> memory :)
>
> Or you mean an allocator that will deal with memory carved out from what
> page allocator manages?
>
> > Frank van der Linden may also have thoughts on the above?

Yeah, 'physical allocator' is a bit of a misnomer. You're right, an
allocator that deals with memory not under page allocator control is a
better description.

To elaborate a bit: there are various scenarios where allocating
contiguous stretches of physical memory is useful: hugetlb, or VM guest
memory, for example. Or where you are presented with an external range of
VM_PFNMAP memory and need to manage it in a simple way and hand it out
for guest memory support (see NVidia's github for nvgrace-egm).

However, all of these cases may come with slightly different
requirements: is the memory purely external? Does it have struct pages?
If so, is it in the direct map? Is the memmap for the memory optimized
(HVO-style)? Does it need to be persistent? When does it need to be
zeroed out?
So that's why it seems like a good idea to come up with a slightly more
generalized version of a pool allocator - something that manages,
usually larger, chunks of physically contiguous memory. A pool is
initialized with certain properties (persistence, etc.), and it has
methods to grow and shrink it if needed. It's in no way meant to be
anywhere near as sophisticated as the page allocator - that would not
be useful (and pointless code duplication). A simple fixed-size chunk
pool will satisfy a lot of these cases.

A number of the building blocks are already there: there's CMA, and
there's ZONE_DEVICE, which has tools to manipulate some of these
properties (by going through a hotremove / hotplug cycle).

I created a simple prototype that essentially uses CMA as a pool
provider, and uses some ZONE_DEVICE tools to initialize memory however
you want it when it's added to the pool. I also added some new init
code to avoid things like unneeded memmap allocation at boot for
hugetlbfs pages. I put hugetlbfs on top of it - but in a restricted way
for prototyping purposes (no reservations, no demotion).

Anyway, this is the basic idea.

- Frank
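
PS: to make the shape of this a bit more concrete, here's a rough
userspace sketch of the kind of fixed-size chunk pool interface I have
in mind. All names here (phys_pool, PHYS_POOL_*) are invented for
illustration, and plain malloc()/calloc() stands in for a real provider
like CMA handing over physically contiguous ranges:

```c
/*
 * Hypothetical sketch only: phys_pool and PHYS_POOL_* are made-up
 * names, and the heap stands in for provider-owned physical memory.
 */
#include <stdlib.h>
#include <string.h>

/* Creation-time properties, mirroring the questions above. */
#define PHYS_POOL_PERSISTENT   (1u << 0) /* must survive reboot */
#define PHYS_POOL_NO_DIRECTMAP (1u << 1) /* chunks not in direct map */
#define PHYS_POOL_HVO          (1u << 2) /* memmap optimized, HVO-style */

struct phys_pool {
	unsigned int flags;
	size_t chunk_size;	/* fixed chunk size; think 2M or 1G */
	size_t nr_chunks;
	void **chunks;		/* stand-ins for contiguous ranges */
	unsigned char *in_use;
};

/* Grow the pool: ask the provider (here: the heap) for 'more' chunks. */
static int phys_pool_grow(struct phys_pool *p, size_t more)
{
	size_t n = p->nr_chunks + more;
	void **c = realloc(p->chunks, n * sizeof(*c));
	unsigned char *u = c ? realloc(p->in_use, n) : NULL;

	if (!c || !u)
		return -1;
	p->chunks = c;
	p->in_use = u;
	for (size_t i = p->nr_chunks; i < n; i++) {
		/* Zeroing at add time is one policy; could be at alloc. */
		c[i] = calloc(1, p->chunk_size);
		if (!c[i])
			return -1;
		u[i] = 0;
	}
	p->nr_chunks = n;
	return 0;
}

static int phys_pool_init(struct phys_pool *p, size_t chunk_size,
			  size_t nr_chunks, unsigned int flags)
{
	memset(p, 0, sizeof(*p));
	p->flags = flags;
	p->chunk_size = chunk_size;
	return phys_pool_grow(p, nr_chunks);
}

/* Hand out one fixed-size chunk, or NULL if the pool is exhausted. */
static void *phys_pool_alloc(struct phys_pool *p)
{
	for (size_t i = 0; i < p->nr_chunks; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = 1;
			return p->chunks[i];
		}
	}
	return NULL;	/* caller may phys_pool_grow() and retry */
}

static void phys_pool_free(struct phys_pool *p, void *chunk)
{
	for (size_t i = 0; i < p->nr_chunks; i++)
		if (p->chunks[i] == chunk)
			p->in_use[i] = 0;
}
```

In the real thing, grow/shrink would negotiate with the provider (CMA),
a shrink path would only release fully free chunks, and the per-pool
flags would drive the ZONE_DEVICE-style property manipulation when
memory is added - all of that is omitted here.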