Re: [RFC PATCH 0/5] Prototype for direct map awareness in page allocator

Mike Rapoport <rppt@xxxxxxxxxx> · Thu, 9 Mar 2023 17:14:40 +0200

On Thu, Mar 09, 2023 at 01:59:00AM +0000, Edgecombe, Rick P wrote:
> On Wed, 2023-03-08 at 11:41 +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx>
> > 
> > Hi,
> > 
> > This is a third attempt to make page allocator aware of the direct map
> > layout and allow grouping of the pages that must be unmapped from
> > the direct map.
> > 
> > This is a new implementation of __GFP_UNMAPPED, kinda a follow up for
> > this set:
> > 
> > https://lore.kernel.org/all/20220127085608.306306-1-rppt@xxxxxxxxxx
> > 
> > but instead of using a migrate type to cache the unmapped pages, the
> > current implementation adds a dedicated cache to serve __GFP_UNMAPPED
> > allocations.
> 
> It seems a downside to having a page allocator outside of _the_ page
> allocator is you don't get all of the features that are baked in there.
> For example does secretmem care about numa? I guess in this
> implementation there is just one big cache for all nodes.
> 
> Probably most users would want __GFP_ZERO. Would secretmem care about
> __GFP_ACCOUNT?

The intention was that pages in the cache are always zeroed, so __GFP_ZERO
is always implicit, or at least it should have been.
__GFP_ACCOUNT is respected in this implementation. If you look at the
changes to __alloc_pages(), after getting pages from the unmapped cache
there is a 'goto out' to the point where the accounting is handled.
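
To illustrate the control flow described above, here is a minimal userspace
sketch in C. It is not the code from the patch: every name in it (the sk_
prefix, the flag values, the fixed-size cache) is hypothetical, calloc()
stands in for the regular allocation path, and falling back to that path on
a cache miss is an assumption of the sketch. It only shows the shape: try
the cache first for unmapped requests, then jump to the shared accounting
step, with pages zeroed whenever they enter the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define SK_GFP_ACCOUNT  0x1u
#define SK_GFP_UNMAPPED 0x2u

#define SK_PAGE_SIZE   4096u
#define SK_CACHE_SLOTS 4u

struct sk_page {
	unsigned char data[SK_PAGE_SIZE];
	bool from_unmapped_cache;
	bool accounted;
};

static struct sk_page *sk_cache[SK_CACHE_SLOTS];
static size_t sk_cache_count;

/* Return a page to the cache, zeroing it so cached pages are always zeroed. */
bool sk_cache_put(struct sk_page *page)
{
	assert(page != NULL);
	if (sk_cache_count == SK_CACHE_SLOTS)
		return false;
	memset(page, 0, sizeof(*page));
	sk_cache[sk_cache_count++] = page;
	return true;
}

/* Take a page from the cache, or NULL if the cache is empty. */
static struct sk_page *sk_cache_get(void)
{
	struct sk_page *page;

	if (sk_cache_count == 0)
		return NULL;
	page = sk_cache[--sk_cache_count];
	page->from_unmapped_cache = true;
	return page;
}

/* Allocate one page; both paths meet at 'out', where accounting happens. */
struct sk_page *sk_alloc_page(unsigned int flags)
{
	struct sk_page *page = NULL;

	if (flags & SK_GFP_UNMAPPED) {
		page = sk_cache_get();
		if (page)
			goto out;
	}

	/* Stand-in for the regular allocation path (assumption of the sketch). */
	page = calloc(1, sizeof(*page));
out:
	if (page && (flags & SK_GFP_ACCOUNT))
		page->accounted = true;
	return page;
}
```

Because the accounting check sits after the 'out' label, a page served from
the cache is accounted exactly like one from the regular path, and the caller
never has to pass an explicit zeroing flag for cached pages.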

> I'm sure there is more, but I guess the question is, is
> the idea that these features all get built into unmapped-alloc at some
> point? The alternate approach is to have little caches for each usage
> like the grouped pages, which is probably less efficient when you have
> a bunch of them. Or solve it just for modules like the bpf allocator.
> Those are the tradeoffs for the approaches that have been explored,
> right?

I think that no matter what cache we use, it won't be able to support all
the features _the_ page allocator has. Indeed, if we had a per-use-case
cache implementation, we could tune it to support the features of interest
for that use case, but then we'd be less efficient in reducing splits of
the large pages. Not to mention the increase in complexity, as there would
be several caches doing similar yet different things.

This POC mostly targets secretmem and modules, so it was pretty much about
GFP_KERNEL without consideration for NUMA. I think extending this unmapped
alloc for NUMA should be simple enough, but it will increase memory
overhead even more.
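
One plausible reading of the overhead remark, sketched in userspace C: if
each node kept its own cache and each cache were refilled in large-page-sized
chunks, the memory reserved but not yet handed out would scale with the
number of nodes in use. All names (the nc_ prefix), the 2 MiB chunk size and
the refill policy are assumptions made for illustration; none of them come
from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NC_MAX_NODES   8
#define NC_CHUNK_BYTES (2ul * 1024 * 1024)  /* assumed large-page size */
#define NC_PAGE_BYTES  4096ul

struct nc_node_cache {
	size_t reserved_bytes;  /* taken from the regular allocator in chunks */
	size_t used_bytes;      /* handed out to callers */
};

static struct nc_node_cache nc_caches[NC_MAX_NODES];

/* Hand out one page from the cache of node 'nid', refilling it if needed. */
int nc_alloc_on_node(int nid)
{
	struct nc_node_cache *c;

	if (nid < 0 || nid >= NC_MAX_NODES)
		return -1;
	c = &nc_caches[nid];
	assert(c->used_bytes <= c->reserved_bytes);
	if (c->reserved_bytes - c->used_bytes < NC_PAGE_BYTES)
		c->reserved_bytes += NC_CHUNK_BYTES;
	c->used_bytes += NC_PAGE_BYTES;
	return 0;
}

/* Total memory sitting in all per-node caches without being used. */
size_t nc_idle_bytes(void)
{
	size_t idle = 0;

	for (int nid = 0; nid < NC_MAX_NODES; nid++)
		idle += nc_caches[nid].reserved_bytes - nc_caches[nid].used_bytes;
	return idle;
}
```

In this model, a single page allocated on each of two nodes leaves two nearly
full chunks idle, where one shared cache would leave only one.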

-- 
Sincerely yours,
Mike.