On Thu, May 18, 2023 at 8:24 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
>
> On Wed, May 17, 2023 at 11:35:56PM -0400, Kent Overstreet wrote:
> > On Wed, Mar 08, 2023 at 11:41:02AM +0200, Mike Rapoport wrote:
> > > From: "Mike Rapoport (IBM)" <rppt@xxxxxxxxxx>
> > >
> > > When the set_memory or set_direct_map APIs are used to change attributes
> > > or permissions for chunks of several pages, the large PMD that maps these
> > > pages in the direct map must be split. Fragmenting the direct map in such
> > > a manner causes TLB pressure and, eventually, performance degradation.
> > >
> > > To avoid excessive direct map fragmentation, add the ability to allocate
> > > "unmapped" pages with a __GFP_UNMAPPED flag that causes removal of the
> > > allocated pages from the direct map, and use a cache of the unmapped pages.
> > >
> > > This cache is replenished with higher order pages, with a preference for
> > > PMD_SIZE pages when possible, so that there will be fewer splits of large
> > > pages in the direct map.
> > >
> > > The cache is implemented as a buddy allocator, so it can serve high order
> > > allocations of unmapped pages.
> >
> > So I'm late to this discussion; I stumbled in because of my own run-in
> > with executable memory allocation.
> >
> > I understand that post-LSF this patchset seems to not be going anywhere,
> > but OTOH there's also been a desire for better executable memory
> > allocation; as noted by tglx and elsewhere, there _is_ a definite
> > performance impact from page size on kernel text - I've seen numbers in
> > the multiple single-digit percentage range in the past.
> >
> > This patchset does seem to me to be roughly the right approach for that,
> > and coupled with the slab allocator for sub-page sized allocations it
> > seems there's the potential for a nice interface that spans the
> > full range of allocation sizes, from small bpf/trampoline allocations up
> > to modules.
> >
> > Is this patchset worth reviving/continuing with? Was it really just the
> > needed module refactoring that was the blocker?
>
> As I see it, this patchset is only one building block out of three? four?
> If we are to repurpose it for code allocations, it should be something like:
>
> 1) allocate a 2M page to fill the cache
> 2) remove this page from the direct map
> 3) map the 2M page ROX in module address space (usually some part of
>    the vmalloc address space)
> 4) allocate a smaller chunk of that page to the actual caller (bpf,
>    modules, whatever)
>
> Right now (3) and (4) won't work for modules because they mix code and
> data in a single allocation.

I am working on patches based on the discussion in [1]. I am planning to
send v1 for review in a week or so.

Thanks,
Song

[1] https://lore.kernel.org/linux-mm/20221107223921.3451913-1-song@xxxxxxxxxx/

[...]
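
As a rough illustration of the four-step flow Mike outlines above, here is a
minimal sketch; it is not code from the thread or from the patchset. The
refill helper name is hypothetical, error handling and the buddy-style
bookkeeping that would implement step (4) are omitted, and details such as
per-page set_direct_map_invalid_noflush() calls and PAGE_KERNEL_ROX are
architecture-specific.

/*
 * Illustrative sketch only: grab one PMD-sized (2M) chunk to refill a
 * hypothetical "unmapped pages" cache, drop it from the direct map, and
 * remap it read-only + executable in vmalloc/module space. A real
 * implementation would also track the chunk in a buddy-style cache and
 * hand out sub-chunks to callers (step 4).
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <asm/tlbflush.h>

static void *unmapped_cache_refill_rox(void)
{
	unsigned int order = get_order(PMD_SIZE);
	unsigned int nr = 1U << order;
	struct page *page, **pages;
	unsigned long daddr;
	void *rox = NULL;
	unsigned int i;

	/* 1) allocate a 2M chunk to fill the cache */
	page = alloc_pages(GFP_KERNEL, order);
	if (!page)
		return NULL;

	/* 2) remove it from the direct map (one base page at a time) */
	daddr = (unsigned long)page_address(page);
	for (i = 0; i < nr; i++)
		set_direct_map_invalid_noflush(page + i);
	flush_tlb_kernel_range(daddr, daddr + PMD_SIZE);

	/* 3) map the 2M chunk ROX in vmalloc/module address space */
	pages = kmalloc_array(nr, sizeof(*pages), GFP_KERNEL);
	if (pages) {
		for (i = 0; i < nr; i++)
			pages[i] = page + i;
		rox = vmap(pages, nr, VM_MAP, PAGE_KERNEL_ROX);
		kfree(pages);
	}

	/* 4) callers (bpf, modules, ...) would then get sub-chunks of 'rox' */
	return rox;
}

As Mike notes, step (3) and (4) are not usable for modules as long as a
module allocation mixes code and data; the work referenced in [1] is about
separating text from data so that a shared ROX mapping like the one sketched
above can serve module text as well.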