On Mon, Feb 20, 2023 at 02:43:03PM +0000, Hyeonggon Yoo wrote:
> On Sun, Feb 19, 2023 at 08:09:07PM +0200, Mike Rapoport wrote:
> > On Sun, Feb 19, 2023 at 08:07:59AM +0000, Hyeonggon Yoo wrote:
> 
> > > > My current proposal is to have a cache of 2M pages close to the page
> > > > allocator and use a GFP flag to make allocation requests use that cache. On
> > > > the free() path, the pages that are mapped at PTE level will be put into
> > > > that cache.
> > > 
> > > I would like to discuss not only having a cache layer of pages but also how
> > > the direct map could be merged correctly and efficiently.
> > > 
> > > I vaguely recall that Aaron Lu sent an RFC series about this and Kirill A.
> > > Shutemov's feedback was to batch merge operations. [1]
> > > 
> > > Also, a CPA API called by the cache layer that could merge fragmented
> > > mappings would work for merging 4K pages to 2M [2], but it won't work
> > > for merging 2M mappings to 1G mappings.
> > One possible way is to make CPA scan all PMDs in the 1G page after merging a 2M
> > page. Not sure how efficient it would be, though.
> 
> That seems to be similar to what Kirill A. Shutemov has tried.
> He may have opinions about that?
> 
> [3] https://lore.kernel.org/lkml/20200416213229.19174-1-kirill.shutemov@xxxxxxxxxxxxxxx

Kirill's patch attempted to restore a 1G page on each cpa_flush(), so it
scanned a lot of page tables without any guarantee that collapsing small
mappings to a large page was possible.

If we instead call a function that collapses a 2M mapping only when we know
for sure that the collapse is possible, it will be more efficient.

-- 
Sincerely yours,
Mike.
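
For illustration, a minimal, untested sketch of such a check, assuming the
x86-64 direct map; can_collapse_pmd() is a hypothetical helper, not an
existing kernel API, and the caller (e.g. the free path of the proposed 2M
page cache) would only ask CPA to perform the actual collapse when it
returns true:

/*
 * Hypothetical helper, not an existing kernel API: verify that a
 * 2M-aligned chunk of the direct map is entirely covered by present,
 * physically contiguous 4K PTEs with identical protections, so the
 * caller knows for sure that a collapse to a single PMD mapping is
 * possible before asking CPA to do it.
 */
static bool can_collapse_pmd(pmd_t *pmd, unsigned long addr)
{
	pte_t *pte = pte_offset_kernel(pmd, addr & PMD_MASK);
	pgprot_t prot = __pgprot(0);
	unsigned long pfn = 0;
	int i;

	for (i = 0; i < PTRS_PER_PTE; i++, pte++) {
		if (pte_none(*pte))
			return false;

		if (i == 0) {
			pfn = pte_pfn(*pte);
			prot = pte_pgprot(*pte);
			/* the physical side must be 2M-aligned as well */
			if (!IS_ALIGNED(pfn, PTRS_PER_PTE))
				return false;
			continue;
		}

		/*
		 * Physical contiguity and uniform protections are required;
		 * a real implementation would probably mask the A/D bits
		 * before comparing.
		 */
		if (pte_pfn(*pte) != pfn + i ||
		    pgprot_val(pte_pgprot(*pte)) != pgprot_val(prot))
			return false;
	}

	return true;
}

The 2M -> 1G case would presumably be the same loop one level up, over the
PMDs under a PUD, which is essentially the "scan all PMDs in the 1G page"
idea quoted above.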