On Sun, Feb 19, 2023 at 08:09:07PM +0200, Mike Rapoport wrote:
> On Sun, Feb 19, 2023 at 08:07:59AM +0000, Hyeonggon Yoo wrote:
> > On Wed, Feb 01, 2023 at 08:06:37PM +0200, Mike Rapoport wrote:
> > > Hi all,
> >
> > Hi Mike, I'm interested in this topic and hope to discuss this with you
> > at LSF/MM/BPF.
> >
> > > To reduce the performance hit caused by the fragmentation of the direct
> > > map, it makes sense to group and/or cache the base pages removed from the
> > > direct map so that the most of base pages created during a split of a large
> > > page will be consumed by users requiring PTE level mappings.
> >
> > How much performance difference did you see in your test when direct
> > map was fragmented, or is there a way to check this difference?
>
> I did some benchmarks a while ago with the entire direct map forced to 2M
> or 4k pages. The results I had are here:
>
> https://docs.google.com/spreadsheets/d/1tdD-cu8e93vnfGsTFxZ5YdaEfs2E1GELlvWNOGkJV2U/edit?usp=sharing
>
> Intel folks did more comprehensive testing and their results are here:
>
> https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@xxxxxxxxxxxxxxx/

Thanks!

Hmm, it might not be the best choice to unconditionally merge 2M mappings
into a 1G mapping. (Maybe that should be controlled via a boot parameter
or something.)

> > > My current proposal is to have a cache of 2M pages close to the page
> > > allocator and use a GFP flag to make allocation request use that cache. On
> > > the free() path, the pages that are mapped at PTE level will be put into
> > > that cache.
> >
> > I would like to discuss not only having cache layer of pages but also how
> > direct map could be merged correctly and efficiently.
> >
> > I vaguely recall that Aaron Lu sent RFC series about this and Kirill A.
> > Shutemov's feedback was to batch merge operations. [1]
> >
> > Also a CPA API called by the cache layer that could merge fragmented
> > mappings would work for merging 4K pages to 2M [2], but won't work
> > for merging 2M mappings to 1G mappings.
>
> One possible way is to make CPA scan all PMDs in 1G page after merging a 2M
> page. Not sure how efficient would it be though.

That seems to be similar to what Kirill A. Shutemov has tried; he may
have opinions about that. [3]

(Just to make sure I understand both ideas, I put rough pseudo-C sketches
of the 2M page cache and of the PMD scan at the end of this mail.)

[3] https://lore.kernel.org/lkml/20200416213229.19174-1-kirill.shutemov@xxxxxxxxxxxxxxx

> > At that time I didn't follow more discussions (e.g. execmem_alloc())
> > Maybe I'm missing some points.
> >
> > [1] https://lore.kernel.org/linux-mm/20220809100408.rm6ofiewtty6rvcl@box
> >
> > [2] https://lore.kernel.org/linux-mm/YvfLxuflw2ctHFWF@xxxxxxxxxx
>
> --
> Sincerely yours,
> Mike.
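
P.S. To make sure I read the 2M cache proposal correctly, here is a very
rough pseudo-C sketch of how I understand it. All the unmapped_* names,
the GFP flag wiring, and the refill policy are invented for illustration;
only alloc_pages(), split_page(), set_memory_4k() and page_address() are
existing (x86) interfaces, and locking/error handling are omitted:

/*
 * Rough sketch only: the unmapped_* names are invented and the refill
 * policy is just a guess at the proposal.
 */
struct unmapped_page_cache {
        spinlock_t lock;
        struct list_head free_4k;   /* base pages from already-split 2M blocks */
};

static struct page *unmapped_alloc(gfp_t gfp)
{
        unsigned int order = PMD_SHIFT - PAGE_SHIFT;    /* one 2M block */
        struct page *page;

        /* Prefer a base page whose 2M block was split earlier... */
        page = unmapped_cache_pop();                    /* invented helper */
        if (page)
                return page;

        /*
         * ...otherwise split one more 2M block and refill the cache, so
         * that one direct map split serves many PTE-level users.
         */
        page = alloc_pages(gfp, order);
        if (!page)
                return NULL;
        split_page(page, order);
        set_memory_4k((unsigned long)page_address(page), 1 << order);
        unmapped_cache_add(page + 1, (1 << order) - 1); /* invented helper */
        return page;
}

static void unmapped_free(struct page *page)
{
        /* free() path: a PTE-mapped page goes back to the cache, not the buddy */
        unmapped_cache_add(page, 1);
}

The interesting part is of course when the cache hands 2M blocks back and
the direct mapping gets merged again, which is where the scan below comes in.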
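
And for the 2M -> 1G direction, the scan I imagine CPA could do after a
successful 2M merge is roughly the following. Again just a sketch:
can_merge_pud() is an invented name and I'm hand-waving the actual
set_pud()/TLB flush side; the helpers used inside are the usual page
table accessors:

/*
 * Sketch: after merging one 2M range, check whether all 512 PMDs covering
 * the same 1G-aligned region are now uniform leaf mappings, physically
 * contiguous and with identical protections, so the whole range could be
 * collapsed into a single PUD-level mapping.
 */
static bool can_merge_pud(pud_t *pud, unsigned long addr)
{
        unsigned long start = addr & PUD_MASK;
        pmd_t *pmd = pmd_offset(pud, start);
        unsigned long expected_pfn = pmd_pfn(*pmd);
        pgprot_t prot = pmd_pgprot(*pmd);
        int i;

        /* The physical side must be 1G aligned as well. */
        if (!IS_ALIGNED(expected_pfn << PAGE_SHIFT, PUD_SIZE))
                return false;

        for (i = 0; i < PTRS_PER_PMD; i++, pmd++) {
                if (!pmd_leaf(*pmd))                    /* must be a 2M leaf */
                        return false;
                if (pmd_pfn(*pmd) != expected_pfn)      /* must be contiguous */
                        return false;
                if (pgprot_val(pmd_pgprot(*pmd)) != pgprot_val(prot))
                        return false;                   /* same protections */

                expected_pfn += PMD_SIZE >> PAGE_SHIFT;
        }

        /* All 512 PMDs look mergeable; the caller can install a 1G PUD. */
        return true;
}

Reading 512 entries per candidate merge doesn't look too bad by itself;
the batching discussed in [1] and the TLB flushing seem like the harder part.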