Re: [RFC PATCH v2 4/7] mm: pgtable: try to reclaim empty PTE pages in zap_page_range_single()

On 05.08.24 14:55, Qi Zheng wrote:
> To achieve high performance, applications nowadays mostly use
> high-performance user-mode memory allocators such as jemalloc or
> tcmalloc. These allocators use madvise(MADV_DONTNEED or MADV_FREE) to
> release physical memory, but neither MADV_DONTNEED nor MADV_FREE
> releases page table memory, which can lead to very large page table
> memory usage.
>
> The following is a memory usage snapshot of one process that was
> actually observed on our server:
>
>          VIRT:  55t
>          RES:   590g
>          VmPTE: 110g
>
> In this case, most of the page table entries are empty. A PTE page
> whose entries are all empty can be freed back to the system for others
> to use.
>
> As a first step, this commit attempts to synchronously free the empty
> PTE pages in zap_page_range_single() (MADV_DONTNEED etc. will invoke
> this). In order to reduce overhead, we only handle the cases with a high
> probability of generating empty PTE pages, and other cases will be
> filtered out, such as:

It doesn't make much sense during munmap(), where we will remove the page tables explicitly right afterwards anyway. We should limit it to the !munmap case, in particular MADV_DONTNEED.

To minimize the added overhead, I further suggest only trying to reclaim asynchronously if we know that all PTEs are likely to be none, that is, when we just zapped *all* PTEs of a PTE page table because our range spans the complete PTE page table.

Just imagine someone zaps a single PTE: we really don't want to start scanning page tables and involve a (rather expensive) walk_page_range() just to find out that something is still mapped.
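To make the two filters above concrete, here is a minimal userspace sketch in C. It assumes 4 KiB pages with 512 entries per PTE table, so each table maps 2 MiB (as on x86-64); the helpers fully_covered_pte_tables() and should_try_reclaim() and the SKETCH_* constants are hypothetical, not existing kernel code. It refuses to try for munmap() and otherwise only counts the PTE tables whose whole 2 MiB range lies inside the zapped range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages and 512 entries per PTE table, so one PTE
 * table maps 2 MiB (as on x86-64). Illustrative constants only. */
#define SKETCH_PAGE_SIZE    4096ULL
#define SKETCH_PTRS_PER_PTE 512ULL
#define SKETCH_PMD_SIZE     (SKETCH_PAGE_SIZE * SKETCH_PTRS_PER_PTE)

/* Hypothetical helpers; 'a' must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

static uint64_t align_down(uint64_t x, uint64_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/* Number of PTE tables whose entire 2 MiB range lies inside [start, end).
 * Only these tables are likely to be completely empty after the zap. */
static uint64_t fully_covered_pte_tables(uint64_t start, uint64_t end)
{
	uint64_t first = align_up(start, SKETCH_PMD_SIZE);
	uint64_t last = align_down(end, SKETCH_PMD_SIZE);

	return last > first ? (last - first) / SKETCH_PMD_SIZE : 0;
}

/* Hypothetical policy: skip munmap(), which removes the page tables
 * afterwards anyway, and skip zaps that cannot have emptied a whole
 * PTE table (e.g. a single-page MADV_DONTNEED). */
static bool should_try_reclaim(bool is_munmap, uint64_t start, uint64_t end)
{
	if (is_munmap)
		return false;
	return fully_covered_pte_tables(start, end) > 0;
}
```

With these assumptions, zapping the single page at 0x200000 covers no complete table, so no reclaim is attempted, whereas an aligned 4 MiB MADV_DONTNEED starting at 0x200000 fully covers two tables. The check is pure address arithmetic, so it is essentially free compared to a page table walk.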

Last but not least, would there be a way to avoid walk_page_range() and simply trigger the reclaim from zap_pte_range(), possibly while still holding the PTE table lock?

We might have to trylock the PMD, but that should be doable.
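The zap_pte_range() variant could look roughly like the following userspace model, with pthread mutexes standing in for the page table locks. The structures and try_detach_empty_table() are hypothetical. The model assumes the usual lock ordering takes the PMD lock before the PTE table lock, so acquiring it while already holding the PTE table lock has to be a trylock, and contention simply means skipping the reclaim. A real implementation would presumably also need TLB flushing and deferred freeing of the table, which this single-threaded model ignores.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PTRS_PER_PTE 512

/* Hypothetical model of a PTE table and the PMD entry pointing at it. */
struct pte_table {
	pthread_mutex_t ptl;                   /* models the PTE table lock */
	uint64_t entries[SKETCH_PTRS_PER_PTE]; /* 0 models pte_none() */
};

struct pmd_slot {
	pthread_mutex_t lock;    /* models the PMD lock */
	struct pte_table *table; /* NULL once the table was detached */
};

static bool pte_table_all_none(const struct pte_table *t)
{
	for (size_t i = 0; i < SKETCH_PTRS_PER_PTE; i++)
		if (t->entries[i] != 0)
			return false;
	return true;
}

/*
 * Call with pmd->table->ptl held, right after zapping its entries.
 * Because we already hold the PTE table lock, we only trylock the PMD
 * lock; if that fails we leave the (empty) table in place.
 * On success the PMD entry is cleared and the table is returned; its
 * ptl is still held, so the caller unlocks it and then frees the table.
 */
static struct pte_table *try_detach_empty_table(struct pmd_slot *pmd)
{
	struct pte_table *t = pmd->table;

	assert(t != NULL); /* caller holds this table's lock */
	if (!pte_table_all_none(t))
		return NULL;   /* something is still mapped */
	if (pthread_mutex_trylock(&pmd->lock) != 0)
		return NULL;   /* contended: skip reclaim this time */
	pmd->table = NULL;     /* clear the PMD entry */
	pthread_mutex_unlock(&pmd->lock);
	return t;
}
```

Bailing out on contention keeps the common path cheap: an empty table that is occasionally left behind only costs memory, and a later zap can still reclaim it.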

--
Cheers,

David / dhildenb




