Re: [RFC PATCH 0/4] Multiple consecutive page for anonymous mapping

On 09.01.23 08:22, Yin Fengwei wrote:
> In a nutshell: 4k is too small and 2M is too big. We started
> asking ourselves whether there was something in the middle that
> we could do. This series shows what that middle ground might
> look like. It provides some of the benefits of THP while
> eliminating some of the downsides.
>
> This series uses "multiple consecutive pages" (mcpages), groups
> of base pages between 8K and 2M in size, for anonymous user
> space mappings. This leads to less internal fragmentation than
> 2M mappings, and thus less memory consumption and less CPU time
> wasted zeroing memory that will never be used.

Hi,

What I understand is that this is a form of faultaround for anonymous memory, with the special case that we try to allocate the pages consecutively.

Some thoughts:

(1) Faultaround might be unexpected for some workloads and increase
    memory consumption unnecessarily.

Yes, something like that can happen with THP BUT

(a) THP can be disabled or is frequently only enabled for madvised
    regions -- for example, exactly for this reason.
(b) Some workloads (especially memory ballooning) rely on memory not
    suddenly re-appearing after MADV_DONTNEED. This works even with THP,
    because the 4k MADV_DONTNEED will first PTE-map the THP. Because
    there is a PTE page table, we won't suddenly get a THP populated
    again (unless khugepaged is configured to fill holes); see the
    sketch below.
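
To illustrate (b), here is a minimal userspace sketch of the
ballooning expectation (the mapping size and madvise offsets are
arbitrary; mmap() and madvise() are the real interfaces):

#include <string.h>
#include <sys/mman.h>

#define MAP_SZ	(2UL * 1024 * 1024)	/* one THP-sized region */

int main(void)
{
	/* Anonymous mapping that THP may back with a 2M page. */
	char *p = mmap(NULL, MAP_SZ, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	madvise(p, MAP_SZ, MADV_HUGEPAGE);
	memset(p, 1, MAP_SZ);		/* fault it in */

	/*
	 * Discard a single 4k page. If a THP backs the range, it is
	 * PTE-mapped first, so only this one page gets zapped, and
	 * the hole is not silently refilled by a huge page on the
	 * next access (unless khugepaged collapses the range again).
	 */
	madvise(p + 4096, 4096, MADV_DONTNEED);
	return 0;
}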


I strongly assume we will need something similar: force-disable, selectively enable, etc.
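
For comparison, THP already has a per-process force-disable in
addition to the sysfs knobs; I'd expect any mcpage control to look
similar. PR_SET_THP_DISABLE is the real prctl; an mcpage equivalent
would be hypothetical:

#include <stdio.h>
#include <sys/prctl.h>

int main(void)
{
	/*
	 * Force-disable THP for this process and its children,
	 * independent of the system-wide setting in
	 * /sys/kernel/mm/transparent_hugepage/enabled.
	 */
	if (prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0)) {
		perror("prctl(PR_SET_THP_DISABLE)");
		return 1;
	}
	return 0;
}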


(2) This steals consecutive pages to immediately split them up

I know, everybody thinks it might be valuable for their use case to grab all higher-order pages :) It will be "fun" once all these cases start competing. TBH, splitting them up again immediately smells like being the lowest priority among all higher-order users.


(3) All effort will be lost once page compaction gets active, compacts,
    and simply migrates to random 4k pages. This is most probably the
    biggest "issue" of the whole approach AFAIKS: it's only temporary
    because there is no notion of these pages belonging together
    anymore.


> In the implementation, we allocate a high order page with the
> order of the mcpage (e.g., order 2 for a 16KB mcpage). This makes
> sure physically contiguous memory is used, which benefits
> sequential memory access latency.
>
> Then we split the high order page. By doing this, each sub-page
> of the mcpage is just a normal 4K page. The current kernel page
> management applies to "mc" pages without any changes. mcpage
> allows batching page faults, which reduces the number of page
> faults.
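
If I read that right, the core of the allocation step boils down to
the following sketch. alloc_pages() and split_page() are the real
page allocator interfaces; the surrounding glue and the helper name
are hypothetical:

#include <linux/gfp.h>
#include <linux/mm.h>

#define MCPAGE_ORDER	2	/* 16KB mcpage with 4K base pages */

/*
 * Sketch: allocate one physically contiguous order-2 chunk and
 * split it into four independent order-0 pages, which the normal
 * 4k page management then handles unchanged.
 */
static struct page *mcpage_alloc(void)
{
	struct page *page;

	/* Non-compound allocation; split_page() requires that. */
	page = alloc_pages(GFP_HIGHUSER_MOVABLE, MCPAGE_ORDER);
	if (!page)
		return NULL;	/* caller would fall back to 4k */

	split_page(page, MCPAGE_ORDER);

	/*
	 * page[0]..page[3] are now separate order-0 pages that
	 * happen to be physically consecutive; the fault path can
	 * map all four PTEs in one batch.
	 */
	return page;
}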

> There are costs with mcpage. Besides lacking the TLB benefit THP
> brings, it increases memory consumption and page allocation
> latency compared to a 4K base page.
>
> This series is the first step for mcpage. Future work can enable
> mcpage for more components like the page cache, swapping, etc.
> Eventually, most pages in the system will be allocated, freed,
> and reclaimed with mcpage order.

I think avoiding new, hard-to-grasp terminology ("mcpage") might be better. I know, everybody wants to give their child a name, but the name is not really future proof: a "multiple consecutive pages" unit might at some point just be a folio.

I'd summarize the idea as "faultaround", whereby we try to optimize for locality.

Note that a similar (but different) concept already exists (hidden) for hugetlb, e.g., on arm64. The feature is called "cont-pte": a sequence of PTEs that logically map a hugetlb page.
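
As a rough illustration: with a 4K granule, 16 contiguous PTEs span
one 64K "cont" range, and a hint bit tells the MMU it may cache the
whole range as a single TLB entry. pte_mkcont(), pfn_pte() and
set_pte_at() are the real arm64/core-mm helpers; the loop is a
simplified sketch, not the actual hugetlb code path:

#include <linux/mm.h>

#define CONT_PTES	16	/* arm64, 4K granule */

/*
 * Sketch: map 16 physically contiguous 4K pages and set the
 * contiguous hint on each PTE so the MMU may use one TLB entry
 * for the whole 64K range.
 */
static void map_cont_range(struct mm_struct *mm, unsigned long addr,
			   pte_t *ptep, unsigned long pfn, pgprot_t prot)
{
	int i;

	for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) {
		pte_t pte = pfn_pte(pfn + i, prot);

		set_pte_at(mm, addr, ptep, pte_mkcont(pte));
	}
}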

--
Thanks,

David / dhildenb




