Re: [Chapter Three] THP HVO: bring the hugeTLB feature to THP

On 29.02.24 23:54, Yang Shi wrote:
On Thu, Feb 29, 2024 at 10:34 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:

HVO can be one of the perks for heavy THP users like it is for hugeTLB
users. For example, if such a user uses 60% of physical memory for 2MB
THPs, THP HVO can reduce the struct page overhead by half (60% * 7/8
~= 50%).
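
The 7/8 factor can be checked with a small calculation. This is only a sketch: the 4 KiB base page and 64-byte struct page are assumptions (typical for x86-64) that the message does not state, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not stated in the thread: 4 KiB base pages and a
 * 64-byte struct page, as on a typical x86-64 build. */
enum {
    BASE_PAGE_SIZE   = 4096,
    STRUCT_PAGE_SIZE = 64,
    THP_SIZE         = 2 * 1024 * 1024,
};

/* Number of base pages needed to hold the struct pages of one 2MB THP. */
size_t vmemmap_pages_per_thp(void)
{
    size_t nr_struct_pages = THP_SIZE / BASE_PAGE_SIZE;          /* 512 */

    return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;  /* 8 */
}

/* Overall struct page saving in tenths of a percent, given the share of
 * physical memory used for THPs and the number of metadata pages that
 * are kept per THP. */
unsigned int hvo_saving_permille(unsigned int thp_share_percent,
                                 size_t pages_kept)
{
    size_t total = vmemmap_pages_per_thp();

    assert(thp_share_percent <= 100 && pages_kept <= total);
    return thp_share_percent * 10 * (total - pages_kept) / total;
}
```

Under these assumptions, keeping one of the eight metadata pages with 60% of memory in THPs gives 60% * 7/8 = 52.5%, which is the "about half" quoted above.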

ZONE_NOMERGE considerably simplifies the implementation of HVO for
THPs, since THPs from it cannot be split or merged and thus do not
require any correctness-related operations on tail pages beyond the
second one.

If a THP is mapped by PTEs, two optimization-related operations on its
tail pages, i.e., _mapcount and PG_anon_exclusive, can be binned to
track a group of pages, e.g., eight pages per group for 2MB THPs. The
resulting loss of precision, like the copying cost incurred during
shattering, is by design, since mapping by PTEs is another discouraged
behavior.

I'm confused by this. Can you please elaborate a little bit about
binning mapcount and PG_anon_exclusive?

For mapcount, IIUC, for example, when inc'ing a subpage's mapcount,
you actually inc the (i % 64) page's mapcount (assuming THP size is 2M
and base page size is 4K, so 8 strides and 64 pages in each stride),
right? But how can you tell whether each of the 8 pages has mapcount 1
or one page is mapped 8 times? Or does this actually not matter, and we
don't even care to distinguish the two cases?
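
The ambiguity in this question can be made concrete with a small user-space sketch. It follows the (i % 64) reading given above, which the thread does not confirm as the actual scheme; the struct and function names are hypothetical and not taken from the patch.

```c
#include <assert.h>

enum {
    NR_SUBPAGES = 512,  /* 2 MiB THP / 4 KiB base pages */
    NR_BINS     = 64,   /* 512 subpages / 8 subpages per bin */
};

/* Hypothetical per-THP state: one counter per bin, not one per subpage. */
struct binned_mapcount {
    int count[NR_BINS];
};

/* Subpages i, i + 64, i + 128, ... all land in the same bin. */
unsigned int bin_of(unsigned int subpage)
{
    assert(subpage < NR_SUBPAGES);
    return subpage % NR_BINS;
}

/* Record one more PTE mapping of the given subpage. */
void map_subpage(struct binned_mapcount *m, unsigned int subpage)
{
    m->count[bin_of(subpage)]++;
}

/* The only value that can be read back is the bin total. */
int binned_count(const struct binned_mapcount *m, unsigned int subpage)
{
    return m->count[bin_of(subpage)];
}
```

With this layout, mapping subpages 3 and 67 once each leaves the same counter value as mapping subpage 3 twice, so the two cases in the question are indistinguishable from the counter alone.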

I'm hoping we won't need such elaborate approaches that make the mapcounts even more complicated in the future.

Just like for hugetlb HGM (if it ever becomes real), I'm hoping that we can just avoid subpage mapcounts completely, at least in some kernel configs initially.

I was looking into having only a single PAE (PG_anon_exclusive) bit this week, but migration+swapout are (again) giving me a really hard time. In theory it's simple, but the corner cases are killing me.

What I really dislike about the PAE bits right now is not necessarily the space they take, but that they reside in multiple cachelines and that we have to use atomic operations to set/clear them simply because other page flags might be set concurrently. PAE can only be set/cleared while holding the page table lock already, so I really want to avoid atomics.

I have not given up on a single PAE bit per folio, but the alternative I was thinking about this week was simply allocating the space required for maintaining them and storing a pointer to that in the (anon) folio. Not perfect.
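
A rough user-space sketch of the "allocate the space and store a pointer" alternative might look as follows. Everything here is hypothetical: the struct is a stand-in and not the kernel's struct folio, the function names are invented, and the sketch assumes that the page table lock (not modeled here) serializes every update to a given bitmap word, which the thread does not establish.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for an anon folio; not the kernel's struct folio. */
struct anon_folio_sketch {
    unsigned int nr_pages;
    unsigned long *pae_bits;  /* separately allocated, one bit per subpage */
};

/* Allocate a zeroed bitmap with one bit per subpage; 0 on success. */
int pae_alloc(struct anon_folio_sketch *f, unsigned int nr_pages)
{
    size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

    f->pae_bits = calloc(words, sizeof(unsigned long));
    if (!f->pae_bits)
        return -1;
    f->nr_pages = nr_pages;
    return 0;
}

/* Plain, non-atomic read-modify-write. This is only safe under the
 * assumption that the caller holds a lock covering the whole word and
 * that no unrelated flags share these words. */
void pae_set(struct anon_folio_sketch *f, unsigned int i)
{
    assert(i < f->nr_pages);
    f->pae_bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

void pae_clear(struct anon_folio_sketch *f, unsigned int i)
{
    assert(i < f->nr_pages);
    f->pae_bits[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}

bool pae_test(const struct anon_folio_sketch *f, unsigned int i)
{
    assert(i < f->nr_pages);
    return (f->pae_bits[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

void pae_free(struct anon_folio_sketch *f)
{
    free(f->pae_bits);
    f->pae_bits = NULL;
}
```

Because the bits sit in their own allocation, they are contiguous rather than spread across the struct pages of the tail pages, and no other page flag shares their words. The price is the extra allocation and pointer, which is presumably part of why this is described as "not perfect".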

--
Cheers,

David / dhildenb