On Thu, Feb 29, 2024 at 5:54 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Thu, Feb 29, 2024 at 10:34 AM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
> >
> > HVO can be one of the perks for heavy THP users like it is for hugeTLB
> > users. For example, if such a user uses 60% of physical memory for 2MB
> > THPs, THP HVO can reduce the struct page overhead by half (60% * 7/8
> > ~= 50%).
> >
> > ZONE_NOMERGE considerably simplifies the implementation of HVO for
> > THPs, since THPs from it cannot be split or merged and thus do not
> > require any correctness-related operations on tail pages beyond the
> > second one.
> >
> > If a THP is mapped by PTEs, two optimization-related operations on its
> > tail pages, i.e., _mapcount and PG_anon_exclusive, can be binned to
> > track a group of pages, e.g., eight pages per group for 2MB THPs. The
> > estimation, as the copying cost incurred during shattering, is also by
> > design, since mapping by PTEs is another discouraged behavior.
>
> I'm confused by this. Can you please elaborate a little bit about
> binning mapcount and PG_anon_exclusive?
>
> For mapcount, IIUC, for example, when inc'ing a subpage's mapcount,
> you actually inc the (i % 64) page's mapcount (assuming THP size is 2M
> and base page size is 4K, so 8 strides and 64 pages in each stride),
> right?

Correct.

> But how you can tell each page of the 8 pages has mapcount 1 or
> one page is mapped 8 times?

We can't :)

> Or this actually doesn't matter, we don't
> even care to distinguish the two cases?

Exactly.

> For PG_anon_exclusive, if one page has it set, it means other 7 pages
> in other strides have it set too?

Correct.

We leverage the fact that they (_mapcount and PG_anon_exclusive) are
optimizations: overestimating _mapcount and underestimating
PG_anon_exclusive (both are worst-case assumptions) can only affect the
performance of PTE-mapped THPs (as a punishment for splitting).