On 29.07.22 03:40, Peter Xu wrote:
> [Marking as RFC; only x86 is supported for now, plan to add a few more
>  archs when there's a formal version]
>
> Problem
> =======
>
> When migrate a page, right now we always mark the migrated page as old.
> The reason could be that we don't really know whether the page is hot or
> cold, so we could have taken it a default negative assuming that's safer.
>
> However that could lead to at least two problems:
>
>   (1) We lost the real hot/cold information while we could have persisted.
>       That information shouldn't change even if the backing page is changed
>       after the migration,
>
>   (2) There can be always extra overhead on the immediate next access to
>       any migrated page, because hardware MMU needs cycles to set the young
>       bit again (as long as the MMU supports).
>
> Many of the recent upstream works showed that (2) is not something trivial
> and actually very measurable.  In my test case, reading 1G chunk of memory
> - jumping in page size intervals - could take 99ms just because of the
> extra setting on the young bit on a generic x86_64 system, comparing to 4ms
> if young set.
>
> This issue is originally reported by Andrea Arcangeli.
>
> Solution
> ========
>
> To solve this problem, this patchset tries to remember the young bit in the
> migration entries and carry it over when recovering the ptes.
>
> We have the chance to do so because in many systems the swap offset is not
> really fully used.  Migration entries use swp offset to store PFN only,
> while the PFN is normally not as large as swp offset and normally smaller.
> It means we do have some free bits in swp offset that we can use to store
> things like young, and that's how this series tried to approach this
> problem.
>
> One tricky thing here is even though we're embedding the information into
> swap entry which seems to be a very generic data structure, the number of
> bits that are free is still arch dependent.  Not only because the size of
> swp_entry_t differs, but also due to the different layouts of swap ptes on
> different archs.
>
> Here, this series requires specific arch to define an extra macro called
> __ARCH_SWP_OFFSET_BITS represents the size of swp offset.  With this
> information, the swap logic can know whether there's extra bits to use,
> then it'll remember the young bits when possible.  By default, it'll keep
> the old behavior of keeping all migrated pages cold.
>

I played with a similar idea when working on pte_swp_exclusive() but
gave up, because it ended up looking too hacky. Looking at patch #2, I
get the same feeling again. Kind of hacky.

If we mostly only care about x86_64, and it's a performance improvement
after all, why not simply do it like
pte_swp_mkexclusive/pte_swp_exclusive/ ... and reuse a spare PTE bit?

--
Thanks,

David / dhildenb