On Fri, Jul 15, 2022 at 02:52:27PM -0700, Axel Rasmussen wrote:
> Guest access in terms of "physical" memory address is basically
> random. So, actually filling in all 262k 4K PTEs making up a
> contiguous 1G region might take quite some time. Once we've completed
> any of the various 2M contiguous regions, it would be nice to go ahead
> and collapse those right away. The benefit is, the guest will see some
> performance benefit from the 2G page already, without having to wait
> for the full 1G page to complete. Once we do complete a 1G page, it
> would be nice to collapse that one level further. If we do this, the
> whole guest memory will be a mix of 1G, 2M, and 4K.

Just to mention that we've got quite a few other things that drag perf
down much more than the TLB cost of smaller page sizes during any VM
migration process.

For example, when we split & wr-protect pages during the starting phase
of migration on the src host, it's not a 10% or 20% drop but something
much more drastic. In the postcopy case it happens on the dest, but
it's still part of the whole migration process and probably visible to
the guest too. If the guest wants, it can simply start writing some
pages continuously, and I bet it'll see obvious slowdowns at any time
during migration.

It'll always be nice to have multi-level sub-mappings, and I fully
agree. IMHO it's a matter of whether keeping 4k-only would greatly
simplify the work, especially the rework of the hugetlb sub-page aware
pgtable ops.

Thanks,

-- 
Peter Xu