On Tue, 14 Mar 2023, James Houghton wrote:

> On Mon, Mar 6, 2023 at 11:19 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
> >
> > This is past the deadline, so feel free to ignore. However, ...
> >
> > James Houghton has been working on the concept of HugeTLB High Granularity
> > Mapping (HGM) as discussed here:
> > https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@xxxxxxxxxx/
> >
> > The primary motivation for this work is post-copy live migration of VMs backed
> > by hugetlb pages via userfaultfd. A followup use case is more gracefully
> > handling memory errors/poison on hugetlb pages.
> >
> > As can be seen by the size of James's patch set, the required changes for
> > HGM are a bit complex and involved. This is also complicated by the need
> > to choose a 'mapcount strategy' as the previous scheme used by hugetlb
> > will no longer work.
> >
> > A HGM for hugetlbfs session would present the current approach and challenges.
> > While much of the work is confined to hugetlb, there is a bit of spillover to
> > other mm areas: specifically page table walking. A discussion on ways to
> > move forward with this effort would be appreciated.
>
> Thanks for proposing this, Mike.
>
> To hopefully get more interest in this topic, I want to lay out the
> reasons that Google uses HugeTLB for VMs today. They are:
> - Guaranteed availability of hugepages
> - Guaranteed NUMA alignment
> - Availability of 1G pages
> - HugeTLB vmemmap optimization to save page struct overhead
>
> Until generic mm supports all this, HugeTLB will remain a very
> important piece of Linux for us. :)
>
> The main limitation of HugeTLB that I care about is that it can only
> map an entire hugepage at once; it can never partially map a hugepage
> (like, there is no such thing as a PTE-mapped HugeTLB page). As Mike
> said, this makes the following applications impossible:
> 1. With userfaultfd-based live migration, being able to fetch and
>    install memory at PAGE_SIZE.
> 2. Memory poison at PAGE_SIZE.
>
> HugeTLB high-granularity mapping (HGM) is an effort to make #1 and #2
> possible with HugeTLB.
>
> #1 and #2 are already possible with generic mm, so this also raises the
> question: Can we merge HugeTLB with generic mm? This would certainly
> be much more work than HGM, but it removes all those pesky HugeTLB
> special cases (though, we still want all those features that HugeTLB
> has).
>
> Coming up with a plan to merge HugeTLB with generic mm would be
> challenging, and LSFMM might be a good place to have such a
> discussion. Not all of HugeTLB would need to be merged. I think some
> of the main special cases that should be removed are:
> 1. hugetlb_fault (fault/GUP special case)
> 2. page_vma_mapped_walk's special case
> 3. hugetlb_entry in pagewalk
> 4. HugeTLB's rmap/mapcount special cases (already working on this!)
>
> As part of this merge/unification, architectures would need to merge
> their hugetlb implementations with their generic mm implementations
> (for example, moving any special logic from set_huge_pte_at to
> set_pte_at).
>
> These are just some initial thoughts; I'm sure many of you have your
> own ideas for this.
>
> A discussion about HGM might serve as a jumping-off point for ideas
> for how to enhance the generic mm implementation to make the
> unification possible.
>

I'd definitely be interested in joining this discussion, specifically
for the live migration and memory poisoning use cases.

Adding in some folks at AMD as well, as this may be useful for SEV-SNP
host support.