Re: [LSF/MM/BPF TOPIC] HGM for hugetlbfs

On Mon, Mar 6, 2023 at 11:19 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:
>
> This is past the deadline, so feel free to ignore.  However, ...
>
> James Houghton has been working on the concept of HugeTLB High Granularity
> Mapping (HGM) as discussed here:
> https://lore.kernel.org/linux-mm/20230218002819.1486479-1-jthoughton@xxxxxxxxxx/
>
> The primary motivation for this work is post-copy live migration of VMs backed
> by hugetlb pages via userfaultfd.  A followup use case is more gracefully
> handling memory errors/poison on hugetlb pages.
>
> As can be seen by the size of James's patch set, the required changes for
> HGM are a bit complex and involved.  This is also complicated by the need
> to choose a 'mapcount strategy', as the previous scheme used by hugetlb
> will no longer work.
>
> A HGM for hugetlbfs session would present the current approach and challenges.
> While much of the work is confined to hugetlb, there is a bit of spillover to
> other mm areas: specifically page table walking.  A discussion on ways to
> move forward with this effort would be appreciated.

Thanks for proposing this, Mike.

To hopefully get more interest in this topic, I want to lay out the
reasons that Google uses HugeTLB for VMs today. They are:
- Guaranteed availability of hugepages
- Guaranteed NUMA alignment
- Availability of 1G pages
- HugeTLB vmemmap optimization to save page struct overhead

Until generic mm supports all this, HugeTLB will remain a very
important piece of Linux for us. :)
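
To put a rough number on the last item, here is a back-of-the-envelope
sketch in C. The 64-byte struct page and 4 KiB base page are assumptions
(typical for x86-64, not universal), the helper names are made up for
illustration, and the number of vmemmap pages that must be kept per
hugepage is left as a parameter because it depends on the kernel version.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB base pages, 64-byte struct page. */
#define EX_PAGE_SIZE        UINT64_C(4096)
#define EX_STRUCT_PAGE_SIZE UINT64_C(64)

/* Bytes of struct page metadata needed to describe one hugepage. */
static inline uint64_t ex_vmemmap_bytes(uint64_t hugepage_size)
{
    assert(hugepage_size % EX_PAGE_SIZE == 0);
    return (hugepage_size / EX_PAGE_SIZE) * EX_STRUCT_PAGE_SIZE;
}

/*
 * Base pages of vmemmap that could be freed per hugepage if only
 * `kept_pages` of them are retained (hypothetical parameter).
 */
static inline uint64_t ex_vmemmap_pages_freed(uint64_t hugepage_size,
                                              uint64_t kept_pages)
{
    uint64_t total = ex_vmemmap_bytes(hugepage_size) / EX_PAGE_SIZE;

    return total > kept_pages ? total - kept_pages : 0;
}
```

Under these assumptions, a 1G page carries 16 MiB of struct pages (4096
base pages' worth), while a 2M page carries only 8 base pages' worth, so
the potential saving is concentrated in the 1G case.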

The main limitation of HugeTLB that I care about is that it can only
map an entire hugepage at once; it can never partially map a hugepage
(that is, there is no such thing as a PTE-mapped HugeTLB page). As Mike
said, this makes the following applications impossible:
1. With userfaultfd-based live migration, being able to fetch and
install memory at PAGE_SIZE granularity.
2. Handling memory poison at PAGE_SIZE granularity.

HugeTLB high-granularity mapping (HGM) is an effort to make #1 and #2
possible with HugeTLB.
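
To make the granularity gap in #1 concrete, here is a minimal sketch in C
of the arithmetic involved. The helper names and the 4 KiB / 1 GiB sizes
are assumptions chosen for illustration; none of this is taken from the
HGM patch set.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sizes (assumptions): 4 KiB base pages, 1 GiB hugepages. */
#define EX_BASE_PAGE_SIZE (UINT64_C(1) << 12)
#define EX_HUGE_PAGE_SIZE (UINT64_C(1) << 30)

/* A contiguous range of memory that must be fetched to resolve a fault. */
struct ex_range {
    uint64_t start;
    uint64_t len;
};

/* Number of base-page-sized pieces that make up one hugepage. */
static inline uint64_t ex_pieces_per_hugepage(uint64_t huge_size,
                                              uint64_t base_size)
{
    assert(base_size != 0 && huge_size % base_size == 0);
    return huge_size / base_size;
}

/*
 * Range that has to be present before a fault at `fault_addr` can be
 * resolved, when the smallest mappable unit is `granularity` bytes
 * (which must be a power of two).
 */
static inline struct ex_range ex_fetch_range(uint64_t fault_addr,
                                             uint64_t granularity)
{
    struct ex_range r;

    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    r.start = fault_addr & ~(granularity - 1);
    r.len = granularity;
    return r;
}
```

With these sizes, a single fault in a 1G page means fetching the whole
1 GiB before the page can be mapped, whereas mapping at PAGE_SIZE would
need only one of its 262144 4 KiB pieces.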

#1 and #2 are already possible with generic mm, so this also raises the
question: Can we merge HugeTLB with generic mm? This would certainly
be much more work than HGM, but it would remove all those pesky HugeTLB
special cases (though we would still want all the features that HugeTLB
has).

Coming up with a plan to merge HugeTLB with generic mm would be
challenging, and LSFMM might be a good place to have such a
discussion. Not all of HugeTLB would need to be merged. I think some
of the main special cases that should be removed are:
1. hugetlb_fault (fault/GUP special case)
2. page_vma_mapped_walk's special case
3. hugetlb_entry in pagewalk
4. HugeTLB's rmap/mapcount special cases (already working on this!)

As part of this merge/unification, architectures would need to merge
their hugetlb implementations with their generic mm implementations
(for example, moving any special logic from set_huge_pte_at to
set_pte_at).

These are just some initial thoughts; I'm sure many of you have your
own ideas for this.

A discussion about HGM might serve as a jumping-off point for ideas
for how to enhance the generic mm implementation to make the
unification possible.


- James Houghton




