I hope to use these three patches to start a discussion on eventually
removing the need to pass a struct vma pointer when taking a folio from
the global pool (i.e. dequeue_hugetlb_folio_vma()).

Why eliminate passing the struct vma pointer? VMAs are primarily about
mapping memory into userspace, and it would be cleaner if the HugeTLB
folio allocation process could just focus on returning a folio. Today
the VMA is a convenient carrier for the pieces of information the
allocation process needs, but dequeuing should not depend on the VMA
concept. Requiring a VMA deep in the allocation process also makes some
callers awkward, such as HugeTLBfs's fallocate, where there is no VMA
(yet) and a pseudo-VMA has to be created (see the rough sketch appended
at the end of this mail).

Separation would also help with HugeTLB unification. For comparison,
the buddy allocator's __alloc_pages_noprof() is conceptually separate
from VMAs.

I started looking into this because we want to use HugeTLB folios in
guest_memfd [1], and found that the HugeTLB folio allocation process is
tightly coupled with VMAs. This makes it hard to use HugeTLB folios in
guest_memfd, which has no VMAs for private pages. I then watched Peter
Xu's talk at LSFMM [2] about HugeTLB unification and thought that these
patches could also contribute to the unification effort.

As discussed at LPC 2024 [3], the general preference is for guest_memfd
to use HugeTLB folios. While that is being worked out, I hope these
patches can be considered and merged separately. I believe they are
still useful in making the resv_map/subpool/hstate reservation system
in HugeTLB easier to understand, and no functional change is intended.

---

Why use HugeTLB folios in guest_memfd? HugeTLB is *the* source of 1G
pages in the kernel today, and it would be best for all 1G page users
(HugeTLB, HugeTLBfs, or guest_memfd) on a host to draw from the same
pool of 1G pages. This allows central tracking of all 1G pages, a
precious resource on a machine. A separate 1G page allocator would not
only mean rebuilding features HugeTLB already has, it would also split
the 1G pool: if both allocators were used on a machine, it would be
complicated to (a) predetermine how many pages to put in each
allocator's pool or (b) transfer pages between the pools at runtime.

---

[1] https://lore.kernel.org/all/cover.1726009989.git.ackerleytng@xxxxxxxxxx/T/
[2] https://youtu.be/7k-m2gTDu2k?si=ghWZ6qa1GAdaHOFP
[3] https://youtu.be/PVTjLLEpozE?si=HvdDlUc_4ElVXu5R

Ackerley Tng (3):
  mm: hugetlb: Simplify logic in dequeue_hugetlb_folio_vma()
  mm: hugetlb: Refactor vma_has_reserves() to should_use_hstate_resv()
  mm: hugetlb: Remove unnecessary check for avoid_reserve

 mm/hugetlb.c | 57 +++++++++++++++++++++++----------------------------------
 1 file changed, 23 insertions(+), 34 deletions(-)

--
2.47.0.rc1.288.g06298d1525-goog
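
Appendix: a rough sketch of the coupling described above. The
signatures below are paraphrased from mm/hugetlb.c and
fs/hugetlbfs/inode.c and may differ slightly between kernel versions;
pseudo_vma_example() is only an illustrative helper, not code from the
tree.

/* The buddy allocator's entry point is VMA-free ... */
struct page *__alloc_pages_noprof(gfp_t gfp, unsigned int order,
				  int preferred_nid, nodemask_t *nodemask);

/* ... while dequeuing a HugeTLB folio from the global pool is not. */
static struct folio *dequeue_hugetlb_folio_vma(struct hstate *h,
					       struct vm_area_struct *vma,
					       unsigned long address,
					       int avoid_reserve, long chg);

/*
 * A caller with no VMA at hand, such as HugeTLBfs's fallocate, has to
 * fabricate one before it can allocate, along these lines:
 */
static void pseudo_vma_example(struct mm_struct *mm, struct file *file)
{
	struct vm_area_struct pseudo_vma;

	vma_init(&pseudo_vma, mm);
	vm_flags_init(&pseudo_vma, VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
	pseudo_vma.vm_file = file;
	/* pseudo_vma is then passed down into the folio allocation path */
}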