On Thu, Jun 6, 2024 at 11:29 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> One of the things we discussed at LSFMM was unifying the hugetlb and
> THP page table walkers. I've been looking into it some more recently;
> I've found a problem and I think a solution.
>
> The reason we have a separate hugetlb_entry from pmd_entry and pud_entry
> is that it has a different locking context. It is called with the
> hugetlb_vma_lock held for read (nb: this is not the same as the vma
> lock; see walk_hugetlb_range()). Why do we need this? Because of page
> table sharing.
>
> In a completely separate discussion, I was talking with Khalid about
> mshare() support for hugetlbfs, and I suggested that we permit hugetlbfs
> pages to be mapped by a VMA which does not have the VM_HUGETLB flag set.
> If we do that, the page tables would not be permitted to be shared with
> other users of that hugetlbfs file. But we want to eliminate support
> for that anyway, so that's more of a feature than a bug.
>
> Once we don't use the VM_HUGETLB flag on these VMAs, that opens the
> door to the other features we want, like mapping individual pages from
> a hugetlb folio. And we can use the regular page table walkers for
> these VMAs.
>
> Is this a reasonable path forward, or have I overlooked something?

Hi Matthew,

Today the VM_HUGETLB flag tells the fault handler to call into
hugetlb_fault() (there are many other special cases, but this one is
probably the most important). How should faults on VMAs without
VM_HUGETLB that map HugeTLB folios be handled? If you handle faults
with the main mm fault handler without getting rid of hugetlb_fault(),
I think you're basically implementing a second, more tmpfs-like
hugetlbfs... right?

I don't really have anything against this approach, but I think the
decision was to reduce the number of special cases as much as we can
first before attempting to rewrite hugetlbfs.
Or maybe I've got something wrong and what you're asking doesn't logically end up at a hugetlbfs v2.