On Thu, Jun 06, 2024 at 01:23:08PM -0700, James Houghton wrote:
> On Thu, Jun 6, 2024 at 1:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
> > Once there, we'll do:
> >
> > 	vmf.pud = pud_alloc(mm, p4d, address);
> > 	if (pud_none(*vmf.pud) &&
> > 	    thp_vma_allowable_order(vma, vm_flags,
> > 				TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
> > 		ret = create_huge_pud(&vmf);
> >
> > which will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> >
> > So all we need to do is implement huge_fault in hugetlb_vm_ops.  I
> > don't think that's the same as creating a hugetlbfs2 because it's just
> > another entry point.  You can mmap() the same file both ways and it's
> > all cache coherent.
>
> That makes a lot of sense. FWIW, this sounds good to me (though I'm
> curious what Peter thinks :)).
>
> But I think you'll need to be careful to ensure that, for now anyway,
> huge_fault() is always called with the exact same ptep/pmdp/pudp that
> hugetlb_walk() would have returned (ignoring sharing). If you allow
> PMD mapping of what would otherwise be PUD-mapped hugetlb pages right
> now, you'll break the vmemmap optimization (and probably other
> things).

Why is that?  This sounds like you know something I don't ;-)  Is it
the mapcount issue?

> Also I'm not sure how this will interact with arm64's hugetlb pages
> implemented with contiguous PTEs/PMDs. You might have to round
> `address` down to make sure you've picked the first PTE/PMD in the
> group.

I hadn't thought about the sub-PMD size hugetlb issue either.  We can
certainly limit the support to require alignment to the appropriate
size.
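
To make that concrete, something like the below is roughly the shape I
have in mind (a sketch only: hugetlb_huge_fault is a name I'm inventing
for this mail, and a real patch would still need to sort out PMD vs PUD
orders, the cont-PTE/PMD cases and the mapcount question properly):

	/*
	 * Sketch: let the generic create_huge_pud()/create_huge_pmd()
	 * path reach hugetlb by giving hugetlb_vm_ops a ->huge_fault
	 * entry instead of bailing out to hugetlb_fault() early.
	 */
	static vm_fault_t hugetlb_huge_fault(struct vm_fault *vmf,
					     unsigned int order)
	{
		struct vm_area_struct *vma = vmf->vma;
		struct hstate *h = hstate_vma(vma);
		unsigned long addr = vmf->address & huge_page_mask(h);

		/*
		 * Only handle faults whose order matches the hstate for
		 * now; anything else falls back until we've decided how
		 * to handle the sub-PMD / cont-PTE sizes.
		 */
		if (order != huge_page_order(h))
			return VM_FAULT_FALLBACK;

		/* Address is rounded down to the huge page boundary. */
		return hugetlb_fault(vma->vm_mm, vma, addr, vmf->flags);
	}

plus a

	.huge_fault	= hugetlb_huge_fault,

line in hugetlb_vm_ops.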