Re: Unifying page table walkers

James Houghton <jthoughton@xxxxxxxxxx> · Thu, 6 Jun 2024 13:23:08 -0700

On Thu, Jun 6, 2024 at 1:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
> Once there, we'll do:
>
>         vmf.pud = pud_alloc(mm, p4d, address);
>         if (pud_none(*vmf.pud) &&
>             thp_vma_allowable_order(vma, vm_flags,
>                                 TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
>                 ret = create_huge_pud(&vmf);
>
> which will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);
>
> So all we need to do is implement huge_fault in hugetlb_vm_ops.  I
> don't think that's the same as creating a hugetlbfs2 because it's just
> another entry point.  You can mmap() the same file both ways and it's
> all cache coherent.

That makes a lot of sense. FWIW, this sounds good to me (though I'm
curious what Peter thinks :)).

But I think you'll need to be careful to ensure that, for now anyway,
huge_fault() is always called with the exact same ptep/pmdp/pudp that
hugetlb_walk() would have returned (ignoring sharing). If you allow
PMD mapping of what would otherwise be PUD-mapped hugetlb pages right
now, you'll break the vmemmap optimization (and probably other
things).

Also I'm not sure how this will interact with arm64's hugetlb pages
implemented with contiguous PTEs/PMDs. You might have to round
`address` down to make sure you've picked the first PTE/PMD in the
group.