On Thu, Jun 06, 2024 at 01:23:08PM -0700, James Houghton wrote:
> On Thu, Jun 6, 2024 at 1:04 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
> > Once there, we'll do:
> >
> > 	vmf.pud = pud_alloc(mm, p4d, address);
> > 	if (pud_none(*vmf.pud) &&
> > 	    thp_vma_allowable_order(vma, vm_flags,
> > 				TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
> > 		ret = create_huge_pud(&vmf);
> >
> > which will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);
> >
> > So all we need to do is implement huge_fault in hugetlb_vm_ops.  I
> > don't think that's the same as creating a hugetlbfs2 because it's just
> > another entry point.  You can mmap() the same file both ways and it's
> > all cache coherent.
>
> That makes a lot of sense. FWIW, this sounds good to me (though I'm
> curious what Peter thinks :)).
>
> But I think you'll need to be careful to ensure that, for now anyway,
> huge_fault() is always called with the exact same ptep/pmdp/pudp that
> hugetlb_walk() would have returned (ignoring sharing). If you allow
> PMD mapping of what would otherwise be PUD-mapped hugetlb pages right
> now, you'll break the vmemmap optimization (and probably other
> things).

Why is that?  This sounds like you know something I don't ;-)  Is it
the mapcount issue?

> Also I'm not sure how this will interact with arm64's hugetlb pages
> implemented with contiguous PTEs/PMDs. You might have to round
> `address` down to make sure you've picked the first PTE/PMD in the
> group.

I hadn't thought about the sub-PMD size hugetlb issue either.  We can
certainly limit the support to require alignment to the appropriate
size.
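
To make that concrete, something like the below is roughly the shape I
have in mind (a sketch only: hugetlb_huge_fault is a name I'm inventing
for this mail, and a real patch would still need to sort out PMD vs PUD
orders, the cont-PTE/PMD cases and the mapcount question properly):

	/*
	 * Sketch: let the generic create_huge_pud()/create_huge_pmd()
	 * path reach hugetlb by giving hugetlb_vm_ops a ->huge_fault
	 * entry instead of bailing out to hugetlb_fault() early.
	 */
	static vm_fault_t hugetlb_huge_fault(struct vm_fault *vmf,
					     unsigned int order)
	{
		struct vm_area_struct *vma = vmf->vma;
		struct hstate *h = hstate_vma(vma);
		unsigned long addr = vmf->address & huge_page_mask(h);

		/*
		 * Only handle faults whose order matches the hstate for
		 * now; anything else falls back until we've decided how
		 * to handle the sub-PMD / cont-PTE sizes.
		 */
		if (order != huge_page_order(h))
			return VM_FAULT_FALLBACK;

		/* Address is rounded down to the huge page boundary. */
		return hugetlb_fault(vma->vm_mm, vma, addr, vmf->flags);
	}

plus a

	.huge_fault	= hugetlb_huge_fault,

line in hugetlb_vm_ops.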