On Thu, Jun 06, 2024 at 12:30:44PM -0700, James Houghton wrote:
> Today the VM_HUGETLB flag tells the fault handler to call into
> hugetlb_fault() (there are many other special cases, but this one is
> probably the most important). How should faults on VMAs without
> VM_HUGETLB that map HugeTLB folios be handled? If you handle faults
> with the main mm fault handler without getting rid of hugetlb_fault(),
> I think you're basically implementing a second, more tmpfs-like
> hugetlbfs... right?
>
> I don't really have anything against this approach, but I think the
> decision was to reduce the number of special cases as much as we can
> first before attempting to rewrite hugetlbfs.
>
> Or maybe I've got something wrong and what you're asking doesn't
> logically end up at a hugetlbfs v2.

Right, so we ignore hugetlb_fault() and call into __handle_mm_fault().
Once there, we'll do:

	vmf.pud = pud_alloc(mm, p4d, address);
	if (pud_none(*vmf.pud) &&
	    thp_vma_allowable_order(vma, vm_flags,
				TVA_IN_PF | TVA_ENFORCE_SYSFS, PUD_ORDER)) {
		ret = create_huge_pud(&vmf);

which will call vma->vm_ops->huge_fault(vmf, PUD_ORDER);

So all we need to do is implement huge_fault in hugetlb_vm_ops.  I don't
think that's the same as creating a hugetlbfs2 because it's just another
entry point.  You can mmap() the same file both ways and it's all cache
coherent.
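
To make the dispatch concrete, here is a userspace sketch of that entry
point. It is not kernel code: every mock_-prefixed name is hypothetical,
mock_create_huge_pud() only loosely mirrors the shape of
create_huge_pud() (the real function has additional checks),
MOCK_PUD_ORDER assumes x86-64 with 4K pages, and the handler body is a
placeholder, since nothing above says what a hugetlb huge_fault would
actually do.

```c
#include <stddef.h>

/* Userspace stand-ins only; none of these are the kernel's definitions. */
typedef unsigned int mock_fault_t;
#define MOCK_FAULT_NOPAGE   0x100u  /* handler installed the mapping itself */
#define MOCK_FAULT_FALLBACK 0x800u  /* huge mapping not possible, try smaller */
#define MOCK_PUD_ORDER      18u     /* assumption: 1G PUD / 4K base page */

struct mock_vmf;

/* Hypothetical analogue of an ops table with an optional huge_fault hook. */
struct mock_vm_ops {
	mock_fault_t (*fault)(struct mock_vmf *vmf);
	mock_fault_t (*huge_fault)(struct mock_vmf *vmf, unsigned int order);
};

struct mock_vma {
	const struct mock_vm_ops *ops;
};

struct mock_vmf {
	const struct mock_vma *vma;
	unsigned long address;
	unsigned int mapped_order;  /* records what the handler "mapped" */
};

/* Placeholder handler: accepts only PUD order and records it. */
static mock_fault_t mock_hugetlb_huge_fault(struct mock_vmf *vmf,
					    unsigned int order)
{
	if (order != MOCK_PUD_ORDER)
		return MOCK_FAULT_FALLBACK;
	vmf->mapped_order = order;
	return MOCK_FAULT_NOPAGE;
}

/* An ops table that provides the extra entry point. */
const struct mock_vm_ops mock_hugetlb_vm_ops = {
	.fault = NULL,
	.huge_fault = mock_hugetlb_huge_fault,
};

/* An ops table without it, to show the fallback path. */
const struct mock_vm_ops mock_plain_vm_ops = {
	.fault = NULL,
	.huge_fault = NULL,
};

/* Generic path: call huge_fault if the VMA has one, else fall back. */
mock_fault_t mock_create_huge_pud(struct mock_vmf *vmf)
{
	const struct mock_vm_ops *ops = vmf->vma->ops;

	if (ops && ops->huge_fault)
		return ops->huge_fault(vmf, MOCK_PUD_ORDER);
	return MOCK_FAULT_FALLBACK;
}
```

In this model, a VMA whose ops table provides huge_fault gets a
PUD-order mapping attempt from the generic path, and a VMA without it
falls back to the smaller-order paths.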