On 5/18/22 18:30, Hugh Dickins wrote:
> On Wed, 18 May 2022, Mike Kravetz wrote:
>
>> For most non-anonymous vmas, we do not copy page tables at fork time, but
>> rather lazily populate the tables after fork via faults. The routine
>> vma_needs_copy() is used to make this decision. For VM_HUGETLB vmas, it always
>> returns true.
>
> "vma_needs_copy()" is *very* recent coinage, not reached Linus yet.
>
>>
>> Anyone know/remember why? The code was added more than 15 years ago and
>> my search for why hugetlb vmas were excluded came up empty.
>>
>> I do not see a reason why VM_HUGETLB is in this list. Initial testing did
>> not reveal any problems when I removed the VM_HUGETLB check.
>>
>> FYI - I am looking at the performance of fork and exec (unmap) of processes
>> with very large hugetlb mappings. Skipping the copy at fork time would
>> certainly speed things up. Of course, there could be some users who would
>> notice if hugetlb page tables are not copied at fork time. However, this
>> is the behavior for 'normal' mappings. I am inclined to make hugetlb be
>> 'more normal'.
>
> Good question, not obvious to me either: but I've found the answer.

Thank you Hugh! You went above and beyond as usual.

> The commit was of course Nick's d992895ba2b2 ("[PATCH] Lazy page table
> copies in fork()") in 2.6.14; but it doesn't explain why VM_HUGETLB is
> there in the test, and goes on to be copied.
>
> I haven't re-read through the whole mail thread which led to that
> commit, but I think you'll find the crucial observation comes from
> Andi in https://lore.kernel.org/lkml/200508251756.07849.ak@xxxxxxx/#t

Sorry that I did not find the entire thread. There were only a couple of
pieces on linux-mm and that is the only place I looked.

>
> "Actually I disabled it for hugetlbfs (... !is_huge...vma). The reason
> is that lazy faulting for huge pages is still not in mainline."
>
> and indeed, look at the 2.6.13 or 2.6.14 mm/hugetlb.c and you find
> /*
>  * We cannot handle pagefaults against hugetlb pages at all. They cause
>  * handle_mm_fault() to try to instantiate regular-sized pages in the
>  * hugegpage VMA. do_page_fault() is supposed to trap this, so BUG is we get
>  * this far.
>  */
> static struct page *hugetlb_nopage(struct vm_area_struct *vma,
> 				unsigned long address, int *unused)
> {
> 	BUG();
> 	return NULL;
> }
>
> Oh, and that pretty much still exists to this day, to cover that path
> to a fault; but 2.6.16 implemented hugetlb_no_page(), which is what
> then actually got used to satisfy a hugetlb fault.
>
> So the reason for fork copying VM_HUGETLB appears to have gone away
> in 2.6.16.

Yes, that is the likely reason. Functionality was not originally supported,
and when it was added this 'optimization' was not enabled.

> (I haven't a clue on private hugetlb mappings and reservations and
> whether anon_vma means the same on hugetlb, but you know all that.)

Yes, I believe anon_vma means the same on hugetlb for this purpose. Although,
I do need to look closer just to make sure there are not hidden surprises.

Thanks again,
-- 
Mike Kravetz
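
For anyone following along, the fork-time decision discussed in this thread
can be modelled in user space roughly as below. This is only a sketch, not
the kernel's vma_needs_copy(): the struct, the MODEL_* flag values and both
function names are hypothetical, MODEL_VM_OTHER_ALWAYS_COPY is a placeholder
for whatever other flags share the list with VM_HUGETLB, and any checks the
real routine makes beyond the flag list and anon_vma are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model; NOT the kernel's definitions. */
#define MODEL_VM_HUGETLB           (1u << 0)
#define MODEL_VM_OTHER_ALWAYS_COPY (1u << 1) /* placeholder for the rest of the list */

/* Hypothetical stand-in for the parts of a vma the decision looks at. */
struct model_vma {
	unsigned int flags;
	bool has_anon_vma; /* models a non-NULL anon_vma */
};

/* Behaviour described in the thread: a hugetlb vma is always copied at fork. */
static bool model_needs_copy(const struct model_vma *vma)
{
	if (vma->flags & (MODEL_VM_HUGETLB | MODEL_VM_OTHER_ALWAYS_COPY))
		return true;
	/* Otherwise copy only if the vma has anonymous pages. */
	return vma->has_anon_vma;
}

/*
 * Proposed behaviour: drop the hugetlb check, so a hugetlb vma without an
 * anon_vma is populated lazily via faults, like a 'normal' mapping.
 */
static bool model_needs_copy_lazy_hugetlb(const struct model_vma *vma)
{
	if (vma->flags & MODEL_VM_OTHER_ALWAYS_COPY)
		return true;
	return vma->has_anon_vma;
}
```

In this model the two functions disagree only for a hugetlb vma that has no
anon_vma; a hugetlb vma with an anon_vma is still copied either way.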