On 08/25/22 09:25, David Hildenbrand wrote:
> > Is the primary concern the locking?  If so, I am not sure we have an
> > issue.  As mentioned in your commit message, current code will use
> > pte_offset_map_lock().  pte_offset_map_lock uses pte_lockptr, and
> > pte_lockptr will either be the mm wide lock or the pmd_page lock.  To
> > me, it seems that either would provide correct synchronization for
> > CONT-PTE entries.  Am I missing something or misreading the code?
> > 
> > I started looking at the code cleanup suggested by David.  Here is a
> > quick patch (not tested and likely containing errors) to see if this
> > is a step in the right direction.
> > 
> > I like it because we get rid of/combine all those follow_huge_p*d
> > routines.
> 
> Yes, see comments below.
> 
> > From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Date: Wed, 24 Aug 2022 15:59:15 -0700
> > Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb
> >  pages in follow_page_mask
> > 
> > At the beginning of follow_page_mask, there currently is a call to
> > follow_huge_addr which 'may' handle hugetlb pages.  ia64 is the only
> > architecture which (incorrectly) provides a follow_huge_addr routine
> > that does not return error.  Instead, at each level of the page table
> > a check is made for a hugetlb entry.  If a hugetlb entry is found, a
> > call to a routine associated with that page table level such as
> > follow_huge_pmd is made.
> > 
> > All the follow_huge_p*d routines are basically the same.  In addition,
> > the huge page size can be derived from the vma, so we know where in
> > the page table a huge page would reside.  So, replace follow_huge_addr
> > with a new architecture independent routine which will provide the
> > same functionality as the follow_huge_p*d routines.  We can then
> > eliminate the p*d_huge checks in follow_page_mask page table walking
> > as well as the follow_huge_p*d routines themselves.
> > 
> > follow_page_mask still has is_hugepd hugetlb checks during page table
> > walking.  This is due to these checks and follow_huge_pd being
> > architecture specific.  These can be eliminated if
> > hugetlb_follow_page_mask can be overwritten by architectures (powerpc)
> > that need to do follow_huge_pd processing.
> 
> But won't the
> 
> > +	/* hugetlb is special */
> > +	if (is_vm_hugetlb_page(vma))
> > +		return hugetlb_follow_page_mask(vma, address, flags);
> 
> code route everything via hugetlb_follow_page_mask() and all these
> (beloved) hugepd checks would essentially be unreachable?
> 
> At least my understanding is that hugepd only applies to hugetlb.
> 
> Can't we move the hugepd handling code into hugetlb_follow_page_mask()
> as well?
> 
> I mean, doesn't follow_hugetlb_page() also have to handle that hugepd
> stuff already ... ?
> 
> [...]

I think so, but I got a little confused looking at the hugepd handling
code.  Adding Aneesh, who added hugepd support to follow_page_mask in the
series at:
https://lore.kernel.org/linux-mm/1494926612-23928-1-git-send-email-aneesh.kumar@xxxxxxxxxxxxxxxxxx/

I believe you are correct in that follow_hugetlb_page must handle hugepd
as well.  One source of my confusion is the following in follow_huge_pd:

	/*
	 * hugepage directory entries are protected by mm->page_table_lock
	 * Use this instead of huge_pte_lockptr
	 */
	ptl = &mm->page_table_lock;
	spin_lock(ptl);

Yet, if follow_hugetlb_page handles hugepd, then it is using
huge_pte_lockptr to get the lock pointer, and is that wrong?

Hoping Aneesh can help clear up the confusion.
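For anyone following along, here is a rough, untested sketch of what such
an architecture independent routine could look like.  This is not the
patch itself; it only illustrates deriving the huge page size from the
vma and taking the lock via huge_pte_lock()/huge_pte_lockptr(), and it
glosses over FOLL_PIN, migration entries and proper refcount handling
(the plain get_page() below is a simplification):

/*
 * Sketch only: look up the hugetlb page table entry covering 'address'
 * and return the corresponding subpage.  The huge page size comes from
 * the vma (hstate), so no per-level follow_huge_p*d routine is needed.
 */
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
				      unsigned long address, unsigned int flags)
{
	struct hstate *h = hstate_vma(vma);
	struct mm_struct *mm = vma->vm_mm;
	unsigned long haddr = address & huge_page_mask(h);
	struct page *page = NULL;
	spinlock_t *ptl;
	pte_t *pte, entry;

	/* huge_pte_offset() finds the entry at whatever level it lives at */
	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (!pte)
		return NULL;

	/* same lock huge_pte_lockptr() would return for this hstate */
	ptl = huge_pte_lock(h, mm, pte);
	entry = huge_ptep_get(pte);
	if (pte_present(entry)) {
		/* offset of the requested subpage within the huge page */
		page = pte_page(entry) +
		       ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
		if (flags & FOLL_GET)
			get_page(page);
	}
	spin_unlock(ptl);
	return page;
}

Note the huge_pte_lock() here: it is the same huge_pte_lockptr vs
mm->page_table_lock question raised above for hugepd entries.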
BTW, I also noticed that the above series added the comment:

	/* make this handle hugepd */

above the call to follow_huge_addr() in follow_page_mask.  Perhaps there
was at one time a plan to have follow_huge_addr handle hugepd?  That
series also removed the powerpc specific follow_huge_addr routine.
-- 
Mike Kravetz