> Is the primary concern the locking? If so, I am not sure we have an
> issue. As mentioned in your commit message, current code will use
> pte_offset_map_lock(). pte_offset_map_lock uses pte_lockptr, and
> pte_lockptr will either be the mm-wide lock or the pmd_page lock. To
> me, it seems that either would provide correct synchronization for
> CONT-PTE entries. Am I missing something or misreading the code?
>
> I started looking at the code cleanup suggested by David. Here is a
> quick patch (not tested and likely containing errors) to see if this
> is a step in the right direction.
>
> I like it because we get rid of/combine all those follow_huge_p*d
> routines.

Yes, see comments below.

> From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> Date: Wed, 24 Aug 2022 15:59:15 -0700
> Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb pages in
>  follow_page_mask
>
> At the beginning of follow_page_mask, there is currently a call to
> follow_huge_addr which 'may' handle hugetlb pages. ia64 is the only
> architecture which (incorrectly) provides a follow_huge_addr routine
> that does not return error. Instead, at each level of the page table a
> check is made for a hugetlb entry. If a hugetlb entry is found, a call
> is made to a routine associated with that page table level, such as
> follow_huge_pmd.
>
> All the follow_huge_p*d routines are basically the same. In addition,
> the huge page size can be derived from the vma, so we know where in
> the page table a huge page would reside. So, replace follow_huge_addr
> with a new architecture-independent routine which provides the same
> functionality as the follow_huge_p*d routines. We can then eliminate
> the p*d_huge checks in follow_page_mask page table walking, as well as
> the follow_huge_p*d routines themselves.
>
> follow_page_mask still has is_hugepd hugetlb checks during page table
> walking. This is because these checks and follow_huge_pd are
> architecture specific. They can be eliminated if
> hugetlb_follow_page_mask can be overridden by the architectures
> (powerpc) that need to do follow_huge_pd processing.

But won't the

> +	/* hugetlb is special */
> +	if (is_vm_hugetlb_page(vma))
> +		return hugetlb_follow_page_mask(vma, address, flags);

code route everything via hugetlb_follow_page_mask() and all these
(beloved) hugepd checks would essentially be unreachable?

At least my understanding is that hugepd only applies to hugetlb.

Can't we move the hugepd handling code into hugetlb_follow_page_mask()
as well? I mean, doesn't follow_hugetlb_page() also have to handle that
hugepd stuff already ... ?
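Just to illustrate the point (a rough sketch against current mm/gup.c,
untested and not part of the patch): with the early dispatch in place,
the pgd level of follow_page_mask() would boil down to

static struct page *follow_page_mask(struct vm_area_struct *vma,
				     unsigned long address, unsigned int flags,
				     struct follow_page_context *ctx)
{
	struct mm_struct *mm = vma->vm_mm;
	pgd_t *pgd;

	ctx->page_mask = 0;

	/* hugetlb is special */
	if (is_vm_hugetlb_page(vma))
		return hugetlb_follow_page_mask(vma, address, flags);

	pgd = pgd_offset(mm, address);
	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
		return no_page_table(vma, flags);

	/*
	 * The pgd_huge() and is_hugepd() branches that used to live here
	 * (and their equivalents in follow_p4d/pud/pmd_mask()) can only
	 * be true for hugetlb mappings, which we already handled above.
	 */
	return follow_p4d_mask(vma, address, pgd, flags, ctx);
}

and the same reasoning applies at each lower level of the walk.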
[...]

>
> +struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> +		unsigned long address, unsigned int flags)
> +{
> +	struct hstate *h = hstate_vma(vma);
> +	struct mm_struct *mm = vma->vm_mm;
> +	unsigned long haddr = address & huge_page_mask(h);
> +	struct page *page = NULL;
> +	spinlock_t *ptl;
> +	pte_t *pte, entry;
> +
> +	/*
> +	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
> +	 * follow_hugetlb_page().
> +	 */
> +	if (WARN_ON_ONCE(flags & FOLL_PIN))
> +		return NULL;
> +
> +	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
> +	if (!pte)
> +		return NULL;
> +
> +retry:
> +	ptl = huge_pte_lock(h, mm, pte);
> +	entry = huge_ptep_get(pte);
> +	if (pte_present(entry)) {
> +		page = pte_page(entry);
> +		/*
> +		 * try_grab_page() should always succeed here, because we hold
> +		 * the ptl lock and have verified pte_present().
> +		 */
> +		if (WARN_ON_ONCE(!try_grab_page(page, flags))) {
> +			page = NULL;
> +			goto out;
> +		}
> +	} else {
> +		if (is_hugetlb_entry_migration(entry)) {
> +			spin_unlock(ptl);
> +			__migration_entry_wait_huge(pte, ptl);
> +			goto retry;
> +		}
> +		/*
> +		 * hwpoisoned entry is treated as no_page_table in
> +		 * follow_page_mask().
> +		 */
> +	}
> +out:
> +	spin_unlock(ptl);
> +	return page;

This is neat and clean enough to not reuse follow_hugetlb_page(). I
wonder if we want to add a comment to the function explaining how it
differs from follow_hugetlb_page().

... or do we maybe want to rename follow_hugetlb_page() to something
like __hugetlb_get_user_pages(), to make it clearer in which context it
will get called?

I guess it might be feasible in the future to eliminate
follow_hugetlb_page() and centralize the faulting code. For now, this
certainly improves the situation.

-- 
Thanks,

David / dhildenb