On 08/25/22 09:25, David Hildenbrand wrote:
> > Is the primary concern the locking?  If so, I am not sure we have an
> > issue.  As mentioned in your commit message, current code will use
> > pte_offset_map_lock().  pte_offset_map_lock uses pte_lockptr, and
> > pte_lockptr will either be the mm wide lock or the pmd_page lock.  To
> > me, it seems that either would provide correct synchronization for
> > CONT-PTE entries.  Am I missing something or misreading the code?
> > 
> > I started looking at the code cleanup suggested by David.  Here is a
> > quick patch (not tested and likely containing errors) to see if this
> > is a step in the right direction.
> > 
> > I like it because we get rid of/combine all those follow_huge_p*d
> > routines.
> 
> Yes, see comments below.
> 
> > From 35d117a707c1567ddf350554298697d40eace0d7 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Date: Wed, 24 Aug 2022 15:59:15 -0700
> > Subject: [PATCH] hugetlb: call hugetlb_follow_page_mask for hugetlb
> >  pages in follow_page_mask
> > 
> > At the beginning of follow_page_mask, there currently is a call to
> > follow_huge_addr which 'may' handle hugetlb pages.  ia64 is the only
> > architecture which (incorrectly) provides a follow_huge_addr routine
> > that does not return error.  Instead, at each level of the page table
> > a check is made for a hugetlb entry.  If a hugetlb entry is found, a
> > call to a routine associated with that page table level such as
> > follow_huge_pmd is made.
> > 
> > All the follow_huge_p*d routines are basically the same.  In addition,
> > the huge page size can be derived from the vma, so we know where in
> > the page table a huge page would reside.  So, replace follow_huge_addr
> > with a new architecture independent routine which will provide the
> > same functionality as the follow_huge_p*d routines.  We can then
> > eliminate the p*d_huge checks in follow_page_mask page table walking
> > as well as the follow_huge_p*d routines themselves.
> > 
> > follow_page_mask still has is_hugepd hugetlb checks during page table
> > walking.  This is due to these checks and follow_huge_pd being
> > architecture specific.  These can be eliminated if
> > hugetlb_follow_page_mask can be overwritten by architectures (powerpc)
> > that need to do follow_huge_pd processing.
> 
> But won't the
> 
> > +	/* hugetlb is special */
> > +	if (is_vm_hugetlb_page(vma))
> > +		return hugetlb_follow_page_mask(vma, address, flags);
> 
> code route everything via hugetlb_follow_page_mask() and all these
> (beloved) hugepd checks would essentially be unreachable?
> 
> At least my understanding is that hugepd only applies to hugetlb.
> 
> Can't we move the hugepd handling code into hugetlb_follow_page_mask()
> as well?
> 
> I mean, doesn't follow_hugetlb_page() also have to handle that hugepd
> stuff already ... ?
> 
> [...]

I think so, but I got a little confused looking at the hugepd handling
code.  Adding Aneesh, who added hugepd support to follow_page_mask in the
series at:
https://lore.kernel.org/linux-mm/1494926612-23928-1-git-send-email-aneesh.kumar@xxxxxxxxxxxxxxxxxx/

I believe you are correct in that follow_hugetlb_page must handle hugepd
as well.  One source of my confusion is the following in follow_huge_pd:

	/*
	 * hugepage directory entries are protected by mm->page_table_lock
	 * Use this instead of huge_pte_lockptr
	 */
	ptl = &mm->page_table_lock;
	spin_lock(ptl);

Yet, if follow_hugetlb_page handles hugepd, then it is using
huge_pte_lockptr to get the lock pointer, and is that wrong?

Hoping Aneesh can help clear up the confusion.
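For anyone following along, here is a rough, untested sketch of what such
an architecture independent routine could look like.  This is not the
patch itself; it only illustrates deriving the huge page size from the
vma and taking the lock via huge_pte_lock()/huge_pte_lockptr(), and it
glosses over FOLL_PIN, migration entries and proper refcount handling
(the plain get_page() below is a simplification):

/*
 * Sketch only: look up the hugetlb page table entry covering 'address'
 * and return the corresponding subpage.  The huge page size comes from
 * the vma (hstate), so no per-level follow_huge_p*d routine is needed.
 */
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
				      unsigned long address, unsigned int flags)
{
	struct hstate *h = hstate_vma(vma);
	struct mm_struct *mm = vma->vm_mm;
	unsigned long haddr = address & huge_page_mask(h);
	struct page *page = NULL;
	spinlock_t *ptl;
	pte_t *pte, entry;

	/* huge_pte_offset() finds the entry at whatever level it lives at */
	pte = huge_pte_offset(mm, haddr, huge_page_size(h));
	if (!pte)
		return NULL;

	/* same lock huge_pte_lockptr() would return for this hstate */
	ptl = huge_pte_lock(h, mm, pte);
	entry = huge_ptep_get(pte);
	if (pte_present(entry)) {
		/* offset of the requested subpage within the huge page */
		page = pte_page(entry) +
		       ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
		if (flags & FOLL_GET)
			get_page(page);
	}
	spin_unlock(ptl);
	return page;
}

Note the huge_pte_lock() here: it is the same huge_pte_lockptr vs
mm->page_table_lock question raised above for hugepd entries.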
BTW, I also noticed that the above series added the comment:

	/* make this handle hugepd */

above the call to follow_huge_addr() in follow_page_mask.  Perhaps there
was at one time a plan to have follow_huge_addr handle hugepd?  That
series also removed the powerpc specific follow_huge_addr routine.
-- 
Mike Kravetz