Re: [RFC PATCH 07/26] hugetlb: add hugetlb_pte to track HugeTLB page table entries

James Houghton <jthoughton@xxxxxxxxxx> · Thu, 8 Sep 2022 10:54:37 -0700

On Thu, Sep 8, 2022 at 10:38 AM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> James,
>
> On Fri, Jun 24, 2022 at 05:36:37PM +0000, James Houghton wrote:
> > +static inline
> > +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> > +{
> > +
> > +     BUG_ON(!hpte->ptep);
> > +     // Only use huge_pte_lockptr if we are at leaf-level. Otherwise use
> > +     // the regular page table lock.
> > +     if (hugetlb_pte_none(hpte) || hugetlb_pte_present_leaf(hpte))
> > +             return huge_pte_lockptr(hugetlb_pte_shift(hpte),
> > +                             mm, hpte->ptep);
> > +     return &mm->page_table_lock;
> > +}
>
> Today when I re-read part of this thread, I found that I'm not sure whether
> this is safe.  IIUC taking different locks depending on the state of pte
> may lead to issues.
>
> For example, could the race below happen, where two threads take
> different locks even though they hit the same pmd entry?
>
>          thread 1                          thread 2
>          --------                          --------
>
>     hugetlb_pte_lockptr (for pmd level)
>       pte_none()==true,
>         take pmd lock
>     pmd_alloc()
>                                 hugetlb_pte_lockptr (for pmd level)
>                                   pte is pgtable entry (so !none, !present_leaf)
>                                     take page_table_lock
>                                 (can run concurrently with thread 1...)
>     pte_alloc()
>     ...

Thanks for pointing out this race. Yes, it is wrong to change which
lock we take depending on the value of the PTE: we would need to hold
the lock before reading the PTE in order to make that decision
correctly. This has already been fixed in the next version of this
series :). That is, we choose which lock to grab based on the PTE's
page table level rather than its value.
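
To illustrate the idea, here is a rough userspace sketch. It is not the
code from the series: every model_* name is made up for this example,
pthread mutexes stand in for spinlocks, and it assumes that PMD-level
entries use a split lock while all other levels fall back to the
mm-wide page table lock. The point is only that the lock choice reads
the level, which is fixed, and never the entry value, which can change
under us.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model types; none of these are kernel definitions. */
enum model_level { MODEL_LEVEL_PTE, MODEL_LEVEL_PMD, MODEL_LEVEL_PUD };

struct model_mm {
	pthread_mutex_t page_table_lock;	/* stands in for mm->page_table_lock */
	pthread_mutex_t pmd_lock;		/* stands in for a split PMD lock */
};

struct model_hpte {
	enum model_level level;	/* fixed by the entry's position in the tree */
	unsigned long val;	/* entry contents; may change concurrently */
};

/*
 * Pick the lock from the level only. hpte->val is deliberately never
 * read, so a concurrent change from "none" to "points at a page table"
 * cannot change which lock is returned.
 */
pthread_mutex_t *model_lockptr(struct model_mm *mm,
			       const struct model_hpte *hpte)
{
	assert(hpte != NULL);	/* loosely analogous to the BUG_ON() above */
	if (hpte->level == MODEL_LEVEL_PMD)
		return &mm->pmd_lock;
	return &mm->page_table_lock;
}

/*
 * Returns 1 if the chosen lock is the same before and after the entry
 * value changes, 0 otherwise.
 */
int model_lock_is_stable(enum model_level level)
{
	struct model_mm mm;
	struct model_hpte hpte = { level, 0 };
	pthread_mutex_t *before, *after;
	int same;

	pthread_mutex_init(&mm.page_table_lock, NULL);
	pthread_mutex_init(&mm.pmd_lock, NULL);

	before = model_lockptr(&mm, &hpte);
	hpte.val = 0x1000UL;	/* pretend another thread installed a page table */
	after = model_lockptr(&mm, &hpte);
	same = (before == after);

	pthread_mutex_destroy(&mm.pmd_lock);
	pthread_mutex_destroy(&mm.page_table_lock);
	return same;
}
```

In this model, both threads in the diagram above would get the same
lock for the pmd-level entry, whatever the entry held when each of
them looked at it.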

- James

>
> --
> Peter Xu
>