On Tue, Dec 06, 2022 at 06:38:54PM -0800, John Hubbard wrote:
> On 12/6/22 16:07, Peter Xu wrote:
> > I thought I answered this one at [1] above. If not, I can extend the
> > answer.
> 
> [1] explains it, but it doesn't mention why it's safe to drop and reacquire.
> 
> ...
> 
> > If we touch it, it's a potential bug as you mentioned. But we didn't.
> > 
> > Hope it explains.
> 
> I think it's OK after all, because hmm_vma_fault() does revalidate after
> it takes the vma lock, so that closes the loop that I was fretting over.
> 
> I was just also worried that I'd missed some other place, but it looks
> like that's not the case.
> 
> So, good.
> 
> How about this incremental diff on top, as an attempt to clarify what's
> going on? Or is this too much wordage? Sometimes I write too many words:

Nope, that all looks good, thanks. I'll apply them in my next post.

> 
> 
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index 1f7c2011f6cb..27a6df448ee5 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -21,13 +21,16 @@ struct mm_walk;
>   *			depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD.
>   *			Any folded depths (where PTRS_PER_P?D is equal to 1)
>   *			are skipped.
> - * @hugetlb_entry:	if set, called for each hugetlb entry. Note that
> - *			currently the hook function is protected by hugetlb
> - *			vma lock to make sure pte_t* and the spinlock is valid
> - *			to access. If the hook function needs to yield the
> - *			thread or retake the vma lock for some reason, it
> - *			needs to properly release the vma lock manually,
> - *			and retake it before the function returns.
> + * @hugetlb_entry:	if set, called for each hugetlb entry. This hook
> + *			function is called with the vma lock held, in order to
> + *			protect against a concurrent freeing of the pte_t* or
> + *			the ptl. In some cases, the hook function needs to drop
> + *			and retake the vma lock in order to avoid deadlocks
> + *			while calling other functions. In such cases the hook
> + *			function must either refrain from accessing the pte or
> + *			ptl after dropping the vma lock, or else revalidate
> + *			those items after re-acquiring the vma lock and before
> + *			accessing them.
>   * @test_walk:		caller specific callback function to determine whether
>   *			we walk over the current vma or not. Returning 0 means
>   *			"do page table walk over the current vma", returning
> diff --git a/mm/hmm.c b/mm/hmm.c
> index dcd624f28bcf..b428f2011cfd 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -497,7 +497,13 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>  		spin_unlock(ptl);
>  		hugetlb_vma_unlock_read(vma);
> -		/* hmm_vma_fault() can retake the vma lock */
> +		/*
> +		 * Avoid deadlock: drop the vma lock before calling
> +		 * hmm_vma_fault(), which will itself potentially take and drop
> +		 * the vma lock. This is also correct from a protection point of
> +		 * view, because there is no further use here of either pte or
> +		 * ptl after dropping the vma lock.
> +		 */
>  		ret = hmm_vma_fault(addr, end, required_fault, walk);
>  		hugetlb_vma_lock_read(vma);
>  		return ret;
> 
> > > I guess it's on me to think of something cleaner, so if I do I'll pipe
> > > up. :)
> > 
> > That'll be very much appreciated.
> > 
> > It's really that I don't know how to make this better, or I can rework the
> > series as long as it hasn't landed upstream.
> 
> It's always 10x easier to notice an imperfection than it is to improve on
> it. :)

-- 
Peter Xu
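
For illustration, below is a minimal sketch of a hugetlb_entry hook that
follows the contract documented in the pagewalk.h comment above. The
callback signature, huge_pte_lock(), huge_ptep_get(), pte_present(), and
the hugetlb vma lock helpers are the in-tree kernel APIs discussed in
this thread; sample_hugetlb_entry() and do_sleepable_work() are
hypothetical names invented for the example.

#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <linux/pagewalk.h>

/* Hypothetical helper that may sleep or re-take the vma lock. */
static int do_sleepable_work(unsigned long addr, unsigned long next,
			     struct mm_walk *walk);

/* Hypothetical walker callback; mirrors hmm_vma_walk_hugetlb_entry(). */
static int sample_hugetlb_entry(pte_t *pte, unsigned long hmask,
				unsigned long addr, unsigned long next,
				struct mm_walk *walk)
{
	struct vm_area_struct *vma = walk->vma;
	spinlock_t *ptl;
	pte_t entry;
	int ret;

	/* Entered with the hugetlb vma lock held by the page walker. */
	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
	entry = huge_ptep_get(pte);

	if (pte_present(entry)) {
		/* Fast path: pte and ptl stay valid under the vma lock. */
		spin_unlock(ptl);
		return 0;
	}

	/*
	 * Slow path: the work below may sleep or re-take the vma lock,
	 * so drop both the ptl and the vma lock first to avoid deadlock.
	 */
	spin_unlock(ptl);
	hugetlb_vma_unlock_read(vma);

	/*
	 * With the vma lock dropped, pte and ptl must not be touched
	 * again without revalidating them; this sketch simply never
	 * uses them again.
	 */
	ret = do_sleepable_work(addr, next, walk);

	/* The walker expects the vma lock to be held on return. */
	hugetlb_vma_lock_read(vma);
	return ret;
}

Such a hook would be wired up through a struct mm_walk_ops with
.hugetlb_entry = sample_hugetlb_entry and invoked via walk_page_range().
The key point is the one the hmm.c comment makes: once the vma lock is
dropped, neither pte nor ptl is used again, so no revalidation is needed.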