On Thu, Nov 17, 2022 at 08:10:19PM -0500, Peter Xu wrote:
> huge_pmd_share() is normally called with the vma lock already held, which
> makes it feel like we don't really need the walker lock.  But that's not
> true: we only took the vma lock for the "current" vma, not for all the
> vmas whose pgtables we're going to walk.
>
> Taking each vma lock could deadlock and is hard to order.  The safe
> approach is to just use the walker lock, which guarantees the pgtable
> page stays alive.  On top of that, use get_page_unless_zero() rather than
> plain get_page(), to make sure the pgtable page is not freed from under
> us in the meantime.
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  mm/hugetlb.c | 16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 61a1fa678172..5ef883184885 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7008,6 +7008,13 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  	spinlock_t *ptl;
>
>  	i_mmap_lock_read(mapping);
> +
> +	/*
> +	 * NOTE: even if we've got the vma read lock, here we still need to
> +	 * take the walker lock, because we're not walking the current vma,
> +	 * but some other mm's!
> +	 */
> +	hugetlb_walker_lock();
>  	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
>  		if (svma == vma)
>  			continue;
> @@ -7016,12 +7023,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
>  		if (saddr) {
>  			spte = huge_pte_offset(svma->vm_mm, saddr,
>  					       vma_mmu_pagesize(svma));
> -			if (spte) {
> -				get_page(virt_to_page(spte));
> +			/*
> +			 * When page ref==0, it means it's probably being
> +			 * freed; continue with the next one.
> +			 */
> +			if (spte && get_page_unless_zero(virt_to_page(spte)))
>  				break;
> -			}
>  		}
>  	}
> +	hugetlb_walker_unlock();
>
>  	if (!spte)
>  		goto out;
> --
> 2.37.3
>

Will squash this in, so that spte is reset when get_page_unless_zero()
fails and the "!spte" check below the loop still does the right thing:

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c77ae7a3d94..f9abede9de47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7045,6 +7045,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 			 */
 			if (spte && get_page_unless_zero(virt_to_page(spte)))
 				break;
+			else
+				spte = NULL;
 		}
 	}
 	hugetlb_walker_unlock();

-- 
Peter Xu
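
PS: to spell it out, with both hunks applied the relevant part of
huge_pmd_share() should read roughly like below (reconstructed by hand
from the two diffs above, not compiled).  Without the "spte = NULL"
reset, a failed get_page_unless_zero() on the last candidate vma would
leave spte pointing at a page we never pinned, and the "!spte" check
would then wrongly treat it as a successful share:

		if (saddr) {
			spte = huge_pte_offset(svma->vm_mm, saddr,
					       vma_mmu_pagesize(svma));
			/*
			 * When page ref==0, it means it's probably being
			 * freed; continue with the next one.
			 */
			if (spte && get_page_unless_zero(virt_to_page(spte)))
				break;
			else
				/* We didn't get the ref; drop the pointer */
				spte = NULL;
		}
	}
	hugetlb_walker_unlock();

	if (!spte)
		goto out;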