huge_pmd_share() is normally called with the vma lock already held, which
makes it tempting to assume the walker lock is unnecessary.  That is not
true: we only took the vma lock for the "current" vma, not for all the
vmas whose pgtables we are about to walk.  Taking each of those vma locks
could deadlock, and the lock ordering would be hard to define.  The safe
approach is to take the walker lock, which guarantees that the pgtable
page stays alive, and then use get_page_unless_zero() rather than a plain
get_page(), to make sure the pgtable page is not freed in the meantime.

Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
 mm/hugetlb.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 61a1fa678172..5ef883184885 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7008,6 +7008,13 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	spinlock_t *ptl;
 
 	i_mmap_lock_read(mapping);
+
+	/*
+	 * NOTE: even if we've got the vma read lock, here we still need to
+	 * take the walker lock, because we're not walking the current vma,
+	 * but some other mm's!
+	 */
+	hugetlb_walker_lock();
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
 			continue;
@@ -7016,12 +7023,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (saddr) {
 			spte = huge_pte_offset(svma->vm_mm, saddr,
 					       vma_mmu_pagesize(svma));
-			if (spte) {
-				get_page(virt_to_page(spte));
+			/*
+			 * When page ref==0, it means it's probably being
+			 * freed; continue with the next one.
+			 */
+			if (spte && get_page_unless_zero(virt_to_page(spte)))
 				break;
-			}
 		}
 	}
+	hugetlb_walker_unlock();
 
 	if (!spte)
 		goto out;
-- 
2.37.3
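
For reference, below is one plausible shape for the walker lock this patch
depends on.  This is a hypothetical sketch only: hugetlb_walker_lock() /
hugetlb_walker_unlock() are introduced elsewhere in this series, and the
sketch assumes an RCU-based scheme in which pgtable pages are only fully
freed after an RCU grace period:

	#include <linux/rcupdate.h>

	/*
	 * Hypothetical sketch: treat the walker lock as an RCU read-side
	 * section.  If pgtable pages are freed via RCU, any page observed
	 * inside this section cannot be fully freed until we are done,
	 * although its refcount may already have dropped to zero.
	 */
	static inline void hugetlb_walker_lock(void)
	{
		rcu_read_lock();
	}

	static inline void hugetlb_walker_unlock(void)
	{
		rcu_read_unlock();
	}

Under such a scheme the walker lock keeps the pgtable page from being
fully freed while it is inspected, but its refcount can still have
dropped to zero in the meantime, which is exactly why the loop in the
patch must use get_page_unless_zero() instead of get_page().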