Re: [PATCH v3 08/14] mm: copy_pte_range() use pte_offset_map_rw_nolock()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2024/9/4 16:40, Qi Zheng wrote:
In copy_pte_range(), we may modify the src_pte entry after holding the
src_ptl, so convert it to using pte_offset_map_rw_nolock(). Since we may
free the PTE page in retract_page_tables() without holding the read lock
of mmap_lock, so we still need to get pmdval and do pmd_same() check after
the ptl is held.

See commit 3db82b9374ca92, copy_pte_range and retract_page_tables
are using vma->anon_vma to be exclusive.

retract_page_tables()                    copy_page_range()
    vma_interval_tree_foreach()              if (!vma_needs_copy())
        if (READ_ONCE(vma->anon_vma))            return 0;
            continue;                        copy_pte_range()

So I think mmap write lock here is also used for keeping ->anon_vma stable.
And we do not need pmd_same().

Muchun,
Thanks.

Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
Hi Muchun, since the code has changed, I dropped your Reviewed-by tag here.

  mm/memory.c | 18 +++++++++++++++++-
  1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 06674f94b7a4e..47974cc4bd7f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1082,6 +1082,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
  	struct mm_struct *src_mm = src_vma->vm_mm;
  	pte_t *orig_src_pte, *orig_dst_pte;
  	pte_t *src_pte, *dst_pte;
+	pmd_t pmdval;
  	pte_t ptent;
  	spinlock_t *src_ptl, *dst_ptl;
  	int progress, max_nr, ret = 0;
@@ -1107,13 +1108,28 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
  		ret = -ENOMEM;
  		goto out;
  	}
-	src_pte = pte_offset_map_nolock(src_mm, src_pmd, addr, &src_ptl);
+
+	/*
+	 * Since we may free the PTE page in retract_page_tables() without
+	 * holding the read lock of mmap_lock, so we still need to do a
+	 * pmd_same() check after holding the PTL.
+	 */
+	src_pte = pte_offset_map_rw_nolock(src_mm, src_pmd, addr, &pmdval,
+					   &src_ptl);
  	if (!src_pte) {
  		pte_unmap_unlock(dst_pte, dst_ptl);
  		/* ret == 0 */
  		goto out;
  	}
  	spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
+
+	if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(src_pmd)))) {
+		pte_unmap_unlock(src_pte, src_ptl);
+		pte_unmap_unlock(dst_pte, dst_ptl);
+		/* ret == 0 */
+		goto out;
+	}
+
  	orig_src_pte = src_pte;
  	orig_dst_pte = dst_pte;
  	arch_enter_lazy_mmu_mode();





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux