On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote: > In retract_page_tables(), we may modify the pmd entry after acquiring the > pml and ptl, so we should also check whether the pmd entry is stable. Why does taking the PMD lock not guarantee that the PMD entry is stable? > Using pte_offset_map_rw_nolock() + pmd_same() to do it, and then we can > also remove the calling of the pte_lockptr(). > > Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> > --- > mm/khugepaged.c | 17 ++++++++++++++++- > 1 file changed, 16 insertions(+), 1 deletion(-) > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 6f8d46d107b4b..6d76dde64f5fb 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -1721,6 +1721,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) > spinlock_t *pml; > spinlock_t *ptl; > bool skipped_uffd = false; > + pte_t *pte; > > /* > * Check vma->anon_vma to exclude MAP_PRIVATE mappings that > @@ -1756,11 +1757,25 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) > addr, addr + HPAGE_PMD_SIZE); > mmu_notifier_invalidate_range_start(&range); > > + pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pgt_pmd, &ptl); > + if (!pte) { > + mmu_notifier_invalidate_range_end(&range); > + continue; > + } > + > pml = pmd_lock(mm, pmd); I don't understand why you're mapping the page table before locking the PMD. Doesn't that just mean we need more error checking afterwards? > - ptl = pte_lockptr(mm, pmd); > if (ptl != pml) > spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); > > + if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) { > + pte_unmap_unlock(pte, ptl); > + if (ptl != pml) > + spin_unlock(pml); > + mmu_notifier_invalidate_range_end(&range); > + continue; > + } > + pte_unmap(pte); > + > /* > * Huge page lock is still held, so normally the page table > * must remain empty; and we have already skipped anon_vma > -- > 2.20.1 >