On Wed, 2023-03-08 at 13:38 -0800, Mike Kravetz wrote:
> On 03/06/23 17:22, Yin Fengwei wrote:
> > It's to prepare the batched rmap update for large folio.
> > No need to looped handle hugetlb. Just handle hugetlb and
> > bail out early.
> >
> > Signed-off-by: Yin Fengwei <fengwei.yin@xxxxxxxxx>
> > ---
> >  mm/rmap.c | 200 +++++++++++++++++++++++++++++++++---------------------
> >  1 file changed, 121 insertions(+), 79 deletions(-)
>
> Looks good,
>
> Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

Thanks a lot for reviewing.

>
> A few nits below.
>
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index ba901c416785..508d141dacc5 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1441,6 +1441,103 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> >          munlock_vma_folio(folio, vma, compound);
> >  }
> >
> > +static bool try_to_unmap_one_hugetlb(struct folio *folio,
> > +                struct vm_area_struct *vma, struct mmu_notifier_range range,
> > +                struct page_vma_mapped_walk pvmw, unsigned long address,
> > +                enum ttu_flags flags)
> > +{
> > +        struct mm_struct *mm = vma->vm_mm;
> > +        pte_t pteval;
> > +        bool ret = true, anon = folio_test_anon(folio);
> > +
> > +        /*
> > +         * The try_to_unmap() is only passed a hugetlb page
> > +         * in the case where the hugetlb page is poisoned.
> > +         */
> > +        VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
> > +        /*
> > +         * huge_pmd_unshare may unmap an entire PMD page.
> > +         * There is no way of knowing exactly which PMDs may
> > +         * be cached for this mm, so we must flush them all.
> > +         * start/end were already adjusted above to cover this
>
> nit, start/end are adjusted in caller (try_to_unmap_one) not above.

Yes. Will update the comment.

>
> > +         * range.
> > +         */
> > +        flush_cache_range(vma, range.start, range.end);
> > +
> > +        /*
> > +         * To call huge_pmd_unshare, i_mmap_rwsem must be
> > +         * held in write mode. Caller needs to explicitly
> > +         * do this outside rmap routines.
> > +         *
> > +         * We also must hold hugetlb vma_lock in write mode.
> > +         * Lock order dictates acquiring vma_lock BEFORE
> > +         * i_mmap_rwsem. We can only try lock here and fail
> > +         * if unsuccessful.
> > +         */
> > +        if (!anon) {
> > +                VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> > +                if (!hugetlb_vma_trylock_write(vma)) {
> > +                        ret = false;
> > +                        goto out;
> > +                }
> > +                if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
> > +                        hugetlb_vma_unlock_write(vma);
> > +                        flush_tlb_range(vma,
> > +                                range.start, range.end);
> > +                        mmu_notifier_invalidate_range(mm,
> > +                                range.start, range.end);
> > +                        /*
> > +                         * The ref count of the PMD page was
> > +                         * dropped which is part of the way map
> > +                         * counting is done for shared PMDs.
> > +                         * Return 'true' here. When there is
> > +                         * no other sharing, huge_pmd_unshare
> > +                         * returns false and we will unmap the
> > +                         * actual page and drop map count
> > +                         * to zero.
> > +                         */
> > +                        goto out;
> > +                }
> > +                hugetlb_vma_unlock_write(vma);
> > +        }
> > +        pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
> > +
> > +        /*
> > +         * Now the pte is cleared. If this pte was uffd-wp armed,
> > +         * we may want to replace a none pte with a marker pte if
> > +         * it's file-backed, so we don't lose the tracking info.
> > +         */
> > +        pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
> > +
> > +        /* Set the dirty flag on the folio now the pte is gone. */
> > +        if (pte_dirty(pteval))
>
> nit, technically, I suppose this should be huge_pte_dirty but it
> really is the same as pte_dirty which is why it works in current code.

Yes. Will update to use huge_pte_dirty().

>
> > +                folio_mark_dirty(folio);
> > +
> > +        /* Update high watermark before we lower rss */
> > +        update_hiwater_rss(mm);
> > +
> > +        /* Poisoned hugetlb folio with TTU_HWPOISON always cleared in flags */
> > +        pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
> > +        set_huge_pte_at(mm, address, pvmw.pte, pteval);
> > +        hugetlb_count_sub(folio_nr_pages(folio), mm);
> > +
> > +        /*
> > +         * No need to call mmu_notifier_invalidate_range() it has be
> > +         * done above for all cases requiring it to happen under page
> > +         * table lock before mmu_notifier_invalidate_range_end()
> > +         *
> > +         * See Documentation/mm/mmu_notifier.rst
> > +         */
> > +        page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
>
> nit, we KNOW folio_test_hugetlb(folio) is true here so can just pass
> 'true'. In addition, the same call in try_to_unmap_one is now known
> to always be false. No need to check folio_test_hugetlb(folio) there
> as well.

The "folio_test_hugetlb(folio) -> true" change was in patch 3. I tried
to apply "one patch for code moving and one patch for code change" to
make the review easy. But since this patch already changes the code, I
will move the "folio_test_hugetlb(folio) -> true" change from patch 3
to this one.
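To make sure I read the suggestions right, with both nits applied the
relevant lines of this patch would end up roughly like below (untested,
just a sketch for confirmation):

        /* Set the dirty flag on the folio now the pte is gone. */
        if (huge_pte_dirty(pteval))
                folio_mark_dirty(folio);
        ...
        /* The folio is known to be hugetlb here, so pass 'true'. */
        page_remove_rmap(&folio->page, vma, true);

And the remaining page_remove_rmap() call in try_to_unmap_one() will
just pass 'false', since only non-hugetlb folios reach it after the
early bail-out.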
Thanks.


Regards
Yin, Fengwei

>