On 02/23/23 17:28, Matthew Wilcox wrote:
> On Thu, Feb 23, 2023 at 04:31:56PM +0800, Yin Fengwei wrote:
> > It's to prepare for the batched rmap update for large folios.
> > No need to handle hugetlb in the loop. Just handle hugetlb and
> > bail out early.
> >
> > Almost no functional change. Just one change to the mm counter
> > update.
>
> This looks like a very worthwhile cleanup in its own right.  Adding
> various hugetlb & memory poisoning experts for commentary on the mm
> counter change (search for three asterisks).

The counter change looks correct to me.  However, there is a bunch of code
in the else cases of the

	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON))

check in try_to_unmap_one().  Are you sure none of it applies to hugetlb
pages?  It seems like some of the code in the

	} else if (folio_test_anon(folio)) {

branch could apply.  pte_uffd_wp perhaps?
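For reference, the per-pte branch structure in try_to_unmap_one() that the
question above is about has roughly the following shape (paraphrased with
comment placeholders, not the exact upstream code):

	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) {
		/* Poisoned page: install a hwpoison swap entry. */
	} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
		/* Architecture reports the pte as unused: just drop it. */
	} else if (folio_test_anon(folio)) {
		/*
		 * Anonymous memory: convert the pte to a swap entry,
		 * carrying over the soft-dirty and uffd-wp bits.
		 */
	} else {
		/* File-backed: clear the pte and account the file page. */
	}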
Since you are cleaning up, I think the following may apply ...

> > +static bool try_to_unmap_one_hugetlb(struct folio *folio,
> > +		struct vm_area_struct *vma, struct mmu_notifier_range range,
> > +		struct page_vma_mapped_walk pvmw, unsigned long address,
> > +		enum ttu_flags flags)
> > +{
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	pte_t pteval;
> > +	bool ret = true, anon = folio_test_anon(folio);
> > +
> > +	/*
> > +	 * The try_to_unmap() is only passed a hugetlb page
> > +	 * in the case where the hugetlb page is poisoned.
> > +	 */
> > +	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
> > +	/*
> > +	 * huge_pmd_unshare may unmap an entire PMD page.
> > +	 * There is no way of knowing exactly which PMDs may
> > +	 * be cached for this mm, so we must flush them all.
> > +	 * start/end were already adjusted above to cover this
> > +	 * range.
> > +	 */
> > +	flush_cache_range(vma, range.start, range.end);
> > +
> > +	/*
> > +	 * To call huge_pmd_unshare, i_mmap_rwsem must be
> > +	 * held in write mode.  Caller needs to explicitly
> > +	 * do this outside rmap routines.
> > +	 *
> > +	 * We also must hold hugetlb vma_lock in write mode.
> > +	 * Lock order dictates acquiring vma_lock BEFORE
> > +	 * i_mmap_rwsem.  We can only try lock here and fail
> > +	 * if unsuccessful.
> > +	 */
> > +	if (!anon) {
> > +		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> > +		if (!hugetlb_vma_trylock_write(vma)) {
> > +			ret = false;
> > +			goto out;
> > +		}
> > +		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
> > +			hugetlb_vma_unlock_write(vma);
> > +			flush_tlb_range(vma,
> > +				range.start, range.end);
> > +			mmu_notifier_invalidate_range(mm,
> > +				range.start, range.end);
> > +			/*
> > +			 * The ref count of the PMD page was
> > +			 * dropped which is part of the way map
> > +			 * counting is done for shared PMDs.
> > +			 * Return 'true' here.  When there is
> > +			 * no other sharing, huge_pmd_unshare
> > +			 * returns false and we will unmap the
> > +			 * actual page and drop map count
> > +			 * to zero.
> > +			 */
> > +			goto out;
> > +		}
> > +		hugetlb_vma_unlock_write(vma);
> > +	}
> > +	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
> > +
> > +	/*
> > +	 * Now the pte is cleared. If this pte was uffd-wp armed,
> > +	 * we may want to replace a none pte with a marker pte if
> > +	 * it's file-backed, so we don't lose the tracking info.
> > +	 */
> > +	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
> > +
> > +	/* Set the dirty flag on the folio now the pte is gone. */
> > +	if (pte_dirty(pteval))
> > +		folio_mark_dirty(folio);
> > +
> > +	/* Update high watermark before we lower rss */
> > +	update_hiwater_rss(mm);
> > +
> > +	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) {

I'm guessing this can just be

	if (!(flags & TTU_IGNORE_HWPOISON)) {

as we only get here in the poison case, as indicated by the

	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);

above.

> > +		pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
> > +		set_huge_pte_at(mm, address, pvmw.pte, pteval);
> > +	}
> > +
> > +	/*** try_to_unmap_one() called dec_mm_counter for
> > +	 * (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) not
> > +	 * true case, looks incorrect. Change it to hugetlb_count_sub() here.
> > +	 */
> > +	hugetlb_count_sub(folio_nr_pages(folio), mm);
> > +
> > +	/*
> > +	 * No need to call mmu_notifier_invalidate_range() it has be
> > +	 * done above for all cases requiring it to happen under page
> > +	 * table lock before mmu_notifier_invalidate_range_end()
> > +	 *
> > +	 * See Documentation/mm/mmu_notifier.rst
> > +	 */
> > +	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));

The folio here is always compound/hugetlb, so this could simply be:

	page_remove_rmap(&folio->page, vma, true);

> > +	if (vma->vm_flags & VM_LOCKED)
> > +		mlock_drain_local();

Since hugetlb pages are always memory resident, I do not think the above
is necessary.

> > +	folio_put(folio);
> > +
> > +out:
> > +	return ret;
> > +}

-- 
Mike Kravetz
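Taken together, a rough sketch of how the tail of try_to_unmap_one_hugetlb()
could look with the suggestions above applied (illustrative only, untested):

	/* Update high watermark before we lower rss */
	update_hiwater_rss(mm);

	/*
	 * We only get here for poisoned hugetlb pages (see the
	 * VM_BUG_ON_FOLIO at the top of the function), so there is no
	 * need to recheck folio_test_hwpoison().
	 */
	if (!(flags & TTU_IGNORE_HWPOISON)) {
		pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
		set_huge_pte_at(mm, address, pvmw.pte, pteval);
	}

	/* hugetlb usage is tracked separately from the regular rss counters. */
	hugetlb_count_sub(folio_nr_pages(folio), mm);

	/* A hugetlb folio is always compound. */
	page_remove_rmap(&folio->page, vma, true);

	/* No VM_LOCKED handling: hugetlb pages are always memory resident. */
	folio_put(folio);

out:
	return ret;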