On 2/24/2023 8:20 AM, Mike Kravetz wrote:
> On 02/23/23 17:28, Matthew Wilcox wrote:
>> On Thu, Feb 23, 2023 at 04:31:56PM +0800, Yin Fengwei wrote:
>>> This prepares for the batched rmap update for large folios.
>>> There is no need to handle hugetlb inside the loop; just handle
>>> hugetlb and bail out early.
>>>
>>> Almost no functional change. Just one change to the mm counter
>>> update.
>>
>> This looks like a very worthwhile cleanup in its own right.  Adding
>> various hugetlb & memory poisoning experts for commentary on the mm
>> counter change (search for three asterisks)
>
> The counter change looks correct to me.
> However, there is a bunch of code in the else cases for
>
> 	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON))
>
> in try_to_unmap_one.  Are you sure none of it applies to hugetlb pages?
> It seems like some of the code in the
>
> 	} else if (folio_test_anon(folio)) {
>
> could apply.  pte_uffd_wp perhaps?
In the "else if (pte_unused(pteval))" and "else if (folio_test_anon(folio))"
paths, try_to_unmap_one() uses set_pte_at() or passes PAGE_SIZE to
mmu_notifier_invalidate_range(), both of which assume a normal pte. So I
suppose hugetlb will not hit these two branches.
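For reference, the two branches look roughly like this (condensed and
paraphrased from mainline try_to_unmap_one(); the swap-entry construction
in the anon path is elided):

	} else if (pte_unused(pteval) && !userfaultfd_armed(vma)) {
		/* Decrements the counter by one PAGE_SIZE page. */
		dec_mm_counter(mm, mm_counter(&folio->page));
		/* We have to invalidate as we cleared the pte. */
		mmu_notifier_invalidate_range(mm, address,
					      address + PAGE_SIZE);
	} else if (folio_test_anon(folio)) {
		/* ... build swp_pte for the swap entry ... */
		/* Installs a regular (non-huge) pte. */
		set_pte_at(mm, address, pvmw.pte, swp_pte);
	}

Both the PAGE_SIZE invalidation and set_pte_at() would be wrong for a
hugetlb mapping, which is consistent with hugetlb never reaching them.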
>
> Since you are cleaning up, I think the following may apply ...
>
>>> +static bool try_to_unmap_one_hugetlb(struct folio *folio,
>>> +		struct vm_area_struct *vma, struct mmu_notifier_range range,
>>> +		struct page_vma_mapped_walk pvmw, unsigned long address,
>>> +		enum ttu_flags flags)
>>> +{
>>> +	struct mm_struct *mm = vma->vm_mm;
>>> +	pte_t pteval;
>>> +	bool ret = true, anon = folio_test_anon(folio);
>>> +
>>> +	/*
>>> +	 * try_to_unmap() is only passed a hugetlb page
>>> +	 * in the case where the hugetlb page is poisoned.
>>> +	 */
>>> +	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
>>> +	/*
>>> +	 * huge_pmd_unshare may unmap an entire PMD page.
>>> +	 * There is no way of knowing exactly which PMDs may
>>> +	 * be cached for this mm, so we must flush them all.
>>> +	 * start/end were already adjusted above to cover this
>>> +	 * range.
>>> +	 */
>>> +	flush_cache_range(vma, range.start, range.end);
>>> +
>>> +	/*
>>> +	 * To call huge_pmd_unshare, i_mmap_rwsem must be
>>> +	 * held in write mode.  Caller needs to explicitly
>>> +	 * do this outside rmap routines.
>>> +	 *
>>> +	 * We also must hold hugetlb vma_lock in write mode.
>>> +	 * Lock order dictates acquiring vma_lock BEFORE
>>> +	 * i_mmap_rwsem.  We can only try lock here and fail
>>> +	 * if unsuccessful.
>>> +	 */
>>> +	if (!anon) {
>>> +		VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
>>> +		if (!hugetlb_vma_trylock_write(vma)) {
>>> +			ret = false;
>>> +			goto out;
>>> +		}
>>> +		if (huge_pmd_unshare(mm, vma, address, pvmw.pte)) {
>>> +			hugetlb_vma_unlock_write(vma);
>>> +			flush_tlb_range(vma,
>>> +					range.start, range.end);
>>> +			mmu_notifier_invalidate_range(mm,
>>> +					range.start, range.end);
>>> +			/*
>>> +			 * The ref count of the PMD page was
>>> +			 * dropped which is part of the way map
>>> +			 * counting is done for shared PMDs.
>>> +			 * Return 'true' here.  When there is
>>> +			 * no other sharing, huge_pmd_unshare
>>> +			 * returns false and we will unmap the
>>> +			 * actual page and drop the map count
>>> +			 * to zero.
>>> +			 */
>>> +			goto out;
>>> +		}
>>> +		hugetlb_vma_unlock_write(vma);
>>> +	}
>>> +	pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
>>> +
>>> +	/*
>>> +	 * Now the pte is cleared. If this pte was uffd-wp armed,
>>> +	 * we may want to replace a none pte with a marker pte if
>>> +	 * it's file-backed, so we don't lose the tracking info.
>>> +	 */
>>> +	pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
>>> +
>>> +	/* Set the dirty flag on the folio now the pte is gone. */
>>> +	if (pte_dirty(pteval))
>>> +		folio_mark_dirty(folio);
>>> +
>>> +	/* Update high watermark before we lower rss */
>>> +	update_hiwater_rss(mm);
>>> +
>>> +	if (folio_test_hwpoison(folio) && !(flags & TTU_IGNORE_HWPOISON)) {
>
> I'm guessing this can just be,
>
> 	if (!(flags & TTU_IGNORE_HWPOISON))
>
> as we only get here in the poison case as indicated by,
>
> 	VM_BUG_ON_FOLIO(!folio_test_hwpoison(folio), folio);
>
> above.
You are correct. Will remove the folio_test_hwpoison() check.

>
>>> +		pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
>>> +		set_huge_pte_at(mm, address, pvmw.pte, pteval);
>>> +	}
>>> +
>>> +	/*** In the case where (folio_test_hwpoison(folio) &&
>>> +	 * !(flags & TTU_IGNORE_HWPOISON)) was not true, try_to_unmap_one()
>>> +	 * called dec_mm_counter(), which looks incorrect. Change it to
>>> +	 * hugetlb_count_sub() here.
>>> +	 */
>>> +	hugetlb_count_sub(folio_nr_pages(folio), mm);
>>> +
>>> +	/*
>>> +	 * No need to call mmu_notifier_invalidate_range() as it has been
>>> +	 * done above for all cases requiring it to happen under the page
>>> +	 * table lock before mmu_notifier_invalidate_range_end().
>>> +	 *
>>> +	 * See Documentation/mm/mmu_notifier.rst
>>> +	 */
>>> +	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
>
> Always compound/hugetlb, so could be:
>
> 	page_remove_rmap(&folio->page, vma, true);
This change is in patch 3 of this series.

>
>>> +	if (vma->vm_flags & VM_LOCKED)
>>> +		mlock_drain_local();
>
> Since hugetlb pages are always memory resident, I do not think the above
> is necessary.
OK. Will remove this piece of code.

Thanks a lot for the comments.

Regards
Yin, Fengwei

>
>>> +	folio_put(folio);
>>> +
>>> +out:
>>> +	return ret;
>>> +}
>
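With the agreed changes folded in, the tail of try_to_unmap_one_hugetlb()
in v2 would look something like this (untested sketch; the
page_remove_rmap() argument change is left to patch 3 of this series):

	/* The VM_BUG_ON_FOLIO() above guarantees the poisoned case. */
	if (!(flags & TTU_IGNORE_HWPOISON)) {
		pteval = swp_entry_to_pte(make_hwpoison_entry(&folio->page));
		set_huge_pte_at(mm, address, pvmw.pte, pteval);
	}

	/* hugetlb maintains its own rss counter. */
	hugetlb_count_sub(folio_nr_pages(folio), mm);

	page_remove_rmap(&folio->page, vma, folio_test_hugetlb(folio));
	/*
	 * The VM_LOCKED/mlock_drain_local() handling is dropped since
	 * hugetlb pages are always memory resident.
	 */
	folio_put(folio);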