On 4/24/22 07:50, Baolin Wang wrote:
> The cache level flush will always be first when changing an existing
> virtual->physical mapping to a new value, since this allows us to
> properly handle systems whose caches are strict and require a
> virtual->physical translation to exist for a virtual address. So we
> should move the cache flushing before huge_pmd_unshare().
>
> As Muchun pointed out[1], the architectures that currently support
> hugetlb PMD sharing have no cache flush issues in practice. But I
> think we should still follow the cache/TLB flushing rules when
> changing a valid virtual address mapping, in case of potential issues
> in the future.
>
> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@xxxxxxxxxxxxxxxxxxxxx/
>
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> ---
>  mm/rmap.c | 40 ++++++++++++++++++++++------------------
>  1 file changed, 22 insertions(+), 18 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 61e63db..81872bb 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare unmapped an entire PMD page.

Perhaps update this comment to say that huge_pmd_unshare 'may' unmap
an entire PMD page?

> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +
>  			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				/*
> -				 * huge_pmd_unshare unmapped an entire PMD
> -				 * page. There is no way of knowing exactly
> -				 * which PMDs may be cached for this mm, so
> -				 * we must flush them all. start/end were
> -				 * already adjusted above to cover this range.
> -				 */
> -				flush_cache_range(vma, range.start, range.end);
>  				flush_tlb_range(vma, range.start, range.end);
>  				mmu_notifier_invalidate_range(mm, range.start,
>  							      range.end);
> @@ -1560,13 +1561,14 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +		} else {
> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));

I know this call to flush_cache_page() existed before your change.
But, looking at this now, I wonder how hugetlb pages are handled?
Are there any versions of flush_cache_page() that take page size into
account?
--
Mike Kravetz
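
Purely to illustrate the idea being asked about: a hugetlb-aware
variant could take the huge page size from the VMA's hstate and flush
the whole range rather than a single base page. This is only a sketch
with a made-up helper name, not an existing API and not a proposal for
this patch:

#include <linux/mm.h>
#include <linux/hugetlb.h>
#include <asm/cacheflush.h>

/* Hypothetical helper, illustration only. */
static inline void hugetlb_flush_cache_page(struct vm_area_struct *vma,
					    unsigned long address)
{
	/* Huge page size for this mapping comes from the VMA's hstate. */
	struct hstate *h = hstate_vma(vma);
	unsigned long start = address & huge_page_mask(h);

	/* Flush every base page backing the huge page. */
	flush_cache_range(vma, start, start + huge_page_size(h));
}

flush_cache_range() covers the whole [start, start + huge_page_size(h))
span, so architectures with virtually indexed caches would see the full
huge page flushed instead of just one PAGE_SIZE page.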

>  		}
>
>  		/*
>  		 * Nuke the page table entry. When having to clear
>  		 * PageAnonExclusive(), we always have to flush.
>  		 */
> -		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		if (should_defer_flush(mm, flags) && !anon_exclusive) {
>  			/*
>  			 * We clear the PTE but do not flush so potentially
> @@ -1890,15 +1892,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  			 * do this outside rmap routines.
>  			 */
>  			VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
> +			/*
> +			 * huge_pmd_unshare unmapped an entire PMD page.
> +			 * There is no way of knowing exactly which PMDs may
> +			 * be cached for this mm, so we must flush them all.
> +			 * start/end were already adjusted above to cover this
> +			 * range.
> +			 */
> +			flush_cache_range(vma, range.start, range.end);
> +
>  			if (huge_pmd_unshare(mm, vma, &address, pvmw.pte)) {
> -				/*
> -				 * huge_pmd_unshare unmapped an entire PMD
> -				 * page. There is no way of knowing exactly
> -				 * which PMDs may be cached for this mm, so
> -				 * we must flush them all. start/end were
> -				 * already adjusted above to cover this range.
> -				 */
> -				flush_cache_range(vma, range.start, range.end);
>  				flush_tlb_range(vma, range.start, range.end);
>  				mmu_notifier_invalidate_range(mm, range.start,
>  							      range.end);
> @@ -1915,10 +1918,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
>  				page_vma_mapped_walk_done(&pvmw);
>  				break;
>  			}
> +		} else {
> +			flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		}
>
>  		/* Nuke the page table entry. */
> -		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
>  		pteval = ptep_clear_flush(vma, address, pvmw.pte);
>
>  		/* Set the dirty flag on the folio now the pte is gone. */
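
For reference, a minimal sketch of the ordering the commit message
describes for changing a valid virtual->physical mapping (cf.
Documentation/core-api/cachetlb.rst). This is my illustration, not code
from the patch, and the actual page table update in the middle is
elided:

#include <linux/mm.h>
#include <asm/cacheflush.h>
#include <asm/tlbflush.h>

/* Illustration only: the cache-before, TLB-after ordering. */
static void change_mapping_sketch(struct vm_area_struct *vma,
				  unsigned long start, unsigned long end)
{
	/*
	 * 1) Cache flush first, while the old virtual->physical
	 *    translation still exists, so strict caches can write back.
	 */
	flush_cache_range(vma, start, end);

	/* 2) ... modify the page table entries for [start, end) here ... */

	/*
	 * 3) TLB flush only after the page tables have changed, so no
	 *    stale translation survives.
	 */
	flush_tlb_range(vma, start, end);
}

This is why the patch hoists flush_cache_range() above
huge_pmd_unshare(): once the PMD is unshared the old translation is
gone, which is too late for caches that need it in order to flush.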