On 4/24/22 07:50, Baolin Wang wrote:
> When moving hugetlb page tables, the cache flushing is called in
> move_page_tables() without considering the shared PMDs, which may
> cause cache issues on some architectures.
>
> Thus we should move the hugetlb cache flushing into
> move_hugetlb_page_tables() and take the shared PMD ranges into
> account, as calculated by adjust_range_if_pmd_sharing_possible().
> Meanwhile also expand the TLB flushing range in case of shared PMDs.
>
> Note this was discovered via code inspection, and has not been
> observed as a real problem in practice so far.
>
> Fixes: 550a7d60bd5e ("mm, hugepages: add mremap() support for hugepage backed vma")
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> ---
>  mm/hugetlb.c | 17 +++++++++++++++--
>  mm/mremap.c  |  2 +-
>  2 files changed, 16 insertions(+), 3 deletions(-)

Thanks!

Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
--
Mike Kravetz

>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 1945dfb..d3a6094 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4937,10 +4937,17 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  	unsigned long old_addr_copy;
>  	pte_t *src_pte, *dst_pte;
>  	struct mmu_notifier_range range;
> +	bool shared_pmd = false;
>
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
>  				old_end);
>  	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
> +	/*
> +	 * In case of shared PMDs, we should cover the maximum possible
> +	 * range.
> +	 */
> +	flush_cache_range(vma, range.start, range.end);
> +
>  	mmu_notifier_invalidate_range_start(&range);
>  	/* Prevent race with file truncation */
>  	i_mmap_lock_write(mapping);
> @@ -4957,8 +4964,10 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>  		 */
>  		old_addr_copy = old_addr;
>
> -		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
> +		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte)) {
> +			shared_pmd = true;
>  			continue;
> +		}
>
>  		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
>  		if (!dst_pte)
> @@ -4966,7 +4975,11 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
>
>  		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
>  	}
> -	flush_tlb_range(vma, old_end - len, old_end);
> +
> +	if (shared_pmd)
> +		flush_tlb_range(vma, range.start, range.end);
> +	else
> +		flush_tlb_range(vma, old_end - len, old_end);
>  	mmu_notifier_invalidate_range_end(&range);
>  	i_mmap_unlock_write(mapping);
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 98f50e6..0970025 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -490,12 +490,12 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  		return 0;
>
>  	old_end = old_addr + len;
> -	flush_cache_range(vma, old_addr, old_end);
>
>  	if (is_vm_hugetlb_page(vma))
>  		return move_hugetlb_page_tables(vma, new_vma, old_addr,
>  						new_addr, len);
>
> +	flush_cache_range(vma, old_addr, old_end);
>  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
>  				old_addr, old_end);
>  	mmu_notifier_invalidate_range_start(&range);
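
For context, the "maximum possible range" mentioned in the new comment comes
from adjust_range_if_pmd_sharing_possible(), which widens the range out to
PUD_SIZE boundaries whenever PMD sharing is possible for the vma. A simplified
sketch of its behavior around the time of this patch (see mm/hugetlb.c for the
authoritative version; details vary across kernel releases):

	void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
						  unsigned long *start,
						  unsigned long *end)
	{
		unsigned long v_start = ALIGN(vma->vm_start, PUD_SIZE);
		unsigned long v_end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);

		/*
		 * PMD sharing is only possible for shared mappings that
		 * span at least one PUD-aligned, PUD-sized region, and
		 * only if the given range overlaps that region.
		 */
		if (!(vma->vm_flags & VM_MAYSHARE) || v_end <= v_start ||
		    *end <= v_start || *start >= v_end)
			return;

		/* Worst case: round the range out to PUD boundaries. */
		if (*start > v_start)
			*start = ALIGN_DOWN(*start, PUD_SIZE);
		if (*end < v_end)
			*end = ALIGN(*end, PUD_SIZE);
	}

That is why the patch flushes [range.start, range.end) rather than just the
mremap source range once a shared PMD has actually been unshared: the shared
PMD page covers a whole PUD-aligned region, so the flush has to cover that
entire area, not only old_addr..old_end.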