On 4/27/22 22:55, Muchun Song wrote: > On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote: >> The cache level flush will always be first when changing an existing >> virtual–>physical mapping to a new value, since this allows us to >> properly handle systems whose caches are strict and require a >> virtual–>physical translation to exist for a virtual address. So we >> should move the cache flushing before huge_pmd_unshare(). >> > > Right. > >> As Muchun pointed out[1], now the architectures whose supporting hugetlb >> PMD sharing have no cache flush issues in practice. But I think we >> should still follow the cache/TLB flushing rules when changing a valid >> virtual address mapping in case of potential issues in future. > > Right. One point i need to clarify. I do not object this change but > want you to clarify this (not an issue in practice) in commit log > to let others know they do not need to bp this. > >> >> [1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@xxxxxxxxxxxxxxxxxxxxx/ >> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> >> --- >> mm/rmap.c | 40 ++++++++++++++++++++++------------------ >> 1 file changed, 22 insertions(+), 18 deletions(-) >> >> diff --git a/mm/rmap.c b/mm/rmap.c >> index 61e63db..4f0d115 100644 >> --- a/mm/rmap.c >> +++ b/mm/rmap.c >> @@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma, >> * do this outside rmap routines. >> */ >> VM_BUG_ON(!(flags & TTU_RMAP_LOCKED)); >> + /* >> + * huge_pmd_unshare may unmap an entire PMD page. >> + * There is no way of knowing exactly which PMDs may >> + * be cached for this mm, so we must flush them all. >> + * start/end were already adjusted above to cover this >> + * range. >> + */ >> + flush_cache_range(vma, range.start, range.end); >> + > > flush_cache_range() is always called even if we do not need to flush. > How about introducing a new helper like hugetlb_pmd_shared() which > returns true for shared PMD? Then: > > if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) { > flush_cache_range(vma, range.start, range.end); > huge_pmd_unshare(mm, vma, &address, pvmw.pte); > flush_tlb_range(vma, range.start, range.end); > } > > The code could be a little simpler. Right? > > Thanks. > I thought about adding a 'hugetlb_pmd_shared()' interface for another use. I believe it could even be used earlier in this call sequence. Since we hold i_mmap_rwsem, we would even test for shared BEFORE calling adjust_range_if_pmd_sharing_possible. We can not make an authoritative test in adjust range... because not all callers will be holding i_mmap_rwsem. I think we COULD optimize to minimize the flush range. However, I think that would complicate this code even more, and it is difficult enough to follow. My preference would be to over flush as is done here for correctness and simplification. We can optimize later if desired. With Muchun's comment that this is not an issue in practice today, Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> -- Mike Kravetz