On 5/4/2022 2:42 AM, Mike Kravetz wrote:
On 4/27/22 22:55, Muchun Song wrote:
On Wed, Apr 27, 2022 at 06:52:06PM +0800, Baolin Wang wrote:
The cache level flush will always be first when changing an existing
virtual–>physical mapping to a new value, since this allows us to
properly handle systems whose caches are strict and require a
virtual–>physical translation to exist for a virtual address. So we
should move the cache flushing before huge_pmd_unshare().
Right.
As Muchun pointed out[1], now the architectures whose supporting hugetlb
PMD sharing have no cache flush issues in practice. But I think we
should still follow the cache/TLB flushing rules when changing a valid
virtual address mapping in case of potential issues in future.
Right. One point i need to clarify. I do not object this change but
want you to clarify this (not an issue in practice) in commit log
to let others know they do not need to bp this.
[1] https://lore.kernel.org/all/YmT%2F%2FhuUbFX+KHcy@xxxxxxxxxxxxxxxxxxxxx/
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
mm/rmap.c | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 61e63db..4f0d115 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1535,15 +1535,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
* do this outside rmap routines.
*/
VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+ /*
+ * huge_pmd_unshare may unmap an entire PMD page.
+ * There is no way of knowing exactly which PMDs may
+ * be cached for this mm, so we must flush them all.
+ * start/end were already adjusted above to cover this
+ * range.
+ */
+ flush_cache_range(vma, range.start, range.end);
+
flush_cache_range() is always called even if we do not need to flush.
How about introducing a new helper like hugetlb_pmd_shared() which
returns true for shared PMD? Then:
if (hugetlb_pmd_shared(mm, vma, pvmw.pte)) {
flush_cache_range(vma, range.start, range.end);
huge_pmd_unshare(mm, vma, &address, pvmw.pte);
flush_tlb_range(vma, range.start, range.end);
}
The code could be a little simpler. Right?
Thanks.
I thought about adding a 'hugetlb_pmd_shared()' interface for another use.
I believe it could even be used earlier in this call sequence. Since we
hold i_mmap_rwsem, we would even test for shared BEFORE calling
adjust_range_if_pmd_sharing_possible. We can not make an authoritative test
in adjust range... because not all callers will be holding i_mmap_rwsem.
I think we COULD optimize to minimize the flush range. However, I think
that would complicate this code even more, and it is difficult enough to
follow.
My preference would be to over flush as is done here for correctness and
simplification. We can optimize later if desired.
OK. Agree.
With Muchun's comment that this is not an issue in practice today,
Reviewed-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Thanks.