This is a follow up to the initial RFC posted here:

https://lore.kernel.org/linux-mm/cover.063f3dc2100ae7cbe3a6527689589646ea787216.1687259597.git-series.apopple@xxxxxxxxxx/

The main change is to move the secondary TLB invalidation mmu notifier callbacks into the architecture-specific TLB flushing functions. This makes secondary TLB invalidation mostly match CPU invalidation while still allowing efficient range-based invalidations based on the existing TLB batching code.

There are some known issues with this series. What I am looking for here is comments on the overall approach. These issues will be fixed if we continue with this approach.

==========
Background
==========

The arm64 architecture specifies that TLB permission bits may be cached, and therefore the TLB must be invalidated during permission upgrades. For the CPU this currently occurs in the architecture-specific ptep_set_access_flags() routine.

Secondary TLBs, such as those implemented by the SMMU IOMMU, match the CPU architecture specification: they may also cache permission bits and so require the same TLB invalidations. This may be achieved in one of two ways.

Some SMMU implementations implement broadcast TLB maintenance (BTM). This snoops CPU TLB invalidations and invalidates any secondary TLB at the same time as the CPU. However, implementations are not required to implement BTM.

Implementations without BTM rely on mmu notifier callbacks to send explicit TLB invalidation commands to the SMMU. Therefore either generic kernel code or architecture-specific code needs to call the mmu notifier on permission upgrade. Currently that doesn't happen, so devices will fault indefinitely when writing to a PTE that was previously read-only: nothing invalidates the SMMU TLB, so the SMMU keeps using its cached read-only entry.

========
Solution
========

To fix this, the series first renames the .invalidate_range() callback to .arch_invalidate_secondary_tlbs(), as suggested by Jason and Sean, to make it clear this callback is only used for secondary TLBs.
The rename was made possible thanks to Sean's series [1] removing KVM's incorrect usage.

Based on feedback from Jason [2], the proposed solution to the bug is to move the calls to mmu_notifier_arch_invalidate_secondary_tlbs() closer to the architecture-specific TLB invalidation code. This ensures the secondary TLB won't miss invalidations, including the existing invalidation in the ARM64 code that deals with permission upgrades.

Currently only ARM64, PowerPC and x86 have IOMMUs with secondary TLBs requiring software invalidation, so the notifier is only called for those architectures. It is also not called for invalidation of kernel mappings: secondary TLB invalidation of kernel mappings doesn't currently happen anyway, so it is assumed not to be required.

============
Known Issues
============

Not all TLB invalidation call sites have been updated to call the notifier when required. This results in test failures due to stale TLB entries. Obviously that will be fixed if this general approach to fixing the bug is adopted.

The kernel TLB flushing functions may also need updating (see comments in patch 2).
[1] - https://lore.kernel.org/all/20230602011518.787006-1-seanjc@xxxxxxxxxx/
[2] - https://lore.kernel.org/linux-mm/ZJMR5bw8l+BbzdJ7@xxxxxxxx/

Alistair Popple (3):
  mm_notifiers: Rename invalidate_range notifier
  mmu_notifiers: Call arch_invalidate_secondary_tlbs() when invalidating TLBs
  mmu_notifiers: Don't invalidate secondary TLBs as part of mmu_notifier_invalidate_range_end()

 arch/arm64/include/asm/tlbflush.h               |   5 +-
 arch/powerpc/include/asm/book3s/64/tlbflush.h   |   1 +-
 arch/powerpc/mm/book3s64/radix_hugetlbpage.c    |   1 +-
 arch/powerpc/mm/book3s64/radix_tlb.c            |   6 +-
 arch/x86/mm/tlb.c                               |   3 +-
 drivers/iommu/amd/iommu_v2.c                    |  10 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c |  13 +-
 drivers/iommu/intel/svm.c                       |   8 +-
 drivers/misc/ocxl/link.c                        |   8 +-
 include/asm-generic/tlb.h                       |   1 +-
 include/linux/mmu_notifier.h                    | 104 ++++------------
 kernel/events/uprobes.c                         |   2 +-
 mm/huge_memory.c                                |  29 +---
 mm/hugetlb.c                                    |   8 +-
 mm/memory.c                                     |   8 +-
 mm/migrate_device.c                             |   9 +-
 mm/mmu_notifier.c                               |  47 +++-----
 mm/rmap.c                                       |  40 +-------
 18 files changed, 96 insertions(+), 207 deletions(-)

base-commit: a452483508d7b70b0f6c69e249ec0b3ea2330b5c
-- 
git-series 0.9.1