========== Background ========== The arm64 architecture specifies TLB permission bits may be cached and therefore the TLB must be invalidated during permission upgrades. For the CPU this currently occurs in the architecture specific ptep_set_access_flags() routine. Secondary TLBs such as implemented by the SMMU IOMMU match the CPU architecture specification and may also cache permission bits and require the same TLB invalidations. This may be achieved in one of two ways. Some SMMU implementations implement broadcast TLB maintenance (BTM). This snoops CPU TLB invalidates and will invalidate any secondary TLB at the same time as the CPU. However implementations are not required to implement BTM. Implementations without BTM rely on mmu notifier callbacks to send explicit TLB invalidation commands to invalidate SMMU TLB. Therefore either generic kernel code or architecture specific code needs to call the mmu notifier on permission upgrade. Currently that doesn't happen so devices will fault indefinitely when writing to a PTE that was previously read-only as nothing invalidates the SMMU TLB. To fix that this series first renames the .invalidate_range() callback to .invalidate_secondary_tlbs() as suggested by Jason and Sean to make it clear this callback is only used for secondary TLBs. That was made possible thanks to Sean's series [1] to remove KVM incorrect usage. This series is currently in linux-next. >From here there are several possible solutions for which I would like some feedback on a preferred approach. ========= Solutions ========= 1. Add a call to mmu_notifier_invalidate_secondary_tlbs() to the arm64 version of ptep_set_access_flags(). This is what this RFC series does as it is the simplest solution. Arguably this call should be made by generic kernel code though to catch other platforms that need it. However only ARM64, IA64 and Sparc flush the TLB in ptep_set_access_flags() and AFAIK only ARM64 has an IOMMU that uses shared page-tables and is therefore the only platform affected by this. 2. Add a call to mmu_notifier_invalidate_secondary_tlbs() to generic kernel code. The problem with this approach is generic kernel code has no way of knowing if it can be skipped or not for a given IOMMU. That leads to over invalidation and subsequent performance loss on the majority of platforms that don't need it. 3. Implement a new set of notifier operations (eg. tlb_notifier_ops) specifically for secondary TLBs with a range of operations that can be called by generic kernel code for every PTE modification. See [2] for a prototype implementation of this idea. This solves the problems of (1) and (2) because an IOMMU would only implement the operations it needs. It also keeps the layering nice as theoretically there is no reason a secondary TLB has to follow the main CPU architecture specification so is free to implement its own operations (although I know of no hardware that does this). However it adds complexity for dealing with a problem that only exists on some implementations of a particular feature on one architecture. For that reason I think (1) is the best path forward due to simplicity but would appreciate any feedback here. ============ Other Issues ============ It is unclear if mmu_notifier_invalidate_secondary_tlbs() should be called from mmu_notifier_range_end(). Currently it is, as an analysis of existing code shows most code doesn't explicitly invalidate secondary TLBs and relies on it being called as part of the end() call. The disadvantage of changing code to explicitly invalidate secondary TLBs is generally it can't take advantage of IOMMU specific range based TLB invalidation commands because explicit invalidations happen one page at a time under PTL. To solve that we could add secondary TLB invalidation calls to the TLB batching code, but that adds complexity so I'm not sure it's worth it but would appreciate feedback. [1] - https://lore.kernel.org/all/20230602011518.787006-1-seanjc@xxxxxxxxxx/ [2] - https://lore.kernel.org/all/87h6rhw4i0.fsf@xxxxxxxxxx/ Alistair Popple (2): mm_notifiers: Rename invalidate_range notifier arm64: Notify on pte permission upgrades arch/arm64/mm/fault.c | 7 +- arch/arm64/mm/hugetlbpage.c | 9 +++- drivers/iommu/amd/iommu_v2.c | 10 +-- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 13 ++-- drivers/iommu/intel/svm.c | 8 +-- drivers/misc/ocxl/link.c | 8 +-- include/asm-generic/tlb.h | 2 +- include/linux/mmu_notifier.h | 55 +++++++++--------- mm/huge_memory.c | 4 +- mm/hugetlb.c | 10 +-- mm/mmu_notifier.c | 52 ++++++++++------- mm/rmap.c | 42 +++++++------- 12 files changed, 125 insertions(+), 95 deletions(-) base-commit: b16049b21162bb649cdd8519642a35972b7910fe -- git-series 0.9.1