Jason Gunthorpe <jgg@xxxxxxxxxx> writes:

> On Wed, May 24, 2023 at 11:47:29AM +1000, Alistair Popple wrote:
>> ARM64 requires TLB invalidates when upgrading pte permission from
>> read-only to read-write. However mmu_notifiers assume upgrades do not
>> need notifications and none are sent. This causes problems when a
>> secondary TLB such as implemented by an ARM SMMU doesn't support
>> broadcast TLB maintenance (BTM) and caches a read-only PTE.
>
> I don't really like this design, but I see how you get here..

Not going to argue with that. I don't love it either, but it seemed like
the most straightforward approach.

> mmu notifiers behavior should not be tied to the architecture, they
> are supposed to be generic reflections of what the MM is doing so that
> they can be hooked into by general purpose drivers.

Interesting. I've always assumed mmu notifiers were primarily about
keeping cache invalidations in sync with what the MM is doing. This is
based on the fact that you can't use mmu notifiers to establish
mappings, and we instead have this rather complex dance with
hmm_range_fault() to establish a mapping.

My initial version [1] effectively did add a generic event. Admittedly
it was somewhat incomplete, because I didn't hook up the new mmu
notifier event type to every user that could possibly ignore it
(eg. KVM). But that was part of the problem - it was hard to figure out
which mmu notifier users can safely ignore it versus ones that can't,
and that may depend on what architecture they're running on. Hence why I
hooked it up to ptep_set_access_flags, because you get arch specific
filtering as required.

Perhaps the better option is to introduce a new mmu notifier op and let
drivers opt in?

> If you want to hardwire invalidate_range to be only for SVA cases that
> actually share the page table itself and rely on some arch-defined
> invalidation, then we should give the op a much better name and
> discourage anyone else from abusing the new ops variable behavior.
Well that's the only use case I currently care about because we have hit
this issue, so for now at least I'd much rather have a straightforward
fix we can backport.

The problem is that an invalidation isn't well defined. If we are to
make this architecture independent then we need to be sending an
invalidation for any PTE state change
(ie. clean/dirty/writeable/read-only/present/not-present/etc), which we
don't do currently.

>> As no notification is sent and the SMMU does not snoop TLB invalidates
>> it will continue to return read-only entries to a device even though
>> the CPU page table contains a writable entry. This leads to a
>> continually faulting device and no way of handling the fault.
>
> Doesn't the fault generate a PRI/etc? If we get a PRI maybe we should
> just have the iommu driver push an iotlb invalidation command before
> it acks it? PRI is already really slow so I'm not sure a pipelined
> invalidation is going to be a problem? Does the SMMU architecture
> permit negative caching which would suggest we need it anyhow?

Yes, the SMMU architecture (which matches the ARM architecture with
regard to TLB maintenance requirements) permits negative caching of some
mapping attributes, including the read-only attribute. Hence without the
flushing we fault continuously.

> Jason

[1] - https://lore.kernel.org/linux-mm/ZGxg+I8FWz3YqBMk@xxxxxxxxxxxxx/T/