Excerpts from Linus Torvalds's message of July 29, 2020 5:02 am:
> On Tue, Jul 28, 2020 at 3:53 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>
>> The quirk is a problem with coprocessor where it's supposed to
>> invalidate the translation after a fault but it doesn't, so we can get a
>> read-only TLB stuck after something else does a RO->RW upgrade on the
>> TLB. Something like that IIRC. Coprocessors have their own MMU which
>> lives in the nest not the core, so you need a global TLB flush to
>> invalidate that thing.
>
> So I assumed, but it does seem confused.
>
> Why? Because if there are stale translations on the co-processor,
> there's no guarantee that one of the CPU's will have them and take a
> fault.
>
> So I'm not seeing why a core CPU doing spurious TLB invalidation would
> follow from "stale TLB in the Nest".

If the nest MMU access faults, it sends an interrupt to the CPU and
the driver tries to handle the page fault for it (I think that's how
it works).

> If anything, I think "we have a coprocessor that needs to never have
> stale TLB entries" would impact the _regular_ TLB invalidates (by
> update_mmu_cache()) and perhaps make those more aggressive, exactly
> because the coprocessor may not handle the fault as gracefully.

It could be done that way... Hmm although we do have something similar
also in radix__ptep_set_access_flags for the relaxing permissions case
so maybe this is required for not-present faults as well? I'm not
actually sure now.

But it's a bit weird and awkward because it's working around quirks in
the hardware which aren't regular, not because we're _completely_
confused (I hope).

Thanks,
Nick