On Thu, May 11, 2023 at 07:42:58PM +0100, Marc Zyngier wrote:
> On Thu, 11 May 2023 18:15:15 +0100,
> Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> > > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > > attributes used for instruction fetch, one of the options results in a
> > > non-cacheable access. A whole host of CPUs missed the FWB override
> > > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > > instructions.
> > >
> > > The workaround is to disable FWB, and do the required cache maintenance
> > > instead.
> >
> > I think the workaround can be to only do the required cache maintenance
> > without disabling FWB. Having FWB on doesn't bring any performance
> > benefits if we do the cache maintenance anyway but keeping it around may
> > be useful for other reasons (e.g. KVM device pass-through using
> > cacheable mappings, though not something KVM supports currently).
>
> But you'd also rely on the guest doing its own cache maintenance for
> instructions it writes, right?

Ah, you are right. It looks like I only considered the host writing
instructions. If the guest disabled stage 1 and wrote some instructions
with FWB on, they'd not necessarily reach the PoC while the instructions
are fetched from PoC with this bug. Even with SCTLR_EL1.I==0, the guest
is supposed to do an IC IVAU if it wrote instructions but that's not
sufficient (hint to the micro-architects, add a chicken bit to upgrade
IC IVAU to also do a DC CVAC ;))

> Which probably means exposing a different CLIDR_EL1 so that
> LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
> if keeping FWB set has the potential to change the semantics of the
> CMOs (the spec seems silent on that front).

Not sure about CMOs, I'd expect them to behave in the same way. However,
I don't see how faking CLIDR_EL1 can trick the guest into doing DC CVAC
when its MMU is off.

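To make the failure mode above concrete, here is a toy model of it in
Python. It is only a sketch, not kernel code: the ToyMemory class and all
of its method names are made up for illustration, and it assumes the
worst case where a dirty line stays in the D-cache until it is explicitly
cleaned.

```python
class ToyMemory:
    """Hypothetical model: one D-cache in front of memory at the PoC."""

    def __init__(self):
        self.poc = {}     # what memory at the Point of Coherency holds
        self.dcache = {}  # lines written by the guest, not yet cleaned

    def guest_store(self, addr, insn):
        # With FWB on, the guest's store is treated as write-back
        # cacheable, so (worst case) it only lands in the D-cache.
        self.dcache[addr] = insn

    def dc_cvac(self, addr):
        # Clean by VA to PoC: push the cached line out to memory.
        if addr in self.dcache:
            self.poc[addr] = self.dcache[addr]

    def ic_ivau(self, addr):
        # Invalidates the I-cache only. It never cleans the D side,
        # so it changes nothing in this model.
        pass

    def affected_fetch(self, addr):
        # On an affected CPU with stage 1 off, the fetch is
        # non-cacheable and reads whatever is at the PoC.
        return self.poc.get(addr, "stale")


def guest_writes_then_fetches(clean_to_poc):
    """Guest writes one instruction with its MMU off, then executes it."""
    mem = ToyMemory()
    addr = 0x1000  # arbitrary example address
    mem.guest_store(addr, "new insn")
    if clean_to_poc:
        mem.dc_cvac(addr)  # the extra step the guest has no reason to do
    mem.ic_ivau(addr)      # what the guest is expected to do
    return mem.affected_fetch(addr)
```

In this model guest_writes_then_fetches(False) returns "stale" and
guest_writes_then_fetches(True) returns "new insn", which is the gap
between what the guest is required to do and what the affected CPUs
would need.
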
> > That said, maybe we can reduce the risk further by doing the
> > vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
> > stage 2 exec fault (together with the I-cache invalidation). We can then
> > ignore any other cache maintenance on S2 faults until someone shouts (we
> > can maybe recommend forcing FWB off on the command line through the
> > cpuid override).
>
> You lost me here with your vcpu_has_run_once().

Most likely I lost myself in the code. So the tricks we used in the past
tracking the guest MMU off/on were only for the D side. If (we hope that)
the guest only wrote instructions to a page once before executing them
(and never wrote instructions again), we could trap a subsequent exec
fault and do the D-cache clean to PoC again.

> Keeping the CMOs on exec fault is definitely manageable. But is that
> enough?

Yeah, not sure it's enough if the guest keeps writing instructions to
the same page with the MMU off.

-- 
Catalin