Re: [PATCH 0/6] arm64: errata: Disable FWB on parts with non-ARM interconnects

Marc Zyngier <maz@xxxxxxxxxx> · Thu, 11 May 2023 19:42:58 +0100

On Thu, 11 May 2023 18:15:15 +0100,
Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> 
> Hi James,
> 
> On Thu, Mar 30, 2023 at 05:51:22PM +0100, James Morse wrote:
> > When stage1 translation is disabled, the SCTRL_E1.I bit controls the
> > attributes used for instruction fetch, one of the options results in a
> > non-cacheable access. A whole host of CPUs missed the FWB override
> > in this case, meaning a KVM guest could fetch stale/junk data instead of
> > instructions.
> > 
> > The workaround is to disable FWB, and do the required cache maintenance
> > instead.
> 
> I think the workaround can be to only do the required cache maintenance
> without disabling FWB. Having FWB on doesn't bring any performance
> benefits if we do the cache maintenance anyway but keeping it around may
> be useful for other reasons (e.g. KVM device pass-through using
> cacheable mappings, though not something KVM supports currently).

But you'd also rely on the guest doing its own cache maintenance for
instructions it writes, right?

Which probably means exposing a different CLIDR_EL1 so that
LoC/LoUU/LoUIS are consistent with *not* having FWB... I also wonder
if keeping FWB set has the potential to change the semantics of the
CMOs (the spec seems silent on that front).

> > Unfortunately, no-one has firmware that supports this new interface yet,
> > and the least surprising thing to do is to enable the workaround by default,
> > meaning FWB is disabled on all these cores, even for unaffected platforms.
> > ACPI Platforms that are not-affected can either take a firmware-update to
> > support the interface, or if the kernel they run will only run on hardware
> > that is unaffected, disable the workaround at build time.
> 
> Given that we know of more platforms that are _not_ affected and vendors
> are pretty vague on whether they need this, I'd rather swap the default
> and keep the workaround off with a firmware interface, DT or command
> line opt-in.

That'd be my preferred way. I really dislike putting the onus on
working systems to declare themselves safe (although there are some
Spectre-shaped precedents to that).

> That said, maybe we can reduce the risk further by doing the
> vcpu_has_run_once() trick with !FWB and clean the D side to PoC on a
> stage 2 exec fault (together with the I-cache invalidation). We can then
> ignore any other cache maintenance on S2 faults until someone shouts (we
> can maybe recommend forcing FWB off on the command line through the
> cpuid override).

You lost me here with your vcpu_has_run_once().

Keeping the CMOs on exec fault is definitely manageable. But is that
enough?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.