Hi Alex,

On Mon, 04 Oct 2021 12:23:41 +0100,
Alexandru Elisei <alexandru.elisei@xxxxxxx> wrote:
>
> Hi Marc,
>
> On 9/24/21 09:25, Marc Zyngier wrote:
> > The infamous M1 has a feature nobody else ever implemented,
> > in the form of the "GIC locally generated SError interrupts",
> > also known as SEIS for short.
> >
> > These SErrors are generated when a guest does something that violates
> > the GIC state machine. It would have been simpler to just *ignore*
> > the damned thing, but that's not what this HW does. Oh well.
> >
> > This part of the architecture is also amazingly under-specified.
> > There is a whole 10 lines that describe the feature in a spec that
> > is 930 pages long, and some of these lines are factually wrong.
> > Oh, and it is deprecated, so the incentive to clarify it is low.
> >
> > Now, the spec says that this should be a *virtual* SError when
> > HCR_EL2.AMO is set. As it turns out, that's not always the case
> > on this CPU, and the SError sometimes fires on the host as a
> > physical SError. Goodbye, cruel world. This clearly is a HW bug,
> > and it means that a guest can easily take the host down, on demand.
> >
> > Thankfully, we have seen systems that were just as broken in the
> > past, and we have the perfect vaccine for it.
> >
> > Apple M1, please meet the Cavium ThunderX workaround. All your
> > GIC accesses will be trapped, sanitised, and emulated. Only the
> > signalling aspect of the HW will be used. It won't be super speedy,
> > but it will at least be safe. You're most welcome.
> >
> > Given that this has only ever been seen on this single implementation,
> > that the spec is unclear at best and that we cannot trust it to ever
> > be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
> > being set.
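
For reference, the gating described in the commit message can be
sketched as below. This is only an illustration, not the code from the
patch: the helper name is hypothetical, it takes an already-read
ICH_VTR_EL2 value instead of reading the system register, and it relies
on SEIS being bit 22 of ICH_VTR_EL2.

```c
#include <stdbool.h>
#include <stdint.h>

/* SEIS is bit 22 of ICH_VTR_EL2 in the GICv3 architecture. */
#define ICH_VTR_SEIS_SHIFT	22
#define ICH_VTR_SEIS_MASK	(UINT64_C(1) << ICH_VTR_SEIS_SHIFT)

/*
 * Hypothetical helper (not a kernel function): given a raw ICH_VTR_EL2
 * value, report whether the implementation advertises locally generated
 * SErrors, in which case all guest GIC CPU interface accesses should be
 * trapped and emulated rather than handled by the hardware.
 */
static bool vgic_needs_trap_and_emulate(uint64_t ich_vtr_el2)
{
	return (ich_vtr_el2 & ICH_VTR_SEIS_MASK) != 0;
}
```

The decision depends on that single bit only, which matches the "gate
solely on SEIS" wording: no other field of the register is consulted.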
>
> I grepped for system error in Arm IHI 0069F, and it turns out there
> are a number of ways to make the GIC generate one:
>
> - When programming the ITS.
>
> - On a write to ICC_DIR_EL1 (or the corresponding virtual CPU
>   interface register) when split priority drop/interrupt deactivation
>   is not enabled.
>
> - On a write to GICV_AEOIR or GICC_DIR.
>
> The ITS and the legacy GICv2 interface are memory mapped, so I am going
> to trust that KVM emulates that correctly and avoids putting the GIC
> into a state that triggers the SErrors.

And to be clear, if the host kernel was doing the wrong thing, it
would take a *physical* SError. And on the M1, it really doesn't
matter as there is no physical GIC.

> The CPU interface registers are accessed directly by the guest, so
> changing that to trap-and-emulate looks like the only way to prevent
> the guest from crashing the host with an SError.
>
> As for making the trap-and-emulate depend on ICH_VTR_EL2.SEIS
> being set, that sounds reasonable to me, considering that there were
> no reports so far of this being implemented. And if it turns out
> that there are devices which implement GIC generated SErrors
> *correctly* and the trap-and-emulate cost is too much, then we can
> always get an erratum number from Apple and have the trapping depend
> on that, right?

I have very little hope that we can get Apple to give us anything
here. The CPU doesn't even advertise that it has a vGIC, so we're in
uncharted territories. But we could definitely key that on the MIDR.

> Reviewed-by: Alexandru Elisei <alexandru.elisei@xxxxxxx>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.