On Thu, 11 May 2023 23:05:35 +0100, Douglas Anderson <dianders@xxxxxxxxxxxx> wrote: > > When trying to turn on the "pseudo NMI" kernel feature in Linux, it > was discovered that all Mediatek-based Chromebooks that ever shipped > (at least ones with GICv3) had a firmware bug where they wouldn't save > certain GIC "GICR" registers properly. If a processor ever entered a > suspend/idle mode where the GICR registers lost state then they'd be > reset to their default state. > > As a result of the bug, if you try to enable "pseudo NMIs" on the > affected devices then certain interrupts will unexpectedly get > promoted to be "pseudo NMIs" and cause crashes / freezes / general > mayhem. > > ChromeOS is looking to start turning on "pseudo NMIs" in production to > make crash reports more actionable. To do so, we will release firmware > updates for at least some of the affected Mediatek Chromebooks. > However, even when we update the firmware of a Chromebook it's always > possible that a user will end up booting with old firmware. We need to > be able to detect when we're running with firmware that will crash and > burn if pseudo NMIs are enabled. > > The current plan is: > * Update the device trees of all affected Chromebooks to include the > 'mediatek,gicr-save-quirk' property. The kernel can use this to know > not to enable certain features like "pseudo NMI". NOTE: device trees > for Chromebooks are never baked into the firmware but are bundled > with the kernel. A kernel will never be configured to use "pseudo > NMIs" and be bundled with an old device tree. > * When we get a fixed firmware for one of these Chromebooks, it will > patch the device tree to remove this property. Since you're in control of distributing the FW together with the kernel, I assume you're also in control of the command line. Why can't that firmware pass the option enabling the pseudo-NMI support, dispensing ourselves from all of this? > > For some details, you can also see the public bug > <https://issuetracker.google.com/281831288> > > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> > --- > > .../bindings/interrupt-controller/arm,gic-v3.yaml | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml > index 92117261e1e1..8c251caae537 100644 > --- a/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml > +++ b/Documentation/devicetree/bindings/interrupt-controller/arm,gic-v3.yaml > @@ -166,6 +166,12 @@ properties: > resets: > maxItems: 1 > > + mediatek,gicr-save-quirk: I think this deserves something *much* stronger that outlines what is wrong, because this is not just a quirk. This is a failure to even remotely grasp the requirements of the architecture (and to use standard, public code that would have done it correctly). Something like "mediatek,broken-save-restore-fw" would be more adequate. > + type: boolean > + description: > + Asserts that the firmware on this device has issues saving and restoring > + GICR registers when CPUs are powered off. Nit: not the the CPUs, but the GIC redistributors. Thanks, M. -- Without deviation from the norm, progress is not possible.