Re: [PATCH v7 10/25] ACPI / APEI: Tell firmware the estatus queue consumed the records

Tyler Baicar <baicar.tyler@xxxxxxxxx> · Fri, 11 Jan 2019 15:53:56 -0500

On Fri, Jan 11, 2019 at 1:09 PM James Morse <james.morse@xxxxxxx> wrote:
> On 11/01/2019 15:32, Tyler Baicar wrote:
> > On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@xxxxxxxxx> wrote:
> >> On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote:
> >>> On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@xxxxxxx> wrote:
> >>>>>>
> >>>>>> +    if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2))
> >>>>>
> >>>>> Since ghes_ack_error() is always prepended with this check, you could
> >>>>> push it down into the function:
> >>>>>
> >>>>> ghes_ack_error(ghes)
> >>>>> ...
> >>>>>
> >>>>>       if (!is_hest_type_generic_v2(ghes))
> >>>>>               return 0;
> >>>>>
> >>>>> and simplify the two callsites :)
> >>>>
> >>>> Great idea! ...
> >>>>
> >>>> .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from
> >>>> ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO.
> >>>>
> >>>> Most of the error sources discard the result, the worst thing I can find is
> >>>> ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't
> >>>> really handle the IRQ. They're registered as SHARED, but I don't have an example
> >>>> of what goes wrong next.
> >>>>
> >>>> I think this will also stop the spurious handling code kicking in to shut it up
> >>>> if its broken and screaming. Unlikely, but not impossible.
>
> [....]
>
> >>> Looks good to me, I guess there's no harm in acking invalid error status blocks.
>
> Great, I didn't miss something nasty...
>
>
> >> Err, why?
> >
> > If ghes_read_estatus() fails, then either there was no error populated or the
> > error status block was invalid.
> > If the error status block is invalid, then the kernel doesn't know what happened
> > in hardware.
>
> What do we mean by 'hardware' here? We're receiving a corrupt report of
> something via memory.

By Hardware here I meant whatever hardware was reporting the error.

> The GHESv2 ack just means we're done with the memory. I think it exists because
> the external-agent can't peek into the CPU to see if its returned from the
> notification.
>
>
> > I originally thought this was changing what's acked, but it's just changing the
> > return value of ghes_proc() when ghes_read_estatus() returns -EIO.
>
> Sorry, that will be due to my bad description.
>
>
> >> I don't know what the firmware glue does on ARM but if I'd have to
> >> remain logical - which is hard to do with firmware - the proper thing to
> >> do would be this:
> >>
> >>         rc = ghes_read_estatus(ghes, &buf_paddr);
> >>         if (rc) {
> >>                 ghes_reset_hardware();
> >
> > The kernel would have no way of knowing what to do here.
>
> Is there anything wrong with what we do today? We stamp on the records so that
> we don't processes them again. (especially if is polled), and we tell firmware
> it can re-use this memory.
>
> (I think we should return an error, or print a ratelimited warning for corrupt
> records)

Agree, the print is already present in ghes_read_estatus.

> >>         }
> >>
> >>         /* clear estatus and bla bla */
> >>
> >>         /* Now, I'm in the success case: */
> >>          ghes_ack_error();
> >>
> >>
> >> This way, you have the error path clear of something unexpected happened
> >> when reading the hardware, obvious and separated. ghes_reset_hardware()
> >> clears the registers and does the necessary steps to put the hardware in
> >> good state again so that it can report the next error.
> >>
> >> And the success path simply acks the error and does possibly the same
> >> thing. The naming of the functions is important though, to denote what
> >> gets called when.
>
> I think this duplicates the record-stamping/acking. If there is anything in that
> memory region, the action for processed/copied/ignored-because-its-corrupt is
> the same.
>
> We can return on ENOENT out earlier, as nothing needs doing in that case. Its
> what the GHES_TO_CLEAR spaghetti is for, we can probably move the ack thing into
> ghes_clear_estatus(), that way that thing means 'I'm done with this memory'.
>
> Something like:
> -------------------------
> rc = ghes_read_estatus();
> if (rc == -ENOENT)
>         return 0;

We still should be returning at least the -ENOENT from ghes_read_estatus().
That is being used by the SEA handling to determine if an SEA was properly
reported/handled by the host kernel in the KVM SEA case.

Here are the relevant functions:
https://elixir.bootlin.com/linux/latest/source/drivers/acpi/apei/ghes.c#L797
https://elixir.bootlin.com/linux/latest/source/arch/arm64/mm/fault.c#L723
https://elixir.bootlin.com/linux/latest/source/virt/kvm/arm/mmu.c#L1706

>
> if (!rc) {
>         ghes_do_proc() and friends;
> }
>
> ghes_clear_estatus();
>
> return rc;
> -------------------------
>
> We would no longer return errors from the ack code, I suspect that can only
> happen for a corrupt gas, which we would have caught earlier as we rely on the
> mapping being cached.