On Fri, Jan 11, 2019 at 7:03 AM Borislav Petkov <bp@xxxxxxxxx> wrote: > On Thu, Jan 10, 2019 at 04:01:27PM -0500, Tyler Baicar wrote: > > On Thu, Jan 10, 2019 at 1:23 PM James Morse <james.morse@xxxxxxx> wrote: > > > >> > > > >> + if (is_hest_type_generic_v2(ghes) && ghes_ack_error(ghes->generic_v2)) > > > > > > > > Since ghes_ack_error() is always prepended with this check, you could > > > > push it down into the function: > > > > > > > > ghes_ack_error(ghes) > > > > ... > > > > > > > > if (!is_hest_type_generic_v2(ghes)) > > > > return 0; > > > > > > > > and simplify the two callsites :) > > > > > > Great idea! ... > > > > > > .. huh. Turns out for ghes_proc() we discard any errors other than ENOENT from > > > ghes_read_estatus() if is_hest_type_generic_v2(). This masks EIO. > > > > > > Most of the error sources discard the result, the worst thing I can find is > > > ghes_irq_func() will return IRQ_HANDLED, instead of IRQ_NONE when we didn't > > > really handle the IRQ. They're registered as SHARED, but I don't have an example > > > of what goes wrong next. > > > > > > I think this will also stop the spurious handling code kicking in to shut it up > > > if its broken and screaming. Unlikely, but not impossible. > > > > > > Fixed in a prior patch, with Boris' suggestion, ghes_proc()s tail ends up look > > > like this: > > > ----------------------%<---------------------- > > > diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c > > > index 0321d9420b1e..8d1f9930b159 100644 > > > --- a/drivers/acpi/apei/ghes.c > > > +++ b/drivers/acpi/apei/ghes.c > > > @@ -700,18 +708,11 @@ static int ghes_proc(struct ghes *ghes) > > > > > > out: > > > ghes_clear_estatus(ghes, buf_paddr); > > > + if (rc != -ENOENT) > > > + rc_ack = ghes_ack_error(ghes); > > > > > > - if (rc == -ENOENT) > > > - return rc; > > > - > > > - /* > > > - * GHESv2 type HEST entries introduce support for error acknowledgment, > > > - * so only acknowledge the error if this support is present. > > > - */ > > > - if (is_hest_type_generic_v2(ghes)) > > > - return ghes_ack_error(ghes->generic_v2); > > > - > > > - return rc; > > > + /* If rc and rc_ack failed, return the first one */ > > > + return rc ? rc : rc_ack; > > > } > > > ----------------------%<---------------------- > > > > > > > Looks good to me, I guess there's no harm in acking invalid error status blocks. > > Err, why? If ghes_read_estatus() fails, then either there was no error populated or the error status block was invalid. If the error status block is invalid, then the kernel doesn't know what happened in hardware. I originally thought this was changing what's acked, but it's just changing the return value of ghes_proc() when ghes_read_estatus() returns -EIO. > I don't know what the firmware glue does on ARM but if I'd have to > remain logical - which is hard to do with firmware - the proper thing to > do would be this: > > rc = ghes_read_estatus(ghes, &buf_paddr); > if (rc) { > ghes_reset_hardware(); The kernel would have no way of knowing what to do here. > } > > /* clear estatus and bla bla */ > > /* Now, I'm in the success case: */ > ghes_ack_error(); > > > This way, you have the error path clear of something unexpected happened > when reading the hardware, obvious and separated. ghes_reset_hardware() > clears the registers and does the necessary steps to put the hardware in > good state again so that it can report the next error. > > And the success path simply acks the error and does possibly the same > thing. The naming of the functions is important though, to denote what > gets called when. > > This way you handle all the cases just fine. No looking at the error > type and blabla. > > Right?