GHES/AER synchronization missing?

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Fri, 1 Sep 2023 17:57:55 -0500

TL;DR: I think ghes_handle_aer() lacks synchronization with
aer_recover_work_func(), so aer_recover_work_func() may use estatus
data after it's been overwritten.

Sorry this is so long; it took me a long time to get this far, and I
might be in the weeds.  Here's the execution path I'm looking at:

  ghes_proc(struct ghes *ghes)
    estatus = ghes->estatus                 # Linux kernel buffer
    ghes_read_estatus(estatus, &buf_paddr)  # copy fw mem to estatus
    ghes_do_proc(estatus)
      apei_estatus_for_each_section(estatus, gdata)
        if (gdata is CPER_SEC_PCIE)
          ghes_handle_aer(gdata)     # pointer into estatus
            struct cper_sec_pcie *pcie_err = acpi_hest_get_payload(gdata)
            aer_recover_queue(..., pcie_err->aer_info)
              entry.regs = aer_regs  # pointer to struct aer_capability_regs
              kfifo_in(&aer_recover_ring, &entry)   # copy pointer into FIFO
  ...
  aer_recover_work_func
    kfifo_get(&aer_recover_ring, &entry)
    cper_print_aer(entry.regs)       # use aer_capability_regs values

I'm confused because I don't see what ensures that the
aer_capability_regs values, which I think are somewhere in the
ghes->estatus buffer, are preserved until aer_recover_work_func() is
finished with them.

Here's my understanding of the general flow:

  - hest_parse_ghes() adds a GHES platform device for each HEST Error
    Source descriptor of type 9 (Generic Hardware Error Source) or
    type 10 (Generic Hardware Error Source version 2).

  - Each HEST GHES entry has an Error Status Address that tells us
    about some range of firmware reserved memory that will contain
    error status data for the device.

  - ghes_probe() claims each GHES platform device.  It maps the Error
    Status Address once (so I guess the address of the firmware memory
    must be fixed for the life of the system?) and allocates a
    ghes->estatus buffer in kernel memory.

  - When the platform notifies OSPM of a GHES event, ghes_proc()
    copies error status data from the Error Status Address firmware
    memory to the ghes->estatus buffer.

  - The error status data may have multiple sections.  ghes_do_proc()
    iterates through each section in the ghes->estatus buffer.  PCIe
    sections contain a struct aer_capability_regs that has values of
    all the AER Capability registers, and ghes_handle_aer() passes a
    pointer to the struct aer_capability_regs to aer_recover_queue().

  - This struct aer_capability_regs pointer is a pointer into the
    ghes->estatus buffer.  aer_recover_queue() copies that pointer
    (not the data it points to) into the aer_recover_ring FIFO and
    schedules aer_recover_work_func() for later execution.

  - aer_recover_work_func() reads the struct aer_capability_regs data
    at some future time.

  - ghes_proc() does not know when aer_recover_work_func() is finished
    with the struct aer_capability_regs data.
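
To make the concern concrete, here is a minimal userspace sketch of the
pattern described above.  It is not kernel code: every model_* name is
hypothetical, a one-field struct stands in for struct
aer_capability_regs, a single static variable stands in for
ghes->estatus, and a small array stands in for the aer_recover_ring
kfifo.  It only shows what happens if two events are processed before
the work function runs and nothing else serializes them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct aer_capability_regs. */
struct model_regs {
	uint32_t uncor_status;
};

/* Stand-in for the ghes->estatus buffer, reused for every event. */
static struct model_regs model_estatus;

/* Stand-in for an aer_recover_ring entry: it holds only a pointer. */
struct model_entry {
	const struct model_regs *regs;
};

#define MODEL_RING_SIZE 4
static struct model_entry model_ring[MODEL_RING_SIZE];
static unsigned int model_head, model_tail;

/* Producer: models ghes_proc() -> ... -> aer_recover_queue(). */
static void model_ghes_proc(uint32_t fw_uncor_status)
{
	struct model_entry entry;

	/* "copy fw mem to estatus": overwrites the previous contents */
	model_estatus.uncor_status = fw_uncor_status;

	/* queue a pointer into the shared buffer, not the data itself */
	entry.regs = &model_estatus;
	assert(model_head - model_tail < MODEL_RING_SIZE);
	model_ring[model_head++ % MODEL_RING_SIZE] = entry;
}

/* Consumer: models aer_recover_work_func() running some time later. */
static uint32_t model_work_func(void)
{
	struct model_entry entry;

	assert(model_head != model_tail);
	entry = model_ring[model_tail++ % MODEL_RING_SIZE];
	return entry.regs->uncor_status;
}

/*
 * Two events arrive before the work function runs.  Returns the value
 * observed by the work item that was queued for the first event.
 */
static uint32_t model_two_events_then_work(void)
{
	model_head = model_tail = 0;
	model_ghes_proc(0x1);	/* first error */
	model_ghes_proc(0x2);	/* second error reuses the buffer */
	return model_work_func();
}
```

In this model, the work item queued for the first event reads the
second event's value, because both entries point at the same buffer.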

Am I missing a mechanism that prevents a second ghes_proc() invocation
from overwriting ghes->estatus before the first aer_recover_work_func()
is finished?

The ghes_defer_non_standard_event() case added by Shiju and James in
9aa9cf3ee945 ("ACPI / APEI: Add a notifier chain for unknown (vendor)
CPER records") also schedules future work, but it copies the data
needed for that work.  It seems like ghes_handle_aer() should maybe do
something similar?
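
As a rough illustration of the "copy the data" idea, here is a minimal
userspace sketch in which the queued entry carries the register values
by value instead of a pointer into the shared buffer.  All copy_* names
are hypothetical, and this is only a model: a real change would have to
decide where the copy lives and who frees it, given the contexts in
which ghes_proc() can run, and the sketch does not address that.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct aer_capability_regs. */
struct copy_regs {
	uint32_t uncor_status;
};

/* Stand-in for the ghes->estatus buffer, reused for every event. */
static struct copy_regs copy_estatus;

/* The queued entry owns a copy of the registers, not a pointer. */
struct copy_entry {
	struct copy_regs regs;
};

#define COPY_RING_SIZE 4
static struct copy_entry copy_ring[COPY_RING_SIZE];
static unsigned int copy_head, copy_tail;

/* Producer: snapshot the registers while the buffer is still valid. */
static void copy_ghes_proc(uint32_t fw_uncor_status)
{
	struct copy_entry entry;

	copy_estatus.uncor_status = fw_uncor_status;

	entry.regs = copy_estatus;	/* struct copy */
	assert(copy_head - copy_tail < COPY_RING_SIZE);
	copy_ring[copy_head++ % COPY_RING_SIZE] = entry;
}

/* Consumer: only touches its private copy. */
static uint32_t copy_work_func(void)
{
	struct copy_entry entry;

	assert(copy_head != copy_tail);
	entry = copy_ring[copy_tail++ % COPY_RING_SIZE];
	return entry.regs.uncor_status;
}

/*
 * Same scenario as before: two events, then the first work item runs.
 * Returns the value observed by that first work item.
 */
static uint32_t copy_two_events_then_work(void)
{
	copy_head = copy_tail = 0;
	copy_ghes_proc(0x1);	/* first error */
	copy_ghes_proc(0x2);	/* second error reuses the buffer */
	return copy_work_func();
}
```

Here each work item sees the values that were current when it was
queued, no matter how often the shared buffer is overwritten
afterwards.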

Bjorn