Dan Williams wrote: > [ add Boris ] [ actually add Boris ] Boris, see below, thoughts on deprecating acpi_extlog... > Bjorn Helgaas wrote: > > On Mon, May 27, 2024 at 04:43:41PM +0200, Fabio M. De Francesco wrote: > > > Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI > > > v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd(). Instead, > > > the similar ghes_do_proc() (GHES) prints to kernel log and calls > > > pci_print_aer() to report via the ftrace infrastructure. > > > > > > Add support to report the CPER PCIe Error section also via the ftrace > > > infrastructure by calling pci_print_aer() to make ELOG act consistently > > > with GHES. > > > > > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx> > > > Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@xxxxxxxxxxxxxxx> > > > --- > > > drivers/acpi/acpi_extlog.c | 30 ++++++++++++++++++++++++++++++ > > > drivers/pci/pcie/aer.c | 2 +- > > > include/linux/aer.h | 13 +++++++++++-- > > > 3 files changed, 42 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c > > > index e025ae390737..007ce96f8672 100644 > > > --- a/drivers/acpi/acpi_extlog.c > > > +++ b/drivers/acpi/acpi_extlog.c > > > @@ -131,6 +131,32 @@ static int print_extlog_rcd(const char *pfx, > > > return 1; > > > } > > > > > > +static void extlog_print_pcie(struct cper_sec_pcie *pcie_err, > > > + int severity) > > > +{ > > > + struct aer_capability_regs *aer; > > > + struct pci_dev *pdev; > > > + unsigned int devfn; > > > + unsigned int bus; > > > + int aer_severity; > > > + int domain; > > > + > > > + if (pcie_err->validation_bits & CPER_PCIE_VALID_DEVICE_ID && > > > + pcie_err->validation_bits & CPER_PCIE_VALID_AER_INFO) { > > > + aer_severity = cper_severity_to_aer(severity); > > > + aer = (struct aer_capability_regs *)pcie_err->aer_info; > > > + domain = pcie_err->device_id.segment; > > > + bus = pcie_err->device_id.bus; > > > + devfn = PCI_DEVFN(pcie_err->device_id.device, > > > + pcie_err->device_id.function); > > > + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn); > > > + if (!pdev) > > > + return; > > > + pci_print_aer(pdev, aer_severity, aer); > > > + pci_dev_put(pdev); > > > + } > > > > I'm 100% in favor of making error reporting work and look the same > > across GHES and ELOG. But I do have to gripe a bit... > > > > It's already unfortunate that GHES and the native AER handling are > > separate paths that lead to the same place (__aer_print_error()). > > > > I'm sorry that we need to add a third path that again does > > fundamentally the same thing. The fact that they're separate means > > all the design, reviewing, testing, and maintenance effort is diluted, > > and error handling always gets too little love in the first place. > > I think this is a recipe for confusion. > > > > ghes_do_proc # GHES > > apei_estatus_for_each_section > > ... > > if (guid_equal(sec_type, &CPER_SEC_PCIE)) > > ghes_handle_aer > > cper_severity_to_aer > > aer_recover_queue > > kfifo_in_spinlocked(&aer_recover_ring) # add to queue > > aer_recover_work_func # another thread > > kfifo_get(&aer_recover_ring) # remove from queue > > pci_print_aer > > __aer_print_error <--- > > > > aer_irq # native AER > > kfifo_put(&aer_fifo) # add to queue > > aer_isr # another thread > > kfifo_get(&aer_fifo) # remove from queue > > ... > > aer_isr_one_error > > aer_process_err_devices > > aer_print_error > > __aer_print_error <--- > > > > extlog_print # extlog (x86 only) > > apei_estatus_for_each_section > > ... > > if (guid_equal(sec_type, &CPER_SEC_PCIE)) > > extlog_print_pcie > > cper_severity_to_aer > > pci_get_domain_bus_and_slot > > pci_print_aer > > __aer_print_error <--- > > > > And we also have CXL paths that lead to __aer_print_error(), although > > it seems like they at least start in the native AER (and maybe GHES?) > > path and branch out somewhere. My head is spinning. > > > > Do I *object* to this patch? No, not really; it's a trivial change in > > drivers/pci/, and Rafael can add my > > > > Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > > > > as needed. But I am afraid we're making ourselves a maintenance > > headache. > > To be honest, I am too. Upon discovering that extlog_print() behaves > differently than ghes_do_proc(), I had the snarky thought "great, can we > now just go ahead and deprecate the extlog path, it's just a source of > maintenance pain.". > > So *if*we keep acpi_extlog it then I definitely think it should be > consistent with other CPER handlers (needs this patch). But, I am also > open to entertaining "deprecate it". > > To me, the fact that ras_userspace_consumers() is only honored for > acpi_extlog is clear evidence that the kernel has already painted itself > into a confusing user ABI corner and maybe the proper path forward at > this point is to cut acpi_extlog loose.