Re: [PATCH] PCI/AER: Consolidate CXL and native AER reporting paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 20, 2025 at 04:14:04PM +0100, Karolina Stolarek wrote:
> On 19/03/2025 23:21, Bjorn Helgaas wrote:
> > On Mon, Mar 17, 2025 at 10:14:46AM +0000, Karolina Stolarek wrote:
> > > Make CXL devices use aer_print_error() when reporting AER errors.
> > > Add a helper function to populate aer_err_info struct before logging
> > > an error. Move struct aer_err_info definition to the aer.h header
> > > to make it visible to CXL.
> > 
> > Previously, pci_print_aer() was used by both CXL (via
> > cxl_handle_rdport_errors()) and by ACPI GHES via aer_recover_queue()
> > and aer_recover_work_func(), right?
> > 
> > And after this patch, they would use aer_print_error() like native
> > AER, native DPC, and the ACPI EDR DPC path?
> 
> That is correct.

> > I think this consolidation is a good thing, because I don't think we
> > should log errors differently just because we learned about them via a
> > different path.
> > 
> > But I think this also changes the text we put in dmesg, which is
> > potentially disruptive to users and scripts that consume it, so I
> > think we should include a comparison of the previous and new text in
> > the commit log.
> 
> Like I said in a comment to the patch, I tested CXL error reporting in QEMU
> with and without my patch, and the output is the same:
> 
> pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0
> pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0
> pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
> pcieport 0000:0c:00.0:   device [8086:7075] error status/mask=00004000/0000a000
> pcieport 0000:0c:00.0:    [14] CorrIntErr

Maybe there's CXL magic that I missed.  It looks like Terry's series
changes some of this path.  And GHES also currently uses
pci_print_aer().  Some sample logs at [1,2].

Looking at v6.14-rc1, only aer_print_error() logs the "error status"
string, and only pci_print_aer() logs "aer_status", "aer_layer", etc.

The previous path is:

  pci_print_aer
    pci_err("aer_status: 0x%08x, aer_mask: 0x%08x\n")    <--
    __aer_print_error
    pci_err("aer_layer=%s, aer_agent=%s\n")              <--
    pcie_print_tlp_log

New path is:

  aer_print_error
    pci_printk("PCIe Bus Error: severity=%s, type=%s, (%s)\n")
    pci_printk("  device [%04x:%04x] error status/mask=%08x/%08x\n)
    __aer_print_error
    pcie_print_tlp_log

So I expected that the lines I marked in pci_print_aer() would be
different.

Bjorn

[1] https://lore.kernel.org/lkml/2149597.8uJZFlvqrj@xrated/T/
[2] https://lore.kernel.org/all/e8a58616-aeae-ad78-d496-6dfcef4ddcaa@xxxxxxx/T/




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux