On Thu, Mar 20, 2025 at 04:14:04PM +0100, Karolina Stolarek wrote: > On 19/03/2025 23:21, Bjorn Helgaas wrote: > > On Mon, Mar 17, 2025 at 10:14:46AM +0000, Karolina Stolarek wrote: > > > Make CXL devices use aer_print_error() when reporting AER errors. > > > Add a helper function to populate aer_err_info struct before logging > > > an error. Move struct aer_err_info definition to the aer.h header > > > to make it visible to CXL. > > > > Previously, pci_print_aer() was used by both CXL (via > > cxl_handle_rdport_errors()) and by ACPI GHES via aer_recover_queue() > > and aer_recover_work_func(), right? > > > > And after this patch, they would use aer_print_error() like native > > AER, native DPC, and the ACPI EDR DPC path? > > That is correct. > > I think this consolidation is a good thing, because I don't think we > > should log errors differently just because we learned about them via a > > different path. > > > > But I think this also changes the text we put in dmesg, which is > > potentially disruptive to users and scripts that consume it, so I > > think we should include a comparison of the previous and new text in > > the commit log. > > Like I said in a comment to the patch, I tested CXL error reporting in QEMU > with and without my patch, and the output is the same: > > pcieport 0000:0c:00.0: aer_inject: Injecting errors 00004000/00000000 into device 0000:0c:00.0 > pcieport 0000:0c:00.0: AER: Correctable error message received from 0000:0c:00.0 > pcieport 0000:0c:00.0: CXL Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID) > pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00004000/0000a000 > pcieport 0000:0c:00.0: [14] CorrIntErr Maybe there's CXL magic that I missed. It looks like Terry's series changes some of this path. And GHES also currently uses pci_print_aer(). Some sample logs at [1,2]. Looking at v6.14-rc1, only aer_print_error() logs the "error status" string, and only pci_print_aer() logs "aer_status", "aer_layer", etc. The previous path is: pci_print_aer pci_err("aer_status: 0x%08x, aer_mask: 0x%08x\n") <-- __aer_print_error pci_err("aer_layer=%s, aer_agent=%s\n") <-- pcie_print_tlp_log New path is: aer_print_error pci_printk("PCIe Bus Error: severity=%s, type=%s, (%s)\n") pci_printk(" device [%04x:%04x] error status/mask=%08x/%08x\n) __aer_print_error pcie_print_tlp_log So I expected that the lines I marked in pci_print_aer() would be different. Bjorn [1] https://lore.kernel.org/lkml/2149597.8uJZFlvqrj@xrated/T/ [2] https://lore.kernel.org/all/e8a58616-aeae-ad78-d496-6dfcef4ddcaa@xxxxxxx/T/