Hi Dan,

On 10/21/24 20:43, Dan Williams wrote:
> Terry Bowman wrote:
> [..]
>> Testing:
>>
>> Below are test results for this patchset. This is using Qemu with a root
>> port (0c:00.0), upstream switch port (0d:00.0),and downstream switch port
>> (0e:00.0).
>>
>> This was tested using aer-inject updated to support CE and UCE internal
>> error injection. CXL RAS was set using a test patch (not upstreamed).
>
> Thanks for these test outputs!
>
>>
>> Root port UCE:
>> root@tbowman-cxl:~/aer-inject# ./root-uce-inject.sh
>> [ 27.318920] pcieport 0000:0c:00.0: aer_inject: Injecting errors 00000000/00400000 into device 0000:0c:00.0
>> [ 27.320164] pcieport 0000:0c:00.0: AER: Uncorrectable (Fatal) error message received from 0000:0c:00.0
>> [ 27.321518] pcieport 0000:0c:00.0: PCIe Bus Error: severity=Uncorrectable (Fatal), type=Transaction Layer, (Receiver ID)
>> [ 27.322483] pcieport 0000:0c:00.0: device [8086:7075] error status/mask=00400000/02000000
>> [ 27.323243] pcieport 0000:0c:00.0: [22] UncorrIntErr
>> [ 27.325584] aer_event: 0000:0c:00.0 PCIe Bus Error: severity=Fatal, Uncorrectable Internal Error, TLP Header=Not available
>
> It strikes that by this point the code knows that it is a "CXL Bus"
> error and no longer a "PCIe Bus" error. Given the divergent responses
> to Fatal errors based on bus I think it would help to clarify that the
> kernel is panicking due to "CXL Bus", not "PCIe Bus" errors.
>
>> [ 27.325584]
>> [ 27.327171] cxl_port_aer_uncorrectable_error: device=0000:0c:00.0 host=pci0000:0c status: 'Memory Address Parity Error'
>
> ...i.e. someone may not notice that this is "cxl" reference in the
> backtrace.

Good idea. I'll add logic to print 'CXL' bus in the case of a CXL erroring device.

Regards,
Terry