On Tue, 28 Jan 2025 14:25:54 -0600
"Bowman, Terry" <terry.bowman@xxxxxxx> wrote:

> On 1/14/2025 5:32 AM, Jonathan Cameron wrote:
> > On Tue, 7 Jan 2025 08:38:42 -0600
> > Terry Bowman <terry.bowman@xxxxxxx> wrote:
> >
> >> The AER service driver's aer_get_device_error_info() function doesn't read
> >> uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
> >> including CXL Upstream Switch Ports. As a result, fatal errors are not
> >> logged or handled as needed for CXL PCIe Upstream Switch Port devices.
> >>
> >> Update the aer_get_device_error_info() function to read the UCE fatal
> >> status for all CXL PCIe devices. Make the change such that non-CXL devices
> >> are not affected.
> >>
> >> The fatal error status will be used in future patches implementing
> >> CXL PCIe Port uncorrectable error handling and logging.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> > This clashes with Shuai's series adding link healthy checks.
> > Maybe we can reuse that logic to incorporate the condition we
> > care about here?
> >
>
> Hi Jonathan, et. al,
>
> After looking at this closer and considering the situation I believe
> we should remove this patch from the patchset and defer adding these
> changes to log USP AER and RAS UCE.
>
> I propose we reintroduce this later as a RFC or RFT in a future patchset.
> This will give more needed time for testing.
>
> The only downside to adding later is in the case of CXL USP fatal UCE. AER and
> RAS will not be logged but this was the AER driver's existing behavior and as a
> result isn't a regression.

If we have doubts and it is complex then sure.  Let's do this in stages.

Jonathan

>
> Your thoughts?
>
> Regards,
> Terry
>
> >> ---
> >>  drivers/pci/pcie/aer.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index 62be599e3bee..79c828bdcb6d 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -1253,7 +1253,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
> >>          } else if (type == PCI_EXP_TYPE_ROOT_PORT ||
> >>                     type == PCI_EXP_TYPE_RC_EC ||
> >>                     type == PCI_EXP_TYPE_DOWNSTREAM ||
> >> -                   info->severity == AER_NONFATAL) {
> >> +                   info->severity == AER_NONFATAL ||
> >> +                   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
> >>
> >>                  /* Link is still healthy for IO reads */
> >>                  pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,
> >
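
For reference, the condition touched by the quoted hunk can be modeled
outside the kernel. What follows is a minimal userspace sketch, not kernel
code: the enum values, reads_uncor_status() and its is_cxl / patched
parameters are hypothetical stand-ins for the PCI_EXP_TYPE_* constants,
the AER severity and pcie_is_cxl(dev). It only mirrors the else-if branch
shown in the diff and assumes nothing about the rest of
aer_get_device_error_info().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's PCI_EXP_TYPE_* port types. */
enum port_type {
	TYPE_ENDPOINT,
	TYPE_ROOT_PORT,
	TYPE_RC_EC,
	TYPE_UPSTREAM,
	TYPE_DOWNSTREAM,
};

/* Hypothetical stand-ins for the uncorrectable AER severities. */
enum severity {
	SEV_NONFATAL,
	SEV_FATAL,
};

/*
 * Returns true when the modeled branch would read the uncorrectable
 * error status. 'is_cxl' plays the role of pcie_is_cxl(dev) and
 * 'patched' selects between the old and the proposed condition.
 */
static bool reads_uncor_status(enum port_type type, enum severity sev,
			       bool is_cxl, bool patched)
{
	/* Condition as it exists before the patch. */
	if (type == TYPE_ROOT_PORT || type == TYPE_RC_EC ||
	    type == TYPE_DOWNSTREAM || sev == SEV_NONFATAL)
		return true;

	/* Extra clause proposed by the patch: CXL Upstream Ports only. */
	return patched && is_cxl && type == TYPE_UPSTREAM;
}

/* Small self-check covering the cases discussed in the thread. */
void self_check(void)
{
	/* Fatal UCE on a CXL USP: skipped today, read with the patch. */
	assert(!reads_uncor_status(TYPE_UPSTREAM, SEV_FATAL, true, false));
	assert(reads_uncor_status(TYPE_UPSTREAM, SEV_FATAL, true, true));

	/* Non-CXL Upstream Ports are unaffected by the patch. */
	assert(!reads_uncor_status(TYPE_UPSTREAM, SEV_FATAL, false, true));
}
```

In this model the only input whose result changes between patched = false
and patched = true is a fatal error on a CXL Upstream Port, which matches
the "non-CXL devices are not affected" statement in the commit message and
the point that dropping the patch keeps the existing behavior.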