On 11/15/2024 3:35 AM, Lukas Wunner wrote: > On Wed, Nov 13, 2024 at 03:54:20PM -0600, Terry Bowman wrote: >> The AER service driver's aer_get_device_error_info() function doesn't read >> uncorrectable (UCE) fatal error status from PCIe upstream port devices, >> including CXL upstream switch ports. As a result, fatal errors are not >> logged or handled as needed for CXL PCIe upstream switch port devices. >> >> Update the aer_get_device_error_info() function to read the UCE fatal >> status for all CXL PCIe port devices. Make the change to not affect >> non-CXL PCIe devices. >> >> The fatal error status will be used in future patches implementing >> CXL PCIe port uncorrectable error handling and logging. > [...] >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -1250,7 +1250,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) >> } else if (type == PCI_EXP_TYPE_ROOT_PORT || >> type == PCI_EXP_TYPE_RC_EC || >> type == PCI_EXP_TYPE_DOWNSTREAM || >> - info->severity == AER_NONFATAL) { >> + info->severity == AER_NONFATAL || >> + (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) { >> >> /* Link is still healthy for IO reads */ >> pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS, > Just a heads-up, there's another patch pending by Shuai Xue (+cc) > which touches the same code lines. It re-enables error reporting > for PCIe Upstream Ports (as well as Endpoints) under certain > conditions: > > https://lore.kernel.org/all/20241112135419.59491-3-xueshuai@xxxxxxxxxxxxxxxxx/ > > That was originally disabled by Keith Busch (+cc) with commit > 9d938ea53b26 ("PCI/AER: Don't read upstream ports below fatal errors"). > > There's some merge conflict potential here if your series goes into > the cxl tree and Shuai's patch into the pci tree in the next cycle. > > Thanks, > > Lukas Thanks Lukas I took a look at the patchset and reached out to Shuai (you're CC'd). Sorry, I thought I responded here earlier. Regards, Terry