Re: [PATCH v5 06/16] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe Port devices

On Tue, 28 Jan 2025 14:25:54 -0600
"Bowman, Terry" <terry.bowman@xxxxxxx> wrote:

> On 1/14/2025 5:32 AM, Jonathan Cameron wrote:
> > On Tue, 7 Jan 2025 08:38:42 -0600
> > Terry Bowman <terry.bowman@xxxxxxx> wrote:
> >  
> >> The AER service driver's aer_get_device_error_info() function doesn't read
> >> uncorrectable (UCE) fatal error status from PCIe Upstream Port devices,
> >> including CXL Upstream Switch Ports. As a result, fatal errors are not
> >> logged or handled as needed for CXL PCIe Upstream Switch Port devices.
> >>
> >> Update the aer_get_device_error_info() function to read the UCE fatal
> >> status for all CXL PCIe devices. Make the change such that non-CXL devices
> >> are not affected.
> >>
> >> The fatal error status will be used in future patches implementing
> >> CXL PCIe Port uncorrectable error handling and logging.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>  
> > This clashes with Shuai's series adding link healthy checks.
> > Maybe we can reuse that logic to incorporate the condition we
> > care about here?
> >  
> 
> Hi Jonathan, et al.,
> 
> After looking at this more closely and considering the situation, I believe
> we should remove this patch from the patchset and defer adding these
> changes to log USP AER and RAS UCE.
> 
> I propose we reintroduce this later as an RFC or RFT in a future patchset.
> This will give us the additional time needed for testing.
> 
> The only downside to deferring is the CXL USP fatal UCE case: AER and RAS
> will not be logged, but this is the AER driver's existing behavior and
> therefore isn't a regression.

If we have doubts and it is complex then sure. Let's do this in stages.

Jonathan

> 
> Your thoughts?
> 
> Regards,
> Terry
> 
> >> ---
> >>  drivers/pci/pcie/aer.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> >> index 62be599e3bee..79c828bdcb6d 100644
> >> --- a/drivers/pci/pcie/aer.c
> >> +++ b/drivers/pci/pcie/aer.c
> >> @@ -1253,7 +1253,8 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
> >>  	} else if (type == PCI_EXP_TYPE_ROOT_PORT ||
> >>  		   type == PCI_EXP_TYPE_RC_EC ||
> >>  		   type == PCI_EXP_TYPE_DOWNSTREAM ||
> >> -		   info->severity == AER_NONFATAL) {
> >> +		   info->severity == AER_NONFATAL ||
> >> +		   (pcie_is_cxl(dev) && type == PCI_EXP_TYPE_UPSTREAM)) {
> >>  
> >>  		/* Link is still healthy for IO reads */
> >>  		pci_read_config_dword(dev, aer + PCI_ERR_UNCOR_STATUS,  
> 
> 
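For anyone following the thread, the condition touched by the quoted hunk can
be modelled in user space. This is only a minimal sketch of the predicate that
decides whether the uncorrectable status register is read. The enum names and
the reads_uncor_status() helper are hypothetical stand-ins, not kernel
definitions; the is_cxl flag stands in for pcie_is_cxl(dev), and the separate
correctable-error branch of aer_get_device_error_info() is not modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PCI_EXP_TYPE_* port types in the hunk. */
enum port_type {
	PORT_ENDPOINT,
	PORT_ROOT_PORT,
	PORT_UPSTREAM,
	PORT_DOWNSTREAM,
	PORT_RC_EC,
};

/* Hypothetical stand-ins for the uncorrectable AER severities. */
enum severity {
	SEV_NONFATAL,
	SEV_FATAL,
};

/*
 * Mirrors the patched "else if" condition: the uncorrectable status is
 * read for Root Ports, RC Event Collectors, Downstream Ports, any
 * non-fatal error, and (new in the patch) CXL Upstream Ports.
 */
static bool reads_uncor_status(enum port_type type, enum severity sev,
			       bool is_cxl)
{
	return type == PORT_ROOT_PORT ||
	       type == PORT_RC_EC ||
	       type == PORT_DOWNSTREAM ||
	       sev == SEV_NONFATAL ||
	       (is_cxl && type == PORT_UPSTREAM);
}
```

With the patch dropped, the last clause goes away, so a fatal UCE on a CXL
Upstream Switch Port falls outside this branch again, which is the existing
behavior Terry describes.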




