On 1/13/2018 8:35 PM, Keith Busch wrote: > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: >> Hi Keith, >> >> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, >>> which prevents the AER handler from reporting the details of the >>> error. This patch will have the DPC driver get the AER status registers >>> from the downstream port that detected the uncorrectable error, and >>> print out additional information. >>> >>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> >>> --- >>> drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c >>> index ef71a472592c..aefc4fbdcef0 100644 >>> --- a/drivers/pci/pcie/pcie-dpc.c >>> +++ b/drivers/pci/pcie/pcie-dpc.c >>> @@ -44,6 +44,7 @@ struct dpc_dev { >>> int cap_pos; >>> bool rp; >>> u32 rp_pio_status; >>> + struct aer_err_info info; >>> }; >>> >>> static const char * const rp_pio_error_string[] = { >>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) >>> struct pci_dev *dev, *temp, *pdev = dpc->dev->port; >>> struct pci_bus *parent = pdev->subordinate; >>> >>> + if (aer_get_device_error_info(pdev, &dpc->info)) >>> + aer_print_error(pdev, &dpc->info); >> >> I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the >> Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message >> was generated by something below "pdev", and that's where the >> interesting AER logging would have been done. >> >> This patch suggests that the DPC port itself would have something >> interesting logged in its AER capability. Is that really true? I see >> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger >> Reason and the Error Source ID from the Message, but I don't see >> anything about logging anything in its AER registers. > > If the trigger reason is "unmasked uncorrectable error", then the DPC > AER register has useful information. All other reasons would not provide > useful data in the AER register. > > > I would assume that aer_get_device_error_info would return false if > it's one of other reasons, but I can fix this up to check the trigger > reasoning rather than unconditionally reading the AER status > https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf If you look at figure 6-2, AER errors are always logged regardless of the DPC presence. That's why, I was suggesting Keith that code should also clear the AER information when such an event happens. Otherwise, same event won't trigger again. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.