On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: > Hi Keith, > > On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > > which prevents the AER handler from reporting the details of the > > error. This patch will have the DPC driver get the AER status registers > > from the downstream port that detected the uncorrectable error, and > > print out additional information. > > > > Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> > > --- > > drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > > index ef71a472592c..aefc4fbdcef0 100644 > > --- a/drivers/pci/pcie/pcie-dpc.c > > +++ b/drivers/pci/pcie/pcie-dpc.c > > @@ -44,6 +44,7 @@ struct dpc_dev { > > int cap_pos; > > bool rp; > > u32 rp_pio_status; > > + struct aer_err_info info; > > }; > > > > static const char * const rp_pio_error_string[] = { > > @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > > struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > > struct pci_bus *parent = pdev->subordinate; > > > > + if (aer_get_device_error_info(pdev, &dpc->info)) > > + aer_print_error(pdev, &dpc->info); > > I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the > Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message > was generated by something below "pdev", and that's where the > interesting AER logging would have been done. > > This patch suggests that the DPC port itself would have something > interesting logged in its AER capability. Is that really true? I see > in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger > Reason and the Error Source ID from the Message, but I don't see > anything about logging anything in its AER registers. If the trigger reason is "unmasked uncorrectable error", then the DPC AER register has useful information. All other reasons would not provide useful data in the AER register. I would assume that aer_get_device_error_info would return false if it's one of other reasons, but I can fix this up to check the trigger reasoning rather than unconditionally reading the AER status