On Mon, Jan 15, 2018 at 09:32:59AM -0500, Sinan Kaya wrote: > On 1/13/2018 8:35 PM, Keith Busch wrote: > > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: > >> Hi Keith, > >> > >> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > >>> which prevents the AER handler from reporting the details of the > >>> error. This patch will have the DPC driver get the AER status registers > >>> from the downstream port that detected the uncorrectable error, and > >>> print out additional information. > >>> > >>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx> > >>> --- > >>> drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > >>> 1 file changed, 8 insertions(+) > >>> > >>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > >>> index ef71a472592c..aefc4fbdcef0 100644 > >>> --- a/drivers/pci/pcie/pcie-dpc.c > >>> +++ b/drivers/pci/pcie/pcie-dpc.c > >>> @@ -44,6 +44,7 @@ struct dpc_dev { > >>> int cap_pos; > >>> bool rp; > >>> u32 rp_pio_status; > >>> + struct aer_err_info info; > >>> }; > >>> > >>> static const char * const rp_pio_error_string[] = { > >>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > >>> struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > >>> struct pci_bus *parent = pdev->subordinate; > >>> > >>> + if (aer_get_device_error_info(pdev, &dpc->info)) > >>> + aer_print_error(pdev, &dpc->info); > >> > >> I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the > >> Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message > >> was generated by something below "pdev", and that's where the > >> interesting AER logging would have been done. > >> > >> This patch suggests that the DPC port itself would have something > >> interesting logged in its AER capability. Is that really true? I see > >> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger > >> Reason and the Error Source ID from the Message, but I don't see > >> anything about logging anything in its AER registers. > > > > If the trigger reason is "unmasked uncorrectable error", then the DPC > > AER register has useful information. All other reasons would not provide > > useful data in the AER register. > > > > > > I would assume that aer_get_device_error_info would return false if > > it's one of other reasons, but I can fix this up to check the trigger > > reasoning rather than unconditionally reading the AER status > > > > https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf > > If you look at figure 6-2, AER errors are always logged regardless > of the DPC presence. Figure 6-2 applies to "Errors from Table 6-2, 6-3, or 6-4". (I suspect it should also include Table 6-5 because PCIe r4.0, sec 6.2.4 says Table 6-5 corresponds to AER status bits). But in any case, those tables don't include receipt of ERR_FATAL or ERR_NONFATAL messages. So I don't think a DPC Port should be logging anything in its own AER registers when it receives those messages. If a DPC Port receives ERR_FATAL or ERR_NONFATAL while DPC is enabled, it will trigger DPC and drop the message. If DPC were not enabled, I think it would merely forward the message toward the Root Complex without logging an error itself.