Re: [PATCH 4/4] PCI/DPC: Print AER status in DPC event handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jan 15, 2018 at 09:32:59AM -0500, Sinan Kaya wrote:
> On 1/13/2018 8:35 PM, Keith Busch wrote:
> > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote:
> >> Hi Keith,
> >>
> >> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote:
> >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL,
> >>> which prevents the AER handler from reporting the details of the
> >>> error. This patch will have the DPC driver get the AER status registers
> >>> from the downstream port that detected the uncorrectable error, and
> >>> print out additional information.
> >>>
> >>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
> >>> ---
> >>>  drivers/pci/pcie/pcie-dpc.c | 8 ++++++++
> >>>  1 file changed, 8 insertions(+)
> >>>
> >>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
> >>> index ef71a472592c..aefc4fbdcef0 100644
> >>> --- a/drivers/pci/pcie/pcie-dpc.c
> >>> +++ b/drivers/pci/pcie/pcie-dpc.c
> >>> @@ -44,6 +44,7 @@ struct dpc_dev {
> >>>  	int			cap_pos;
> >>>  	bool			rp;
> >>>  	u32			rp_pio_status;
> >>> +	struct aer_err_info	info;
> >>>  };
> >>>  
> >>>  static const char * const rp_pio_error_string[] = {
> >>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work)
> >>>  	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
> >>>  	struct pci_bus *parent = pdev->subordinate;
> >>>  
> >>> +	if (aer_get_device_error_info(pdev, &dpc->info))
> >>> +		aer_print_error(pdev, &dpc->info);
> >>
> >> I'm confused about this.  "pdev" is the DPC-enabled Port, i.e., the
> >> Port that received an ERR_FATAL or ERR_NONFATAL Message.  The Message
> >> was generated by something below "pdev", and that's where the
> >> interesting AER logging would have been done.
> >>
> >> This patch suggests that the DPC port itself would have something
> >> interesting logged in its AER capability.  Is that really true?  I see
> >> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger
> >> Reason and the Error Source ID from the Message, but I don't see
> >> anything about logging anything in its AER registers.
> > 
> > If the trigger reason is "unmasked uncorrectable error", then the DPC
> > AER register has useful information. All other reasons would not provide
> > useful data in the AER register.
> > 
> > 
> > I would assume that aer_get_device_error_info would return false if
> > it's one of other reasons, but I can fix this up to check the trigger
> > reasoning rather than unconditionally reading the AER status
> > 
> 
> https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf
> 
> If you look at figure 6-2, AER errors are always logged regardless
> of the DPC presence.

Figure 6-2 applies to "Errors from Table 6-2, 6-3, or 6-4".
(I suspect it should also include Table 6-5 because PCIe r4.0,
sec 6.2.4 says Table 6-5 corresponds to AER status bits).

But in any case, those tables don't include receipt of ERR_FATAL or
ERR_NONFATAL messages.  So I don't think a DPC Port should be logging
anything in its own AER registers when it receives those messages.

If a DPC Port receives ERR_FATAL or ERR_NONFATAL while DPC is enabled,
it will trigger DPC and drop the message.  If DPC were not enabled, I
think it would merely forward the message toward the Root Complex
without logging an error itself.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux