Re: [PATCH 4/4] PCI/DPC: Print AER status in DPC event handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/13/2018 8:35 PM, Keith Busch wrote:
> On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote:
>> Hi Keith,
>>
>> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote:
>>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL,
>>> which prevents the AER handler from reporting the details of the
>>> error. This patch will have the DPC driver get the AER status registers
>>> from the downstream port that detected the uncorrectable error, and
>>> print out additional information.
>>>
>>> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
>>> ---
>>>  drivers/pci/pcie/pcie-dpc.c | 8 ++++++++
>>>  1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c
>>> index ef71a472592c..aefc4fbdcef0 100644
>>> --- a/drivers/pci/pcie/pcie-dpc.c
>>> +++ b/drivers/pci/pcie/pcie-dpc.c
>>> @@ -44,6 +44,7 @@ struct dpc_dev {
>>>  	int			cap_pos;
>>>  	bool			rp;
>>>  	u32			rp_pio_status;
>>> +	struct aer_err_info	info;
>>>  };
>>>  
>>>  static const char * const rp_pio_error_string[] = {
>>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work)
>>>  	struct pci_dev *dev, *temp, *pdev = dpc->dev->port;
>>>  	struct pci_bus *parent = pdev->subordinate;
>>>  
>>> +	if (aer_get_device_error_info(pdev, &dpc->info))
>>> +		aer_print_error(pdev, &dpc->info);
>>
>> I'm confused about this.  "pdev" is the DPC-enabled Port, i.e., the
>> Port that received an ERR_FATAL or ERR_NONFATAL Message.  The Message
>> was generated by something below "pdev", and that's where the
>> interesting AER logging would have been done.
>>
>> This patch suggests that the DPC port itself would have something
>> interesting logged in its AER capability.  Is that really true?  I see
>> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger
>> Reason and the Error Source ID from the Message, but I don't see
>> anything about logging anything in its AER registers.
> 
> If the trigger reason is "unmasked uncorrectable error", then the DPC
> AER register has useful information. All other reasons would not provide
> useful data in the AER register.
> 
> 
> I would assume that aer_get_device_error_info would return false if
> it's one of other reasons, but I can fix this up to check the trigger
> reasoning rather than unconditionally reading the AER status
> 

https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf

If you look at figure 6-2, AER errors are always logged regardless of the DPC
presence.

That's why, I was suggesting Keith that code should also clear the AER information
when such an event happens. Otherwise, same event won't trigger again.


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux