Re: [PATCH for-next 2/2] IB/hfi1: Make Unsupported Request error non-fatal

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dennis,

On Wed, Apr 10, 2019 at 05:35:01AM -0700, Dennis Dalessandro wrote:
> From: Kamenee Arumugam <kamenee.arumugam@xxxxxxxxx>
> 
> For hfi1, the unsupported request error is not considered a fatal
> error. When the PCIe advanced error reporting capability (AER) is
> configured to report unsupported requests as fatal, the system will
> hang on this error.

I know there are a few drivers that fiddle with AER bits, but that
makes me a little bit nervous because error handling is more than just
a driver issue.  It involves the PCI core and the platform firmware as
well.

Anyway, let's figure out more about this particular case.  Unsupported
Request is a PCIe protocol-level issue.  You're masking it in the HFI
adapter, which I guess means you want to prevent it from reporting UR.
So the HFI is receiving a TLP that it doesn't support?

What exactly is causing the UR?  Is it something the driver could
potentially avoid, e.g., an AtomicOp that HFI doesn't support?  I have
a vague notion that InfiniBand allows some sort of direct user-space
access to hardware; is there something there that can cause a UR?

The system hang sounds like a separate problem that should also be
fixed.  Even if HFI signals a UR error, I would not expect a system
hang.

Bjorn

> Set Unsupported Request Error bit in Uncorrectable Error Mask
> register to disable error reporting to the PCIe root complex.
> 
> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@xxxxxxxxx>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
> ---
>  drivers/infiniband/hw/hfi1/pcie.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> index c96d193..a033e28 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -114,6 +114,7 @@ int hfi1_pcie_init(struct hfi1_devdata *dd)
>  	}
>  
>  	pci_set_master(pdev);
> +	pcie_aer_set_dword(pdev, PCI_ERR_UNCOR_MASK, PCI_ERR_UNC_UNSUP);
>  	(void)pci_enable_pcie_error_reporting(pdev);
>  	return 0;
>  
> 



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux