Re: [v6 11/11 PATCH] cxl/pci: Add callback to log AER correctable error

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dave,

On 11/30/22 18:02, Dave Jiang wrote:
> Add AER error handler callback to read the RAS capability structure
> correctable error (CE) status register for the CXL device. Log the
> error as a trace event and clear the error. For CXL devices, the driver
> also needs to write back to the status register to clear the
> unmasked correctable errors.
> 
> See CXL spec rev3.0 8.2.4.16 for RAS capability structure CE Status
> Register.
> 
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> ---
> 
> v6:
> - Update commit log to point to RAS capability structure. (Bjorn)
> - Change cxl_correctable_error_logging() to cxl_cor_error_detected().
>   (Bjorn)
> 
>  drivers/cxl/pci.c |   20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 11f842df9807..02342830b612 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -622,10 +622,30 @@ static void cxl_error_resume(struct pci_dev *pdev)
>  		 dev->driver ? "successful" : "failed");
>  }
>  
> +static void cxl_cor_error_detected(struct pci_dev *pdev)
> +{
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +	struct cxl_memdev *cxlmd = cxlds->cxlmd;
> +	struct device *dev = &cxlmd->dev;
> +	void __iomem *addr;
> +	u32 status;
> +
> +	if (!cxlds->regs.ras)
> +		return;
> +
> +	addr = cxlds->regs.ras + CXL_RAS_CORRECTABLE_STATUS_OFFSET;
> +	status = le32_to_cpu(readl(addr));
> +	if (status & CXL_RAS_CORRECTABLE_STATUS_MASK) {
> +		writel(status & CXL_RAS_CORRECTABLE_STATUS_MASK, addr);
> +		trace_cxl_aer_correctable_error(dev_name(dev), status);
> +	}
> +}
> +

This will log PCI AER CEs only if there is also a RAS CE. My understanding
(could be the problem) is AER CE's are normally reported. Will this be inconsistent
with other error AER CE handling?

Regards,
Terry 

>  static const struct pci_error_handlers cxl_error_handlers = {
>  	.error_detected	= cxl_error_detected,
>  	.slot_reset	= cxl_slot_reset,
>  	.resume		= cxl_error_resume,
> +	.cor_error_detected	= cxl_cor_error_detected,
>  };
>  
>  static struct pci_driver cxl_pci_driver = {
> 
> 




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux