Some new devices such as CXL devices may want to record additional error information on a corrected error. Add a callback to allow the PCI device driver to do additional logging such as providing additional stats for user space RAS monitoring. For CXL device, this is actually a need due to CXL needing to write to the device AER status register in order to clear the unmasked CEs. Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> Suggested-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx> --- Documentation/PCI/pci-error-recovery.rst | 7 +++++++ drivers/pci/pcie/aer.c | 8 +++++++- include/linux/pci.h | 3 +++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst index 187f43a03200..690220255d5e 100644 --- a/Documentation/PCI/pci-error-recovery.rst +++ b/Documentation/PCI/pci-error-recovery.rst @@ -83,6 +83,7 @@ This structure has the form:: int (*mmio_enabled)(struct pci_dev *dev); int (*slot_reset)(struct pci_dev *dev); void (*resume)(struct pci_dev *dev); + void (*cor_error_log)(struct pci_dev *dev); }; The possible channel states are:: @@ -422,5 +423,11 @@ That is, the recovery API only requires that: - drivers/net/cxgb3 - drivers/net/s2io.c + The cor_error_log() callback is invoked in handle_error_source() when + the error severity is "correctable". The callback is optional and allows + additional logging to be done if desired. See example: + + - drivers/cxl/pci.c + The End ------- diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index e2d8a74f83c3..af1b5eecbb11 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -961,8 +961,14 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) if (aer) pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, info->status); - if (pcie_aer_is_native(dev)) + if (pcie_aer_is_native(dev)) { + struct pci_driver *pdrv = dev->driver; + + if (pdrv && pdrv->err_handler && + pdrv->err_handler->cor_error_log) + pdrv->err_handler->cor_error_log(dev); pcie_clear_device_status(dev); + } } else if (info->severity == AER_NONFATAL) pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); else if (info->severity == AER_FATAL) diff --git a/include/linux/pci.h b/include/linux/pci.h index 575849a100a3..54939b3426a9 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -844,6 +844,9 @@ struct pci_error_handlers { /* Device driver may resume normal operations */ void (*resume)(struct pci_dev *dev); + + /* Allow device driver to record more details of a correctable error */ + void (*cor_error_log)(struct pci_dev *dev); };