On Tue, Nov 29, 2022 at 10:49:05AM -0700, Dave Jiang wrote: > Some new devices such as CXL devices may want to record additional error > information on a corrected error. Add a callback to allow the PCI device > driver to do additional logging such as providing additional stats for user > space RAS monitoring. > > For CXL device, this is actually a need due to CXL needing to write to the > device AER status register in order to clear the unmasked CEs. s/CE/correctable error/ since it's the first use and not common in PCI-land. "device AER status register" sounds like the PCIe AER Correctable Error Status Register (PCIe r6.0, sec 7.8.4.5), but I think you mean something else, maybe a CXL-specific register? The PCIe core needs to own the AER one (PCI_ERR_COR_STATUS) so it can coordinate ownership between firmware and Linux. > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Suggested-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx> > Signed-off-by: Dave Jiang <dave.jiang@xxxxxxxxx> Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > --- > Documentation/PCI/pci-error-recovery.rst | 7 +++++++ > drivers/pci/pcie/aer.c | 8 +++++++- > include/linux/pci.h | 3 +++ > 3 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst > index 187f43a03200..690220255d5e 100644 > --- a/Documentation/PCI/pci-error-recovery.rst > +++ b/Documentation/PCI/pci-error-recovery.rst > @@ -83,6 +83,7 @@ This structure has the form:: > int (*mmio_enabled)(struct pci_dev *dev); > int (*slot_reset)(struct pci_dev *dev); > void (*resume)(struct pci_dev *dev); > + void (*cor_error_log)(struct pci_dev *dev); I think I would remove "log" from the name because it suggests this hook should *only* log, and you need to actually clear some status. Maybe "cor_error_detected()" to be analogous to error_detected()? > }; > > The possible channel states are:: > @@ -422,5 +423,11 @@ That is, the recovery API only requires that: > - drivers/net/cxgb3 > - drivers/net/s2io.c > > + The cor_error_log() callback is invoked in handle_error_source() when > + the error severity is "correctable". The callback is optional and allows > + additional logging to be done if desired. See example: > + > + - drivers/cxl/pci.c > + > The End > ------- > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index e2d8a74f83c3..af1b5eecbb11 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -961,8 +961,14 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info) > if (aer) > pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, > info->status); > - if (pcie_aer_is_native(dev)) > + if (pcie_aer_is_native(dev)) { > + struct pci_driver *pdrv = dev->driver; > + > + if (pdrv && pdrv->err_handler && > + pdrv->err_handler->cor_error_log) > + pdrv->err_handler->cor_error_log(dev); > pcie_clear_device_status(dev); > + } > } else if (info->severity == AER_NONFATAL) > pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); > else if (info->severity == AER_FATAL) > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 575849a100a3..54939b3426a9 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -844,6 +844,9 @@ struct pci_error_handlers { > > /* Device driver may resume normal operations */ > void (*resume)(struct pci_dev *dev); > + > + /* Allow device driver to record more details of a correctable error */ > + void (*cor_error_log)(struct pci_dev *dev); > }; > > > >