Dear Bjorn,

Thanks for reviewing.

> Hello Taku-san,
>
> On Wed, Aug 05, 2015 at 01:42:13AM +0900, Taku Izumi wrote:
> > AER uncorrectable or correctable error might be recorded
> > when power on devices. These errors can be ignored, so
> > BIOS usually clean up these registers ahead of OS's scanning
> > devices.
> > However, in case of hot-plug PCIe devices, BIOS can't care.
>
> What happens when we power down a device for suspend or because it's idle?
> Can we get spurious AER errors when we power the device back up?  This
> patch only covers the enumeration path, so we'd need to do more if it can
> happen during suspend/resume.

Our server supports only "suspend to disk". In that case, the BIOS cleans up
those registers just as it does at boot time, so it seems no problems have
been reported other than in the hot-plug case.
However, if a machine supports suspend-to-RAM, a similar problem may occur.
It is true that the register clean-up should also be done in the
suspend/resume case.
Should this patch cover the suspend/resume case as well?

> > Currently OS don't clean up AER error status registers on probing
> > devices, ignorable AER errors recorded when power-on remains.
> > This causes false-positive.
> >
> > This patch address this problem by cleaning up
> > AER error status registers on probing devices.
> >
> > Signed-off-by: Taku Izumi <izumi.taku@xxxxxxxxxxxxxx>
> > ---
> >  drivers/pci/pcie/aer/aerdrv_core.c | 26 ++++++++++++++++++++++++++
> >  drivers/pci/probe.c                |  5 +++++
> >  include/linux/aer.h                |  5 +++++
> >  3 files changed, 36 insertions(+)
> >
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 9803e3d..9857cc4 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -74,6 +74,32 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status);
> >
> > +int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
> > +{
> > +	int pos;
> > +	u32 status;
> > +	int port_type;
> > +
> > +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
> > +	if (!pos)
> > +		return -EIO;
> > +
> > +	port_type = pci_pcie_type(dev);
> > +	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > +		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
> > +		pci_write_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, status);
> > +	}
> > +
> > +	pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &status);
> > +	pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS, status);
> > +
> > +	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
> > +	pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_cleanup_aer_error_status_regs);
>
> Why does this need to be exported?  I see that
> pci_cleanup_aer_uncorrect_error_status() above is exported, and it is used
> by many drivers, so I see why that needs to be exported.  But there are no
> users of pci_cleanup_aer_error_status_regs() outside the PCI core yet.  And
> if we wanted to add users outside the core, I would question whether that's
> the right thing to do.  AER management seems like something that probably
> should be done by the PCI core, not by drivers.

You are right.
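
For reference, the read-then-write-back pairs in the quoted hunk clear the
registers because the AER status bits are write-1-to-clear: writing back the
value just read clears exactly the bits that were set. Below is a minimal
userspace sketch of that idiom. It is only an illustration under stated
assumptions: struct rw1c_reg and the rw1c_*() helpers are hypothetical
stand-ins for config space access and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a single write-1-to-clear (RW1C) status register. */
struct rw1c_reg {
	uint32_t bits;
};

/* Reading has no side effect in this model. */
static uint32_t rw1c_read(const struct rw1c_reg *reg)
{
	return reg->bits;
}

/* RW1C semantics: each bit written as 1 is cleared, 0 bits are untouched. */
static void rw1c_write(struct rw1c_reg *reg, uint32_t val)
{
	reg->bits &= ~val;
}

/* Same pattern as the patch: read the status, then write it back. */
static void rw1c_clear_all(struct rw1c_reg *reg)
{
	uint32_t status = rw1c_read(reg);

	rw1c_write(reg, status);
	/* In this model nothing can set new bits between read and write. */
	assert(rw1c_read(reg) == 0);
}
```

Calling rw1c_clear_all() on a register holding stale bits leaves it at zero,
while writing 0 would leave every bit set. On real hardware a new error can be
latched between the read and the write, so the register is not guaranteed to
read back as zero there.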

Sincerely,
Taku Izumi

>
> >  /**
> >   * add_error_device - list device to be handled
> >   * @e_info: pointer to error info
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index cefd636..d660db7 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -12,6 +12,7 @@
> >  #include <linux/module.h>
> >  #include <linux/cpumask.h>
> >  #include <linux/pci-aspm.h>
> > +#include <linux/aer.h>
> >  #include <asm-generic/pci-bridge.h>
> >  #include "pci.h"
> >
> > @@ -1542,6 +1543,10 @@ static void pci_init_capabilities(struct pci_dev *dev)
> >
> >  	/* Enable ACS P2P upstream forwarding */
> >  	pci_enable_acs(dev);
> > +
> > +	/* Cleanup AER error status registerts */
> > +	if (pci_is_pcie(dev))
> > +		pci_cleanup_aer_error_status_regs(dev);
>
> The other capability init functions check internally for pci_is_pcie(), so
> please do the same here.
>
> >  }
> >
> >  void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
> > diff --git a/include/linux/aer.h b/include/linux/aer.h
> > index 4fef65e..744b997 100644
> > --- a/include/linux/aer.h
> > +++ b/include/linux/aer.h
> > @@ -42,6 +42,7 @@ struct aer_capability_regs {
> >  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
> >  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
> >  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
> > +int pci_cleanup_aer_error_status_regs(struct pci_dev *dev);
> >  #else
> >  static inline int pci_enable_pcie_error_reporting(struct pci_dev *dev)
> >  {
> > @@ -55,6 +56,10 @@ static inline int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
> >  {
> >  	return -EINVAL;
> >  }
> > +static inline int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
> > +{
> > +	return -EINVAL;
> > +}
> >  #endif
> >
> >  void cper_print_aer(struct pci_dev *dev, int cper_severity,
> > --
> > 1.8.3.1