On Fri, Dec 01, 2017 at 05:30:48PM +0800, Dongdong Liu wrote: > > 在 2017/12/1 7:15, Bjorn Helgaas 写道: > >On Thu, Nov 30, 2017 at 08:47:38PM +0800, Dongdong Liu wrote: > >> > >>>I do not think drivers should be calling > >>>pci_enable_pcie_error_reporting() directly. And I don't think we > >>>should depend on _HPX. _HPX is intended for *platform-specific* > >>>things, and I don't think this is platform-specific. And of course, > >>>_HPX only works on ACPI systems, and we want this to work everywhere. > >>> > >>>So I think we need a more comprehensive generic solution in Linux. I > >>>think we need several pieces: > >>> > >>> - Disable UR reporting during enumeration, similar to how we disable > >>> Master Abort reporting. > >>> > >>> - Restore original UR reporting setting after enumeration. > >>> > >>> - Clear any logged UR errors after enumeration. > >> > >>Agree. > >> > >>> > >>> - Enable UR reporting as part of configuring every device, e.g., in > >>> pci_configure_device() or pci_init_capabilities(). > >> > >>Enable UR reporting in pci_configure_device() or pci_init_capabilities() is > >>not ok. It will still report UR as it is still in enumeration. > > > >True. For bridges, it needs to be after we've enumerated children of > >the bridge. > > > >>>I don't know whether we should enable UR reporting if AER is not > >>>present; we'll have to think about that. We may also have to > >>>restructure how the AER driver works, because I think it currently > >>>comes into play *after* we call pci_init_capabilities(). There are > >>>probably some ordering issues there. > >> > >>I write a sample code as below, it can avoid report UR during > >>enumeration. What do you think about the code ? > > > >Can you do it the same place we disable and restore Master-Abort Mode, > >in pci_scan_bridge_extend()? That's done for the same reason, so it > >would be nice to keep it together. > > > >Maybe we could have a "pci_disable_enumeration_errors()" that would > >disable both Master-Abort Mode and UR reporting, and a corresponding > >"pci_restore_enumeration_errors()" that would clear any logged errors > >and restore the original settings? > > Set devctl to disable UR in pci_scan_bridge_extend() can resolve > bridge devices UR reporting issue. But for EP devices may still > reprot UR, like non-ARI EP devices. So it's better to disable UR > reporting in pci_device_add() after calling > pci_configure_device(dev) and restore in pci_bus_add_device(). As > in pci_configure_device() may parse _HPX to enable UR, disable UR > reporting should call after pci_configure_device(). Help me understand this issue with Endpoints and how ARI is involved. Is this an issue with multi-function devices, where function 0 exists and then we try to read the Vendor ID of function 1, 2, ..., until one fails to respond? And when one fails to respond, the Endpoint logs a UR error? And I guess the ARI connection is that with ARI, we use the ARI Capability to find the next function, so we should never try to read the Vendor ID of a non-existent function? pci_device_add() and pci_bus_add_device() don't have any obvious connection to enumeration, so I don't really like those places. If I'm understanding the issue correctly, I'd rather do the error disable/restore in pci_scan_slot(), where it's right next to the enumeration of additional functions. > >>diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c > >>index bc56cf1..1a46026 100644 > >>--- a/drivers/pci/bus.c > >>+++ b/drivers/pci/bus.c > >>@@ -14,6 +14,7 @@ > >> #include <linux/ioport.h> > >> #include <linux/proc_fs.h> > >> #include <linux/slab.h> > >>+#include <linux/aer.h> > >> > >> #include "pci.h" > >> > >>@@ -311,6 +312,16 @@ void __weak pcibios_bus_add_device(struct pci_dev *pdev) { } > >> void pci_bus_add_device(struct pci_dev *dev) > >> { > >> int retval; > >>+ u16 reg16; > >>+ > >>+ /* Clear PCIe Capability's Device Status */ > >>+ pcie_capability_read_word(dev, PCI_EXP_DEVSTA, ®16); > >>+ pcie_capability_write_word(dev, PCI_EXP_DEVSTA, reg16); > >>+ pci_cleanup_aer_error_status_regs(dev); > >>+ > >>+ /* Restore UR reporting */ > >>+ if (dev->devctl_ur) > >>+ pcie_capability_set_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_URRE); > >> > >> /* > >> * Can not put in pci_device_add yet because resources > >>diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > >>index 14e0ea1..7b34083 100644 > >>--- a/drivers/pci/probe.c > >>+++ b/drivers/pci/probe.c > >>@@ -2119,9 +2119,21 @@ static void pci_set_msi_domain(struct pci_dev *dev) > >> void pci_device_add(struct pci_dev *dev, struct pci_bus *bus) > >> { > >> int ret; > >>+ u16 reg16; > >> > >> pci_configure_device(dev); > >> > >>+ pcie_capability_read_word(dev, PCI_EXP_DEVCTL, ®16); > >>+ dev->devctl_ur = reg16 & PCI_EXP_DEVCTL_URRE; > >>+ if (dev->devctl_ur) { > >>+ /* > >>+ * Turn off UR reporting, later restore UR reporting > >>+ * after enumeration. > >>+ */ > >>+ reg16 = reg16 & (~PCI_EXP_DEVCTL_URRE); > >>+ pcie_capability_write_word(dev, PCI_EXP_DEVCTL, reg16); > >>+ } > >>+ > >> device_initialize(&dev->dev); > >> dev->dev.release = pci_release_dev; > >> > >>diff --git a/include/linux/pci.h b/include/linux/pci.h > >>index ae8d2b4..c2a90f1 100644 > >>--- a/include/linux/pci.h > >>+++ b/include/linux/pci.h > >>@@ -440,6 +440,7 @@ struct pci_dev { > >> char *driver_override; /* Driver name to force a match */ > >> > >> unsigned long priv_flags; /* Private flags for the pci driver */ > >>+ u16 devctl_ur; > >> };