On Thu, Aug 31, 2017 at 10:01:30AM -0600, Alex Williamson wrote:
> On Thu, 31 Aug 2017 11:40:52 +0200
> Jan Glauber <jan.glauber@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > On Wed, Aug 30, 2017 at 08:40:12AM -0600, Alex Williamson wrote:
> > > On Wed, 30 Aug 2017 16:24:54 +0200
> > > Jan Glauber <jglauber@xxxxxxxxxx> wrote:
> > > 
> > > > Root ports of cn8xxx do not function after a slot reset when used with
> > > > some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> > > > these root ports.
> > > > 
> > > > Signed-off-by: Jan Glauber <jglauber@xxxxxxxxxx>
> > > > ---
> > > >  drivers/pci/quirks.c | 16 ++++++++++++++++
> > > >  1 file changed, 16 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > > > index 85191b8..6679971 100644
> > > > --- a/drivers/pci/quirks.c
> > > > +++ b/drivers/pci/quirks.c
> > > > @@ -845,6 +845,22 @@ static void quirk_cavium_sriov_rnm_link(struct pci_dev *dev)
> > > >  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa018, quirk_cavium_sriov_rnm_link);
> > > >  #endif
> > > >  
> > > > +/*
> > > > + * Root port on some Cavium CN8xxx chips do not successfully complete
> > > > + * a bus reset when used with certain types of child devices.  Config
> > > > + * space access to the child may quit responding.  Flag all devices under
> > > > + * the secondary bus as non-resettable.
> > > > + */
> > > > +static void quirk_CN8xxx_secondary_bus(struct pci_dev *dev)
> > > > +{
> > > > +	struct pci_dev *pdev;
> > > > +
> > > > +	dev_warn(&dev->dev, "Cavium CN8xxx quirk detected; reset for devices on secondary bus disabled\n");
> > > > +	list_for_each_entry(pdev, &dev->subordinate->devices, bus_list)
> > > > +		pdev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> > > > +}
> > > > +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_CN8xxx_secondary_bus);
> > > > +
> > > >  /*
> > > >   * Some settings of MMRBC can lead to data corruption so block changes.
> > > >   * See AMD 8131 HyperTransport PCI-X Tunnel Revision Guide
> > > 
> > > This doesn't seem reliable, doesn't the user just need to remove and
> > > reprobe the slot and the device would re-appear without this flag set?
> > 
> > No, I tried before to disable the slot with "echo 0 > /sys/bus/pci/slots/3/power"
> > but that does not work as it is not supported.
> > 
> > I'm not familiar with the quirk types, would another one be better
> > suited here (even if we don't have the problem you described)?
> 
> The scenario I'm mentioning is to "echo 1 > /sys/bus/pci/devices/<some
> device under the slot>/remove", then "echo <that device address> >
> /sys/bus/pci/rescan".  This would break the ordering implicit in using
> a fixup defined for the root port.  It seems like it'd make a lot more
> sense to add a test on the parent bridge more similar to how the bus
> reset works.  It's not the subordinate devices imposing the
> no-bus-reset flag, it's the bridge device and the objects and code
> should support and reflect that.  Thanks,

Doing "echo <that device address> > /sys/bus/pci/rescan" after the remove
did not work for me, but maybe the device address needs a different format.
Anyway, the sequence

echo 1 > /sys/bus/pci/devices/<some device under the slot>/remove
echo 1 > /sys/bus/pci/rescan

still triggers the panic, as you predicted: the re-added device comes back
without the flag set.

I agree that the subordinate devices are not causing the issue, but I still
need pci_slot_resetable() to return false in our case. So what if we add an
additional check on the bridge itself, like:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fdf65a6..389db4b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4389,6 +4389,10 @@ static bool pci_slot_resetable(struct pci_slot *slot)
 {
 	struct pci_dev *dev;
 
+	if (slot->bus->self &&
+	    (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET))
+		return false;
+
 	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
 		if (!dev->slot || dev->slot != slot)
 			continue;

--Jan