[+cc Don] On Fri, Aug 9, 2013 at 5:44 PM, Yinghai Lu <yinghai@xxxxxxxxxx> wrote: > On Fri, Aug 9, 2013 at 10:04 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote: >> What would happen if we just reverted dc087f2f6a2 for now? Jiang? I tried to reproduce the problem you reported ("Cannot add device at 0000:02:00"), but I haven't been able to. I have an ixgbe SR-IOV device in an external hotplug-capable chassis, and I'm using v3.11-rc4 with a patch to add a little instrumentation. Here's what I see: # echo -n 8 > /sys/bus/pci/devices/0000:08:00.0/sriov_numvfs ixgbe 0000:08:00.0 eth1: SR-IOV enabled with 8 VFs pci 0000:08:10.0: [8086:1515] type 00 class 0x020000 # VF1 pci 0000:08:10.2: [8086:1515] type 00 class 0x020000 # VF2 ... pci 0000:08:11.6: [8086:1515] type 00 class 0x020000 # VF8 Created 8 VFs; no problem there. # echo -n 0 > /sys/bus/pci/slots/4-1/power pciehp 0000:04:03.0:pcie24: disable_slot: physical_slot = 4-1 pciehp 0000:04:03.0:pcie24: pciehp_unconfigure_device: domain:bus:dev = 0000:08:00 ixgbevf 0000:08:11.6: pci_stop_and_remove_bus_device pci 0000:08:11.6: pci_release_dev # release VF8 ixgbevf 0000:08:11.4: pci_stop_and_remove_bus_device pci 0000:08:11.4: pci_release_dev # release VF7 ... ixgbevf 0000:08:10.0: pci_stop_and_remove_bus_device pci 0000:08:10.0: pci_release_dev # release VF1 ixgbe 0000:08:00.1: pci_stop_and_remove_bus_device pci 0000:08:00.1: pci_release_dev # release PF1 ixgbe 0000:08:00.0: pci_stop_and_remove_bus_device # stop/remove PF0 # PF0 not released When I turn off power to the slot, pciehp stops and removes all the devices, starting with the VFs. The PF 08:00.0 is stopped but not released, because we didn't call virtfn_remove(), so the reference held by the VFs was never released. This is a problem. # echo -n 1 > /sys/bus/pci/slots/4-1/power pci 0000:08:00.0: [8086:1528] type 00 class 0x020000 pci 0000:08:00.1: [8086:1528] type 00 class 0x020000 When I turn the slot back on, we find both PFs; no apparent problem there. On v3.10 (which does not contain dc087f2f6a), I see essentially the same thing. The PFs are stopped before the VFs because v3.10 doesn't contain 29ed1f29b68, which removes devices in reverse order. But even in v3.10, PF0 is stopped but not released because the VFs still hold references to it. And there's no complaint when turning the slot back on. So we definitely have a problem: when we remove the PF, we don't release it correctly because we don't release the references held by VFs. But this is not a regression, and doesn't seem urgent enough to rush something for v3.11. I think we're headed toward tearing down all VFs when removing a PF. Obviously, we *have* to do that when powering off the device, and I think we also need to do it when removing the PF driver just to keep all the bookkeeping sane. I don't know how much heartburn this will cause Don. So tell me if you still think we need to do something for v3.11. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html