[+cc Alex, Don] On Tue, May 27, 2014 at 5:53 PM, Alexander Duyck <alexander.h.duyck@xxxxxxxxx> wrote: > On 05/27/2014 03:22 PM, Bjorn Helgaas wrote: >> On Mon, May 05, 2014 at 02:25:17PM -0700, Alexander Duyck wrote: >>> This fixes an issue I found in which triggering a reset via the PCI sysfs >>> reset while SR-IOV was enabled would leave the VFs in a state in which the >>> BME and MSI-X enable bits were all cleared. >>> >>> To correct that I have added code so that the VF state is saved and restored >>> as a part of the PF save and restore state functions. By doing this the VF >>> state is restored as well as the IOV state allowing the VFs to resume function >>> following a reset. >>> >>> Signed-off-by: Alexander Duyck <alexander.h.duyck@xxxxxxxxx> >>> --- >>> drivers/pci/iov.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++-- >>> drivers/pci/pci.c | 2 ++ >>> drivers/pci/pci.h | 5 +++++ >>> 3 files changed, 53 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c >>> index de7a747..645ed71 100644 >>> --- a/drivers/pci/iov.c >>> +++ b/drivers/pci/iov.c >>> @@ -521,13 +521,57 @@ resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno) >>> } >>> >>> /** >>> + * pci_save_iov_state - Save the state of the VF configurations >>> + * @dev: the PCI device >>> + */ >>> +int pci_save_iov_state(struct pci_dev *dev) >>> +{ >>> + struct pci_dev *vfdev = NULL; >>> + unsigned short dev_id; >>> + >>> + /* only search if we are a PF */ >>> + if (!dev->is_physfn) >>> + return 0; >>> + >>> + /* retrieve VF device ID */ >>> + pci_read_config_word(dev, dev->sriov->pos + PCI_SRIOV_VF_DID, &dev_id); ... >>> + /* loop through all the VFs and save their state information */ >>> + while ((vfdev = pci_get_device(dev->vendor, dev_id, vfdev))) { >>> + if (vfdev->is_virtfn && (vfdev->physfn == dev)) { >>> + int err = pci_save_state(vfdev); >> >> It makes me uneasy to operate on another device (we're resetting A, and >> here we save state for B). I know B is dependent on A, since B is a VF >> related to PF A, but what synchronization is there to serialize this >> against any other save/restore operations that may be in progress by B's >> driver or by a sysfs operation on B? > > I don't believe there is any synchronization mechanism in place > currently. I can look into that as well. Odds are we probably need to > have the VFs check the parent lock before they take any independent action. It's just the whole question of how we manage the single "saved-state" area. Right now, I think almost all use of it is under control of the driver that owns the device, in suspend/resume methods. The exceptions are the PM suspend/freeze/etc. routines in pci/pci-driver.c, which I assume prevent the driver from running and are therefore safe, and the reset path. I don't know how the >> Is there anything in the reset path that pays attention to whether >> resetting this PF will clobber VFs? Do we care whether those VFs are in >> use? I assume they might be in use by guests? > > The problem I found was that the sysfs reset call doesn't bother to > check with the PF driver at all. It just clobbers the PF and any VFs on > it without talking to the PF driver. There is Keith Busch's recent patch: http://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/hotplug&id=3ebe7f9f7e4a4fd1f6461ecd01ff2961317a483a . I dunno if that's useful to you or not. And I'm not sure there's actually a requirement to *have* a PF driver. Obviously there has to be a way to enable the VFs, but once they're enabled, it might be possible to keep using them via VF drivers even without a PF driver in the picture. Maybe resetting the PF should just fail if there's an active VF. If you need to reset the PF, you'd have to unbind the VFs first. >>> + if (err) >>> + return err; >>> + } >>> + } >> >> pci_get_device() acquires a reference on each device it returns, so this >> strategy would require a pci_dev_put(). > > Yes, if I am not mistaken the pci_dev_put is called as a part of > pci_get_dev_by_id which is what pci_get_device ends up being. Oh, yeah, you're right. I forgot about that. Since you call it in a loop until you get NULL back, you're OK. It's only when you stop before you get NULL that you have to deal with the extra reference. >> But I'm not really keen on pci_get_device() in the first place. It works >> by iterating over all PCI devices in the system, which seems like a >> sledgehammer approach. It *is* widely used, but mostly in quirk-type code >> from which I avert my eyes. >> >> Maybe you could do something based on pci_walk_bus()? If you did that, I >> think the PCI_SRIOV_VF_DID would become superfluous. >> > > I can look into that, I'm not familiar with the interface. I'll have to > see what the relationship is between the PF and VF in terms of busses as > I don't recall it off of the top of my head. This reminds me about an open problem: VFs can be on "virtual" buses, which aren't really connected in the hierarchy, and I don't think we have a nice way to iterate over them. So probably pci_get_device() is the best we can do now. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html