Re: PCI device not properly reset after VFIO

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/18/2012 04:40 PM, Alex Williamson wrote:
Hi Hannes,

Thanks for testing vfio

On Thu, 2012-10-18 at 08:47 +0200, Hannes Reinecke wrote:
Hi Alex,

I've been playing around with VFIO and megasas (of course).
What I did now was switching between VFIO and 'normal' operation, ie
emulated access.

megasas is happily running under VFIO, but when I do an emergency
stop like killing the Qemu session the PCI device is not properly reset.
IE when I load 'megaraid_sas' after unbinding the vfio_pci module
the driver cannot initialize the card and waits forever for the
firmware state to change.

I need to do a proper pci reset via
echo 1 > /sys/bus/pci/device/XXXX/reset
to get it into a working state again.

Looking at vfio_pci_disable() pci reset is called before the config
state and BARs are restored.
Seeing that vfio_pci_enable() calls pci reset right at the start,
too, before modifying anything I do wonder whether the pci reset is
at the correct location for disable.

I would have expected to call pci reset in vfio_pci_disable()
_after_ we have restored the configuration, to ensure a sane state
after reset.
And, as experience show, we do need to call it there.

So what is the rationale for the pci reset?
Can we move it to the end of vfio_pci_disable() or do we need to
call pci reset twice?

I believe the rationale was that by resetting the device before we
restore the state we stop anything that the device was doing.  Restoring
the saved state on a running device seems like it could cause problems,
so you may be right and we actually need to do reset, load, restore,
reset.  Does adding another call to pci_reset_function in the
pci_restore_state (as below) solve the problem?  Traditional KVM device
assignment has a nearly identical path, does it have this same bug?

It's actually the first time I've been able to test this (the hardware is a bit tricky to setup ...), so I cannot tell (yet)
if KVM exhibited the same thing.

Thanks,

Alex

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 6c11994..d07a45c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -107,9 +107,10 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
         pci_reset_function(vdev->pdev);

         if (pci_load_and_free_saved_state(vdev->pdev,
-                                         &vdev->pci_saved_state) == 0)
+                                         &vdev->pci_saved_state) == 0) {
                 pci_restore_state(vdev->pdev);
-       else
+               pci_reset_function(vdev->pdev);
+       } else
                 pr_info("%s: Couldn't reload %s saved state\n",
                         __func__, dev_name(&vdev->pdev->dev));



I would have called reset after unmapping the BARs; the HBA I'm working with does need to access the BARs, so the content of them might be relevant, too.

But then I'm not really a PCI expert.
Maybe we should ask Tony Luck or Bjorn Helgaas.

Cheers,

Hannes
--
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization



[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux