Re: [PATCH v2] qemu: Do not reattach PCI device used by other domain when shutdown

Osier Yang <jyang@xxxxxxxxxx> · Mon, 17 Oct 2011 18:44:26 +0800




于 2011年10月17日 09:40, Osier Yang 写道:
于 2011年10月15日 02:53, Eric Blake 写道:
On 10/12/2011 10:05 PM, Osier Yang wrote:
When failing on starting a domain, it tries to reattach all the PCI
devices defined in the domain conf, regardless of whether the devices
are still used by other domain. This will cause the devices to be 
deleted
from the list qemu_driver->activePciHostdevs, thus the devices will be
thought as usable even if it's not true. And following commands
nodedev-{reattach,reset} will be successful.

How to reproduce:
   1) Define two domains with same PCI device defined in the confs.
   2) # virsh start domain1
   3) # virsh start domain2
   4) # virsh nodedev-reattach $pci_device

This time around, I actually spent time reproducing the bug scenario 
on hardware rather than just analyzing by inspection (it took me a 
while to figure out why the same machine that used to let me do pci 
passthrough was no longer working, until I realized that my hardware 
is old enough to be insecure, and I've upgraded software since my 
last passthrough test, so the CVE fix from 
https://bugzilla.redhat.com/show_bug.cgi?id=715555 has to be 
intentionally bypassed for me to get passthrough again).  With that 
testing, I proved that this patch prevents that problem.  But, as 
written, your patch caused the dreaded "An error occurred, but the 
cause is unknown".


You will see the device will be reattached to host successfully.
As pciDeviceReattach just check if the device is still used by
other domain via checking if the device is in list 
driver->activePciHostdevs,
however, the device is deleted from the list by step 2).

This patch is to prohibit the bug by:
   1) Prohibit a domain starting or device attachment right at
      preparation period (qemuPrepareHostdevPCIDevices) if the
      device is in list driver->activePciHostdevs, which means
      it's used by other domain.

   2) Introduces a new field for struct _pciDevice, (const char 
*used_by),
      it will be set as the domain name at preparation period,
      (qemuPrepareHostdevPCIDevices). Thus we can prohibit deleting
      the device from driver->activePciHostdevs if it's still used by
      other domain when stopping the domain process.


@@ -111,7 +112,7 @@ int qemuPrepareHostdevPCIDevices(struct 
qemud_driver *driver,
      if (!(pcidevs = qemuGetPciHostDeviceList(hostdevs, nhostdevs)))
          return -1;

-    /* We have to use 4 loops here. *All* devices must
+    /* We have to use 6 loops here. *All* devices must
       * be detached before we reset any of them, because
       * in some cases you have to reset the whole PCI,
       * which impacts all devices on it. Also, all devices

I added further loop labels.

+        /* The device is in use by other active domain if
+         * the dev is in list driver->activePciHostdevs.
+         */
+        if (!pciDeviceIsAssignable(dev, !driver->relaxedACS) ||
+            pciDeviceListFind(driver->activePciHostdevs, dev))
+            goto cleanup;
+    }

Here's where the bad message comes in - you jump to cleanup without 
ever emitting an error message.  Not to mention now that we have the 
new used_by field, the message could actually be informative :)  With 
my changes below, I get the much nicer:

Error starting domain: Requested operation is not valid: PCI device 
0000:0a:0a.0 is in use by domain dom

Oh, right, thanks for the fixing, :)


@@ -277,6 +298,18 @@ void qemuDomainReAttachHostdevDevices(struct 
qemud_driver *driver,

      for (i = 0; i<  pciDeviceListCount(pcidevs); i++) {
          pciDevice *dev = pciDeviceListGet(pcidevs, i);
+        pciDevice *activeDev = NULL;
+        const char *used_by = NULL;
+
+        /* Never delete the dev from list driver->activePciHostdevs
+         *  if it's used by other domain.
+         */
+        activeDev = pciDeviceListFind(driver->activePciHostdevs, dev);
+        if (activeDev&&
+            (used_by = pciDeviceGetUsedBy(activeDev))&&
+            STRNEQ(name, used_by))
+            continue;


used_by is kept as it might be NULL.


In that case, use STRNEQ_NULLABLE.  But you are right - across 
libvirtd restarts, we lose track of which domain owns the device - so 
I did indeed reproduce a case of NULL.  Perhaps food for a later 
patch (when reloading domains, cycle through all hostdevs in use by 
active domains to repopulate the used_by fields).  But not a 
show-stopper to getting this bug fix in now.

Yes, I didn't realize this, we need further patch.


After thinking a while, even we set the used_by during domains reloading,
the other attrs (dev->unbind_from_stub, dev->reprobe, dev->remove_slot)
of the PCI dev still will be lost. We can't known what values should be
for these attrs as generally they are initialized when do preparation,
however the preparations are already passed, and the devices are using
by the running domains. And thus the device won't be bound to the
original driver (if it had) correctly, or mistakenly bound to some driver
but it didn't have one.

Perhaps we need some new XMLs introduced for hostdevs.

Osier

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list