On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
virHostdevReAttachPCIDevices() is called when we want to re-attach
a list of hostdevs back to the host, either on the shutdown path or
via a 'virsh detach-device' call. This function always counts on the
device still existing in the host to work, but that assumption can
lead to problems. For example, an SR-IOV device can be removed when
an admin runs "echo 0 > /sys/bus/pci/devices/<addr>/sriov_numvfs",
making the kernel fire an eventfd_signal() to the process, asking it
to release the device. The result varies depending on the device
driver and OS/arch, but two possible outcomes are:
1) the hypervisor driver will detach the device from the VM, issuing a
delete event to Libvirt. This can be observed with QEMU;
2) the 'echo 0 > ...' will hang, waiting for the device to be unplugged.
This means that the VM process failed/refused to release the hostdev back
to the host, and the hostdev will be detached during VM shutdown.
Today we don't behave well in either case. We'll fail to remove the
PCI device reference from mgr->activePCIHostdevs and
mgr->inactivePCIHostdevs because we rely on the PCI device config
file still existing in sysfs. Attempting to reuse the same device
(assuming it is present in the host again) can result in an error
like this:
$ ./run tools/virsh start vm1-sriov --console
error: Failed to start domain vm1-sriov
error: Requested operation is not valid: PCI device 0000:01:00.2 is in use by driver QEMU, domain vm1-sriov
For (1), a VM destroy/start cycle is needed to reuse the VF in the
guest. For (2), the effect is more severe: a libvirtd daemon restart
is required before the VF can be used again in any guest.
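(An aside for anyone following along: the "config file" check
mentioned above amounts to probing the device's config file under
sysfs. Here is a minimal sketch of that kind of existence test,
assuming POSIX; this is not the libvirt implementation, and
pciDeviceExists() is a made-up helper name:

    #include <limits.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Probe whether a PCI device is still present on the host by
     * checking for its sysfs config file, e.g.
     * /sys/bus/pci/devices/0000:01:00.2/config */
    static bool
    pciDeviceExists(const char *addr)
    {
        char path[PATH_MAX];

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/config",
                 addr);
        return access(path, F_OK) == 0;
    }

Once the admin's 'echo 0 > .../sriov_numvfs' removes the VF, a check
like this starts failing, which is the situation the patch below has
to cope with.)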
We can make this a bit better by checking, during
virHostdevReAttachPCIDevices(), whether any now-missing PCI device
would be left behind in the activePCIHostdevs and inactivePCIHostdevs
lists. Remove any missing device found from both lists,
unconditionally, to match the current state of the host. This change
affects the code path in (1) (processDeviceDeletedEvent into
qemuDomainRemoveDevice, all the way back to
qemuHostdevReAttachPCIDevices) and also in (2) (qemuProcessStop into
qemuHostdevReAttachDomainDevices).
NB: Although this patch enables the possibility of 'outside Libvirt'
SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver
cope with it, our goal is to mitigate what is still considered an user
oopsie. For all
This is one of those odd cases where you use "a" before a word starting
with a vowel rather than "an". It's that way because the *sound*
starting the next word is "you" rather than "ooh".
supported purposes, the admin must remove the SR-IOV VFs from all running domains
before removing the VFs from the host.
Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72
Signed-off-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx>
---
src/hypervisor/virhostdev.c | 38 +++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c708791eec..ed43733e71 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr,
 }
+static void
+virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr,
+                                  virDomainHostdevDefPtr *hostdevs,
+                                  int nhostdevs)
+{
+    size_t i;
+
+    if (nhostdevs == 0)
+        return;
+
+    virObjectLock(mgr->activePCIHostdevs);
+    virObjectLock(mgr->inactivePCIHostdevs);
+
+    for (i = 0; i < nhostdevs; i++) {
+        virDomainHostdevDef *hostdev = hostdevs[i];
+        virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci;
+        g_autoptr(virPCIDevice) pci = NULL;
+
+        if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2)
+            continue;
+
+        /* The PCI device from 'hostdev' does not exist in the host
+         * anymore. Delete it from both active and inactive lists to
+         * reflect the current host state.
+         */
+        virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr);
+        virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr);
+    }
+
+    virObjectUnlock(mgr->activePCIHostdevs);
+    virObjectUnlock(mgr->inactivePCIHostdevs);
It makes me nervous that you're unlocking these lists in the same
order that you lock them, since that could theoretically lead to a
deadlock. I haven't thought through anything beyond this single
function to figure out whether it's even a possibility in this case,
but I would still feel better if you unlocked in the opposite order
from the order you lock.
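Concretely, something like this is what I mean, just the lock/unlock
skeleton of the new function with the releases reversed (a sketch,
not the final code):

    virObjectLock(mgr->activePCIHostdevs);
    virObjectLock(mgr->inactivePCIHostdevs);

    /* ... walk the hostdev list, dropping missing devices ... */

    /* release in the reverse order of acquisition */
    virObjectUnlock(mgr->inactivePCIHostdevs);
    virObjectUnlock(mgr->activePCIHostdevs);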
With that fixed:
Reviewed-by: Laine Stump <laine@xxxxxxxxxx>
+}
+
+
 /* @oldStateDir:
  * For upgrade purpose: see virHostdevRestoreNetConfig
  */
@@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
     virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
                                      hostdevs, nhostdevs, oldStateDir);
+
+    /* Handle the case where PCI devices from the host went missing
+     * during the domain lifetime */
+    virHostdevDeleteMissingPCIDevices(mgr, hostdevs, nhostdevs);
 }