virHostdevReAttachPCIDevices() is called when we want to re-attach a list of hostdevs back to the host, either on the shutdown path or via a 'virsh detach-device' call. This function always count on the existence of the device in the host to work, but this can lead to problems. For example, a SR-IOV device can be removed via an admin "echo 0 > /sys/bus/pci/devices/<addr>/sriov_numvfs", making the kernel fire up and eventfd_signal() to the process, asking for the process to release the device. The result might vary depending on the device driver and OS/arch, but two possible outcomes are: 1) the hypervisor driver will detach the device from the VM, issuing a delete event to Libvirt. This can be observed in QEMU; 2) the 'echo 0 > ...' will hang waiting for the device to be unplugged. This means that the VM process failed/refused to release the hostdev back to the host, and the hostdev will be detached during VM shutdown. Today we don't behave well for both cases. We'll fail to remove the PCI device reference from mgr->activePCIHostdevs and mgr->inactivePCIHostdevs because we rely on the existence of the PCI device conf file in the sysfs. Attempting to re-utilize the same device (assuming it is now present back in the host) can result in an error like this: $ ./run tools/virsh start vm1-sriov --console error: Failed to start domain vm1-sriov error: Requested operation is not valid: PCI device 0000:01:00.2 is in use by driver QEMU, domain vm1-sriov For (1), a VM destroy/start cycle is needed to re-use the VF in the guest. For (2), the effect is more nefarious, requiring a Libvirtd daemon restart to use the VF again in any guest. We can make it a bit better by checking, during virHostdevReAttachPCIDevices(), if there is any missing PCI device that will be left behind in activePCIHostdevs and inactivePCIHostdevs lists. Remove any missing device found from both lists, unconditionally, matching the current state of the host. This change affects the code path in (1) (processDeviceDeletedEvent into qemuDomainRemoveDevice, all the way back to qemuHostdevReAttachPCIDevices) and also in (b) (qemuProcessStop into qemuHostdevReAttachDomainDevices). NB: Although this patch enables the possibility of 'outside Libvirt' SR-IOV hotunplug of PCI devices, if the hypervisor and the PCI driver copes with it, our goal is to mitigate what it is still considered an user oopsie. For all supported purposes, the admin must remove the SR-IOV VFs from all running domains before removing the VFs from the host. Resolves: https://gitlab.com/libvirt/libvirt/-/issues/72 Signed-off-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx> --- src/hypervisor/virhostdev.c | 38 +++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c index c708791eec..ed43733e71 100644 --- a/src/hypervisor/virhostdev.c +++ b/src/hypervisor/virhostdev.c @@ -1077,6 +1077,40 @@ virHostdevReAttachPCIDevicesImpl(virHostdevManagerPtr mgr, } +static void +virHostdevDeleteMissingPCIDevices(virHostdevManagerPtr mgr, + virDomainHostdevDefPtr *hostdevs, + int nhostdevs) +{ + size_t i; + + if (nhostdevs == 0) + return; + + virObjectLock(mgr->activePCIHostdevs); + virObjectLock(mgr->inactivePCIHostdevs); + + for (i = 0; i < nhostdevs; i++) { + virDomainHostdevDef *hostdev = hostdevs[i]; + virDomainHostdevSubsysPCI *pcisrc = &hostdev->source.subsys.u.pci; + g_autoptr(virPCIDevice) pci = NULL; + + if (virHostdevGetPCIHostDevice(hostdev, &pci) != -2) + continue; + + /* The PCI device from 'hostdev' does not exist in the host + * anymore. Delete it from both active and inactive lists to + * reflect the current host state. + */ + virPCIDeviceListDel(mgr->activePCIHostdevs, &pcisrc->addr); + virPCIDeviceListDel(mgr->inactivePCIHostdevs, &pcisrc->addr); + } + + virObjectUnlock(mgr->activePCIHostdevs); + virObjectUnlock(mgr->inactivePCIHostdevs); +} + + /* @oldStateDir: * For upgrade purpose: see virHostdevRestoreNetConfig */ @@ -1102,6 +1136,10 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr, virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs, hostdevs, nhostdevs, oldStateDir); + + /* Handle the case where PCI devices from the host went missing + * during the domain lifetime */ + virHostdevDeleteMissingPCIDevices(mgr, hostdevs, nhostdevs); } -- 2.26.2