Gitlab issue #72 [1] reports that removing SR-IOV VFs before removing
the devices from the running domains can have strange consequences.
QEMU might be able to hot-unplug the device inside the guest, but
Libvirt will not be aware of that, and the guest then becomes
inconsistent with the domain definition.

There's also the possibility of the VF removal not succeeding while
the domain is running, but then, as soon as the domain is shut down,
all the VFs are removed. Libvirt can't handle the removal of the PCI
devices while trying to reattach the hostdevs, and the Libvirt daemon
can be left in an inconsistent state (see [2]).

This patch starts to address the issue reported in Gitlab #72, most
notably the issue described in [2]. When shutting down a domain with
SR-IOV hostdevs that went missing, virHostdevReAttachPCIDevices() fails
the whole process and does not reattach any of the PCI devices,
including the ones that aren't related to the VFs that went missing.
Let's make it more resilient to host changes by changing
virHostdevGetPCIHostDevice() to return an exclusive error code '-2'
for this case. virHostdevGetPCIHostDeviceList() can then tell when
virHostdevGetPCIHostDevice() failed to find the PCI device of a hostdev
and continue building the list of PCI devices.

virHostdevReAttachPCIDevices() will now be able to proceed with
reattaching all other valid PCI devices, at least. The 'ghost hostdevs'
will be handled later on.

[1] https://gitlab.com/libvirt/libvirt/-/issues/72
[2] https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148

Signed-off-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx>
---
 src/hypervisor/virhostdev.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index bd35397f2c..dbba36193b 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void)
  * is returned.
  *
  * Returns: 0 on success (@pci might be NULL though),
- *          -1 otherwise (with error reported).
+ *          -1 otherwise (with error reported),
+ *          -2 PCI device not found. @pci will be NULL
  */
 static int
 virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
@@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
         hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
         return 0;
 
+    if (!virPCIDeviceExists(&pcisrc->addr))
+        return -2;
+
     actual = virPCIDeviceNew(&pcisrc->addr);
 
     if (!actual)
@@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
         virDomainHostdevDefPtr hostdev = hostdevs[i];
         g_autoptr(virPCIDevice) pci = NULL;
 
-        if (virHostdevGetPCIHostDevice(hostdev, &pci) < 0)
+        if (virHostdevGetPCIHostDevice(hostdev, &pci) == -1)
             return NULL;
 
         if (!pci)
-- 
2.26.2
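The return-code convention described in the commit message can be illustrated with a small, self-contained sketch. It does not use libvirt at all: `FakeHostdev`, `fake_get_device()` and `fake_build_list()` are hypothetical stand-ins for a hostdev definition, virHostdevGetPCIHostDevice() and virHostdevGetPCIHostDeviceList(), and the `present`/`broken` flags merely simulate host state. The idea shown is that the list builder aborts only on -1 and skips -2 entries, so the remaining devices can still be processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes mirroring the convention in the patch:
 * 0 = success, -1 = hard error, -2 = device not found on the host. */
enum {
    LOOKUP_OK = 0,
    LOOKUP_ERROR = -1,
    LOOKUP_NOT_FOUND = -2,
};

/* Hypothetical stand-in for a hostdev definition: a simplified PCI
 * address plus flags that simulate the current host state. */
typedef struct {
    unsigned int addr;  /* simplified PCI address */
    bool present;       /* does the device still exist on the host? */
    bool broken;        /* simulate an unrelated lookup failure */
} FakeHostdev;

/* Analogue of the device lookup: check existence first and return the
 * dedicated -2 code when the device is gone. */
static int
fake_get_device(const FakeHostdev *hostdev, unsigned int *addr_out)
{
    if (!hostdev->present)
        return LOOKUP_NOT_FOUND;
    if (hostdev->broken)
        return LOOKUP_ERROR;
    *addr_out = hostdev->addr;
    return LOOKUP_OK;
}

/* Analogue of the list builder: only -1 aborts, -2 entries are skipped.
 * Returns the number of collected addresses, or -1 on a hard error. */
static int
fake_build_list(const FakeHostdev *hostdevs, size_t n,
                unsigned int *out, size_t out_cap)
{
    size_t count = 0;

    assert(out != NULL && out_cap >= n);

    for (size_t i = 0; i < n; i++) {
        unsigned int addr = 0;
        int rc = fake_get_device(&hostdevs[i], &addr);

        if (rc == LOOKUP_ERROR)
            return -1;          /* real failure: give up on the list */
        if (rc == LOOKUP_NOT_FOUND)
            continue;           /* 'ghost hostdev': skip it, keep going */
        out[count++] = addr;
    }
    return (int)count;
}
```

With hostdevs at 0x0100 (present), 0x0200 (missing) and 0x0300 (present), fake_build_list() collects the two present addresses and skips the missing one, while a hostdev with `broken` set still makes it return -1. In the patch itself no explicit skip is added for -2; since @pci is left NULL, the existing `if (!pci)` check in virHostdevGetPCIHostDeviceList() presumably handles it.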