Re: [PATCH RESEND 06/20] virhostdev.c: virHostdevGetPCIHostDevice() now reports missing device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sorry for the delay in answering. I usually get the replies to my
patches in my inbox because I usually get my email CC'ed in the
reply.

On 1/20/21 10:33 PM, Laine Stump wrote:
On 1/18/21 2:53 PM, Daniel Henrique Barboza wrote:
Gitlab issue #72 [1] reports that removing SR-IOVs VFs before
removing the devices from the running domains can have strange
consequences.


Resolution: "Don't do that!!" :-)


Seriously, though, there are any number of things that someone with root access to the host could do that would completely confuse libvirt and break everything. For example, assigning a PF to a guest after having assigned one or more of that PF's VFs to guests.


But I guess there's no harm in trying to mitigate the damage. As long as it doesn't complicate the code to the extent that it becomes more confusing, difficult to maintain, and liable to other regressions.

This was the idea with this series. I'm trying my best to not interfere
with working code/logic by adding these mitigations.




  QEMU might be able to hotunplug the device inside the
guest, but Libvirt will not be aware of that,


If qemu knows enough to unplug the device from the guest, then I'm presuming that it must also be sending a DEVICE_DELETED event up to libvirt. Of course libvirt isn't expecting this, and so probably throws it away, right?

---- long story warning ----

We have 2 scenarios I'm trying to deal with in this work: a SR-IOV disappearing
from the host, while a running domain was using one of its functions, in
a condition where:

1) the VF was coldplugged in the domain (i.e. it was in the domain XML
before starting the domain)
2) the VF was hotplugged in the domain


In scenario (2), QEMU will throw a DEVICE_DEL event and we'll capture it
accordingly via processDeviceDeleteEvent ->qemuDomainRemoveDevice ->
qemuDomainRemoveHostDevice. The problem is,by the time we reach
qemuDomainRemoveHostDevice, the VF is already gone from the host  and we didn't
foresee this as a possibility.

All of this is pinned on how we interpret virPCIDeviceNew(). My initial idea
for this work was to change virPCIDeviceNew() to allow the object to be created,
regardless of whether the device actually exists in the host, and then we
would have a 'present' flag to determine if the device is present or not.
I got terrified by the idea of changing the semantics of virPCIDeviceNew(),
which will return -1 if the device isn't present, for all Libvirt devices that
uses the object, just to handle a niche case. I decided then to do the
'device is present' checks you'll see around in this series.


As far as scenario (1) goes, I don't fully understand why it happens TBH. For
QEMU there is no difference - QEMU will throw a DEVICE_DEL event for a VF that
disappears from the host, even if the VF was coldplugged. When launching the VM
with Libvirt, QEMU internals aren't aware of kernel intent to remove the device.

Looking in the vfio driver in the kernel I learned that this intent is expressed by
an eventfd call of vfio_pci_request. My somewhat educated guess is that Libvirt
is wrapping up the QEMU process in a way that QEMU never sees this eventfd. The
result, as you can see in the bug, is that unless the admin kills the terminal where
"echo 0 > num_vfs" was issue, 'echo' will lock for as long as the domain using the VF
keeps running.

And when the domain is shut down the VF is removed from the process, and the host,
and then we have a similar problem in our shutdown path, via qemuProcessStop.





  and then the guest is
now inconsistent with the domain definition.

There's also the possibility of the VFs removal not succeeding
while the domain is running but then, as soon as the domain
is shutdown, all the VFs are removed. Libvirt can't handle
the removal of the PCI devices while trying to reattach the
hostdevs, and the Libvirt daemon can be left in an inconsistent
state (see [2]).

This patch starts to address the issue related in Gitlab #72, most
notably the issue described in [2]. When shutting down a domain
with SR-IOV hostdevs that got missing, virHostdevReAttachPCIDevices()
is failing the whole process and failing to reattach all the
PCI devices, including the ones that aren't related to the VFs that
went missing. Let's make it more resilient with host changes by
changing virHostdevGetPCIHostDevice() to return an exclusive error
code '-2' for this case. virHostdevGetPCIHostDeviceList() can then
tell when virHostdevGetPCIHostDevice() failed to find the PCI
device of a hostdev and continue to make the list of PCI devices.

virHostdevReAttachPCIDevices() will now be able to proceed reattaching
all other valid PCI devices, at least. The 'ghost hostdevs' will be
handled later on.

[1] https://gitlab.com/libvirt/libvirt/-/issues/72
[2] https://gitlab.com/libvirt/libvirt/-/issues/72#note_459032148

Signed-off-by: Daniel Henrique Barboza <danielhb413@xxxxxxxxx>
---
  src/hypervisor/virhostdev.c | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index bd35397f2c..dbba36193b 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -220,7 +220,8 @@ virHostdevManagerGetDefault(void)
   * is returned.
   *
   * Returns: 0 on success (@pci might be NULL though),
- *         -1 otherwise (with error reported).
+ *         -1 otherwise (with error reported),
+ *         -2 PCI device not found. @pci will be NULL
   */
  static int
  virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
@@ -235,6 +236,9 @@ virHostdevGetPCIHostDevice(const virDomainHostdevDef *hostdev,
          hostdev->source.subsys.type != VIR_DOMAIN_HOSTDEV_SUBSYS_TYPE_PCI)
          return 0;
+    if (!virPCIDeviceExists(&pcisrc->addr))
+        return -2;


What happens if the device disappeared and then came back before you got here? (i.e. by setting sriov_numvfs = 0, and then sriov_numvfs = 128 (or whatever)?

I'm not sure what would happen. I never though that this could trigger a TOCTOU
situation.

I **think** that this wouldn't be too nefarious - we would remove the device from
the domain definition and the device would be there in the host. Would need to
try to simulate this to be sure though.


Thanks,

DHB


It's too late in the day for my to process this too much, so I'm going to come back to it tomorrow. I will continue on to see if there is other "simple" stuff in subsequent patches though - even if we can't get all of this pushed right away, there's a significant amount that can be pushed.


+
      actual = virPCIDeviceNew(&pcisrc->addr);
      if (!actual)
@@ -270,7 +274,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
          virDomainHostdevDefPtr hostdev = hostdevs[i];
          g_autoptr(virPCIDevice) pci = NULL;
-        if (virHostdevGetPCIHostDevice(hostdev, &pci) < 0)
+        if (virHostdevGetPCIHostDevice(hostdev, &pci) == -1)
              return NULL;
          if (!pci)






[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux