Re: [PATCH] virhostdev: Fix PCI devices are still attatched to stub driver bug

John Ferlan <jferlan@xxxxxxxxxx> · Thu, 6 Sep 2018 18:24:56 -0400

$SUBJ:

s/attatched/attached

s/bug//

On 08/31/2018 03:34 AM, Wu Zongyong wrote:

So, first off - I believe there are two things going on in this one
patch. Even though there is "some relationship" between the issues, the
libvirtd restart is kind of a corner case, while the change to readd the
device to inactiveDev's would affect both this libvirtd restart case
*and* the normal processing. It's taken me some time to come to that
conclusion - lots of details follow, but I'm also willing to accept the
changes are "related enough" for just one patch.

> Currently, PCI devices will not be rebound to host drivers but
> attached to the stub driver when:
> 
> 1) use libvirt to start a virtual machine with PCI devices assigned,
> then stop libvirtd process and shutdown the virtual machine. Finally,
> PCI devices are still bound to the stub driver instead of host drivers
> after libvirt start again.
> 2) use libvirt to shutdown a virtual machine wtih PCI devices assigned,

s/wiht/with

> then stop libvirtd process before libvirt try to rebind PCI devices to

Be specific which API isn't being called - it really will help. Making a
reviewer figure it out in very specific/particular cases such as this
leads to "review delay". Is qemuHostdevReAttachDomainDevices called from
qemuProcessStop what you were referring to? The Stop not getting called
because the monitor channel is closed and thus the event for shutdown
won't be triggered.

> host drivers. Finally, PCI devices are still bound to the stub driver
> after libvirt start again.

Since you've done the research and so that I'm sure I'm following your
logic - it seems like you're chasing corner cases based on libvirtd
restart. In one instance the order is:

    virsh start $dom
    service libvirtd stop  (or some other means?)
    <shutdown> the guest from within the $dom

and the other instance is

    virsh start @dom
    virsh destroy $dom
    service libvirtd stop

and issues as a result of the subsequent "service libvirtd start".

In the second instance it's not clear how your timing is "just right"
such that libvirtd can be stopped before the PCI devices can be returned
to the host PCI, but it's essentially the same as the first scenario at
least with respect to the subsequent libvirtd restart and discovery that
the guest is no longer running and needing to process that.

What may also have helped is if the commit message indicated the
"normal" path taken that would allow the drivers to be rebound properly.
 Even if it's after the "---" for review backup assistance information.

In edge cases like this, it's far easier to cut out stuff from a commit
message than it is to put yourself into the frame of reference of the
patch submitter

Still what your patch does is use the virDomainHostdevOrigStatesPtr for
a domain that was shutdown, but when libvirtd restarts. I had to go a
long way back to find when the OrigStates code was added - commit
d84b36263 in particular. In that case, the order was start domain,
restart libvirtd, and destroy domain.

So the slight difference to the condition for this patch is the domain
was shutdown while libvirtd was stopped.

In any case, all of the above is "one issue"...

Leaving the rest as a "second issue"

> 
> Notice that the comment on the top of virPCIDeviceDetach as follows:
> 
> activeDevs should be a list of all PCI devices currently in use by a
> domain.inactiveDevs is a list of all PCI devices that libvirt has

  * activeDevs is a list ...
  * inactiveDevs is a list ...

> detached from the host driver + attached to the stub driver, but
> hasn't yet assigned to a domain.
> 
> It's not reasonable that libvirt filter out devices that are either not
> active or not used by the current domain and driver. For devices belong

s/belong/that belong/

> to domains that has been shutdown before libvirt start, we should put

s/has/have

s/libvirt start, we should put them/libvirtd starts, they should be placed

> them into inactiveDevs if then meet the condition that PCI devices have

I'm finding it hard to read/parse/rewrite the rest of this sentence.

> been detached from the host driver + attached to the stub driver.
> 
> Moreover, we set orignal states of PCI devices when build PCI devices

s/we set orignal/set the original/

s/build/building/

> list in virHostdevGetPCIHostDeviceList if states has been set in struct
> virDomainHostdevDefPtr.

So the "second issue" (as I see it) is that ReAttach processing wasn't
handling returning devices that had stub driver attachment (at some
point in time). This issue occurs regardless of the libvirtd restart and
domain shutdown processing order. Even though it's related, it's
different and would need its own patch.

> 
> Signed-off-by: Wu Zongyong <cordius.wu@xxxxxxxxxx>
> ---
>  src/util/virhostdev.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/src/util/virhostdev.c b/src/util/virhostdev.c
> index ca79c37..ecf95e3 100644
> --- a/src/util/virhostdev.c
> +++ b/src/util/virhostdev.c
> @@ -235,6 +235,7 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
>      for (i = 0; i < nhostdevs; i++) {
>          virDomainHostdevDefPtr hostdev = hostdevs[i];
>          virDomainHostdevSubsysPCIPtr pcisrc = &hostdev->source.subsys.u.pci;
> +        virDomainHostdevOrigStatesPtr origstates = &hostdev->origstates;
>          virPCIDevicePtr pci;
>  
>          if (hostdev->mode != VIR_DOMAIN_HOSTDEV_MODE_SUBSYS)
> @@ -262,6 +263,10 @@ virHostdevGetPCIHostDeviceList(virDomainHostdevDefPtr *hostdevs, int nhostdevs)
>              virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_XEN);
>          else
>              virPCIDeviceSetStubDriver(pci, VIR_PCI_STUB_DRIVER_KVM);
> +
> +        virPCIDeviceSetUnbindFromStub(pci, origstates->states.pci.unbind_from_stub);
> +        virPCIDeviceSetRemoveSlot(pci, origstates->states.pci.remove_slot);
> +        virPCIDeviceSetReprobe(pci, origstates->states.pci.reprobe);

This looks a lot like what virHostdevUpdateActivePCIDevices is doing -
in fact I think that is the libvirtd restart "bug" you're chasing -
qemuProcessReconnect will call qemuHostdevUpdateActiveDomainDevices
after the call to qemuConnectMonitor - at least w/r/t this PCI stub
interaction.

Since the domain was stopped connecting to the monitor will fail (in my
case qemuMonitorOpenUnix), thus falling into qemuProcessStop, but
without having updated (or ReAttach'ing) the PCI devices. So, in your
environment without these two changes - could you check to see what
happens if the order is changed? Does that fix the issue?

That is move the call to qemuHostdevUpdateActiveDomainDevices to right
after qemuProcessPrepareAllowReboot.

Some more "detailed" thoughts I had on this while investigating...

Both virHostdevPreparePCIDevices and virHostdevReAttachPCIDevices will
call virHostdevGetPCIHostDeviceList. However, in the case of the former,
"Step 7" is "set the original states for hostdev def".

Following call traces for virHostdevPreparePCIDevices eventually leads
back to :

  1. qemuMigrationDstPrepareAny (migration)
  2. qemuProcessStart (guest startup)
  3. qemuDomainAttachHostDevice (hotplug).

IOW: This is the "normal" running mode.

So I have a concern regarding the "order of operation" in this case
especially since step 7 has the caveat of (actual) being true before
setting the bools.

Considering the "other pah" through virHostdevReAttachPCIDevices,
callers are:

   1. libxlDomainAttachHostPCIDevice (an error: path)
   2. libxlDomainDetachHostPCIDevice (normal path)
   3. qemuHostdevReAttachPCIDevices (normal path)
   4. virHostdevReAttachDomainDevices (libxl domain close processing)

Focusing on qemuHostdevReAttachPCIDevices I find 3 callers:

   1. qemuHostdevReAttachDomainDevices (called by qemuProcessStop)
   2. qemuDomainAttachHostPCIDevice (hotplug error processing)
   3. qemuDomainRemovePCIHostDevice (hot unplug processing).

So while setting things up seems to be the right thing to do in the
ReAttach case, the normal attach case it doesn't seem so. Perhaps too
generic.

>      }
>  
>      return pcidevs;
> @@ -1008,8 +1013,19 @@ virHostdevReAttachPCIDevices(virHostdevManagerPtr mgr,
>                  continue;
>              }
>          } else {
> -            virPCIDeviceListDel(pcidevs, pci);
> -            continue;

The above was added in commit 4cdbff3d5 by Andrea and just removed
inactiveDev's from the @pcidevs since his change added both onto
@pcidevs (at least that's my reading of it).

But with your proposed change rather than deleting that from @pcidevs,
you're moving it to the inactiveDev's list *if* there's a stub.

So what happens if there isn't a stub?  From this point forward @pcidevs
would contain both active and inactive devices.

Then step 2 comes along and runs the @pcidevs "stealing" to place the
device on the inactiveDevs list.  So now wouldn't we have duplicate
devices there?

Now if the new code went before the above 2 lines, then probably no big
deal - that way we're adding back to the inactive list for these stub
drivers and we're not keeping the device in @pcidevs meaning it wouldn't
be readded to inactive afterwards.

Hopefully this makes sense.

> +            int stub = virPCIDeviceGetStubDriver(pci);
> +            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
> +                stub < VIR_PCI_STUB_DRIVER_LAST) {
> +                /* The device is bound to a known stub driver: add a copy
> +                 * to the inactive list */
> +                VIR_DEBUG("Adding PCI device %s to inactive list",

s/%s/'%s'/

(both above and below)

> +                          virPCIDeviceGetName(pci));
> +                if (virPCIDeviceListAddCopy(mgr->inactivePCIHostdevs, pci) < 0) {
> +                    VIR_ERROR(_("Failed to add PCI device %s to the inactive list"),
> +                              virGetLastErrorMessage());
> +                    virResetLastError();
> +                }
> +            }

For reference, this is a copy of what's in virHostdevPreparePCIDevices
step 2. I think you need to do some extra debugging and be sure you're
not duplicating things.  IOW: Add some debug code to dump the active and
inactive lists.

Although this is really long - I hope it all makes sense.

John

>          }
>  
>          i++;
> 

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list