Re: Add options to device xml to skip reattach of pci passthrough devices.

Alex Williamson <alex.williamson@xxxxxxxxxx> · Fri, 18 Jun 2021 08:58:08 -0600

On Fri, 18 Jun 2021 10:43:07 -0400
Laine Stump <laine@xxxxxxxxxx> wrote:

> On 6/16/21 4:15 PM, Daniel Henrique Barboza wrote:
> > 
> > 
> > On 6/9/21 4:38 PM, Manish Mishra wrote:  
> >> Hi Everyone,
> >>
> >> We want to add extra options to device xml to skip reattach of pci 
> >> passthrough devices. Following is xml format for pci passthrough 
> >> devices added to domain as of now.
> >>
> >> <hostdev mode='subsystem' type='pci' managed='yes'>
> >>
> >>    <source>
> >>
> >>        <address domain='0x0000' bus='0x00' slot='0x1a' function='0x7'/>
> >>
> >>    </source>
> >>
> >> </hostdev>
> >>
> >> When we pass managed=’yes’ flag through xml, libvirt takes 
> >> responsibility of detaching device on domain(guest VM) start and 
> >> reattaching on domain shutdown. We observed some issues where guest VM 
> >> shutdown may take long time, blocked for reattach operation on pci 
> >> passthrough device. As domain lock is held during this time it also 
> >> makes libvirt mostly inactive as it blocks even basic operations like 
> >> (virsh list). Reattaching of device to host can block due to reasons 
> >> like buggy driver or initialization of device itself can take long 
> >> time in some cases.  
> > 
> > I am more interested in hearing about the problem with this faulty buggy
> > driver holding domain lock during device reattach and compromising 'virsh'
> > operations, and see if there's something to do to mitigate that, instead
> > of creating a XML workaround for a driver problem.
> >   
> >>
> >> We want to pass following extra options to resolve this:
> >>
> >>  1. *skipReAttach*(optional flag)
> >>
> >> In some cases we do not need to reattach device to host as it may be 
> >> reserved only for guests, with this flag we can skip reattach 
> >> operation on host.  We do not want to modify managed flag to avoid 
> >> regression, so thinking of adding new optional flag.
> >>
> >>  2. *reAttachDriverName*(optional flag)
> >>
> >> Name of driver to which we want to attach instead of default, to avoid 
> >> reattaching to buggy driver. Currently libvirt asks host to auto 
> >> selects driver for device.
> >>
> >> Yes we can use managed=’no’ but in that case user has to take 
> >> responsibility of detaching device before starting domain which we do 
> >> not want. Please let us know your views on this.  
> > 
> > The case you mentioned above, "we do not need to reattach device to host
> > as it may be reserved only for guests", is one of the most common uses
> > we have for managed='no' AFAIK. The user/sysadm must detach the device
> > from the host, but it's only one time. After that the device can remain
> > detached from the host, and guests can use it freely as long as you
> > don't reboot the host (or reattach the device back). This scenario
> > you described fit the managed='no' mechanics fine IMO.
> > 
> > If you want to automate the detach process, you can use a Libvirt QEMU
> > hook (/etc/libvirt/hooks/qemu) to make the device detach when starting
> > the domain, in case the device isn't already detached. Note that
> > this has the same effect of the "skipReAttach" option you proposed.
> > 
> > Making a design around faulty drivers isn't ideal. If the driver you're
> > using starts to have problems with the detach operation as well, 
> > 'skipReAttach'
> > will do you no good. You'll have to fall back to 'managed=no' to circumvent
> > that.
> > 
> > Even if we discard the motivation, I'm not sure about the utility of having
> > more forms of PCI assignment management (e.g 
> > managed=yes|no|detach|reattach).
> > managed=yes|no seems to cover most use cases where the device driver works
> > properly.
> > 
> > 
> > Laine, what do you think?  
> 
> 
> I have a vague memory of someone (may even have been me) proposing 
> something similar several years ago, and the idea was shot down. I don't 
> remember the exact context or naming, but the general idea was to have 
> something like managed='detach-only' in order to have the advantage of 
> all configuration being within libvirt, but eliminating the potential 
> bad behavior associated with repeated re-binding of devices to drivers. 
> I unfortunately also don't recall the reason the idea was nixed. Dan or 
> Alex - do either of you have any memory of this?

Sounds very vaguely familiar, but not enough to remember the arguments.

> As for myself, 1) I agree with Daniel's suggestion that it is important 
> to find the root cause of the long delay rather than just covering it up 
> with another obscure option that will need to be re-discovered by anyone 
> who encounters the problem, and 2) every new bit that we add in makes 
> the code more complex and so more prone to errors, and also makes 
> configuration more complex and also more prone to errors. So while new 
> options like this could be useful, they could also be a net loss (or 
> not, it's hard to know without actually doing it, but once it's done it 
> can't be "un-done" :-))

AFAIK, the generally recommendation for "devices used only for
assignment" is to use driverctl to automatically attach these devices
to vfio-pci on boot/attach and managed="no" in the xml.  It might not
be a bad idea if libvirt were to fork a thread for rebinding a device
to the host, but xml workarounds don't seem well justified here.
Thanks,

Alex