Sorry, apparently I missed this reply previously.

On Wed, 16 Mar 2016 11:19:38 +0100
Andrea Bolognani <abologna@xxxxxxxxxx> wrote:

> On Tue, 2016-03-15 at 13:31 -0600, Alex Williamson wrote:
> > So we have all sorts of driver issues that are sure to come and go over
> > time and all sorts of use cases that seem difficult to predict.  If we
> > know we're in an oVirt/OpenStack environment, managed='detach' might
> > actually be a more typical use case than managed='yes'.  It still
> > leaves a gap where we hope the host driver doesn't do anything bad when
> > it initializes the device and hope that it releases the device cleanly,
> > but it's probably better than tempting fate by unnecessarily bouncing
> > it back and forth between drivers.
>
> Is sharing a hostdev between multiple guests more solid in general?
> Eg. if I have g1 and g2, both configured to use the host's GPU, can
> I start up g1, shut it down, start up g2 and expect things to just
> work? Hopefully that's the case because the device would go through
> a more complete set up / tear down cycle.

Yes, this should work.

> Anyway, after reading your explanation I'm wondering if we
> shouldn't always recommend a setup where devices that are going
> to be assigned to guests are just never bound to any host driver,
> as that sounds like it would have the most chances of working
> reliably.
>
> IIUC, the pci-stub.ids kernel parameter you mentioned above does
> exactly that. Maybe blacklisting the host driver as well might be
> a good idea? Anything else a user would need to do? Would the
> user or management layer not be able to configure something like
> that in an oVirt / OpenStack environment? What should we change
> to make it possible or easier?

Both pci-stub.ids and blacklisting have some serious problems.  A
common thread on the vfio-users list is that someone has two identical
devices and wants to use one for the host and one for a VM, and can't
figure out how to make that work.
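To make the identical-device problem concrete: pci-stub.ids matches on
vendor:device ID, so it cannot tell two identical cards apart.  A
hedged sketch (the 10de:13c2 ID is made up for illustration):

```
# Kernel command line fragment: claim every PCI device matching this
# vendor:device ID with pci-stub at boot, before the host driver binds.
pci-stub.ids=10de:13c2

# With two identical cards, BOTH match 10de:13c2, so the card intended
# for the host is also left on pci-stub -- the IDs alone give no way
# to single out the one meant for the VM.
```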
The only solution I know of is to escalate to initramfs scripts that
are smarter about which device to pick.  Maybe this implies that we
need some initramfs integration for driver_override, so that we can
force a particular driver for a particular device, but that fails to
acknowledge that a user may want the device to use the default driver,
right up until the point that they don't.

Blacklisting has similar issues, but worse.  Not all SR-IOV drivers
are partitioned such that there's a separate driver for PF vs VF;
quite a few use the same driver for both, so blacklisting immediately
doesn't work for a large class of drivers.  It's also even more broad
than pci-stub.ids: for instance, I might not want snd-hda-intel to
bind to the GPU audio device used for a VM, but I definitely want
snd-hda-intel binding to my primary audio device.  I've also found
that blacklisting i915 does absolutely nothing... go figure.

BTW, even if we do either of these, do we still need managed='yes',
given that the device will be bound to either pci-stub or nothing and
still needs to get to vfio-pci?  I don't recall whether libvirt
accepts those cases or not.  If not, then we're unnecessarily bouncing
back and forth between drivers even with a pci-stub/blacklist
approach, unless we inject another layer that actually binds the
devices to vfio-pci.

Let's take it one step further: what if we made an initramfs script
that sets "vfio-pci" as the driver_override for a user-specified list
of devices?  Recall that driver_override means that only the driver
with the matching name can bind to the device.  So now we boot up with
the user-defined set of devices left without drivers.  What happens in
either the managed='yes' or 'no' scenarios?  Logically this seems like
exactly when we want to use 'detach'.
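A rough sketch of what such a script might do, assuming a hypothetical
device address 0000:01:00.0 (this only makes sense on a host that
actually has the device, so treat it as illustrative configuration,
not a finished implementation):

```shell
# Pin a specific device to vfio-pci via its driver_override attribute,
# so that no other driver can claim it.
DEV=0000:01:00.0

# After this write, only a driver literally named "vfio-pci" may bind
# this device; the default driver will no longer match it.
echo vfio-pci > /sys/bus/pci/devices/$DEV/driver_override

# If a driver already bound the device, unbind it and reprobe so the
# override takes effect.
echo $DEV > /sys/bus/pci/devices/$DEV/driver/unbind 2>/dev/null
echo $DEV > /sys/bus/pci/drivers_probe
```

Unlike pci-stub.ids, this selects by PCI address rather than
vendor:device ID, so it can distinguish two identical cards.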
> That would give us a setup we can rely on, and cover all use
> cases you mentioned except that of someone assigning his laptop's
> GPU to a guest and needing the console to be available before the
> guest has started up because no other access to the machine is
> available. But in that case, even with managed='detach', the user
> would need to blindly restart the machine after guest shutdown,
> wouldn't he?

I don't want to make this issue about any one particular driver,
because I certainly hope that we'll eventually fix that one driver.
The problem is more that this is not an uncommon scenario.  In one of
my previous replies I listed all of the driver issues that I know
about.  In some cases we can't fix them because the driver is
proprietary; in others we just don't have the bandwidth.  Users have
far more devices at their disposal to test than we do, so they're
likely going to keep running into these issues.

Yes, managed='no' is an alternative, but if we go down the path of
saying "well, that feature sounds like foo+bar and we already do both
of those, so we don't need that feature", then we have to start asking
why we even have managed='yes'.  Why do we have an autostart feature?
We can clearly start a VM, and it's the app's problem when to call
that, etc.  It seems useful to me, but I can understand the concern
about feature bloat.  Thanks,

Alex

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list