On 06/22/2017 11:28 AM, Alex Williamson wrote:
> On Thu, 22 Jun 2017 17:14:48 +0200
> Erik Skultety <eskultet@xxxxxxxxxx> wrote:
>
>> [...]
>>>>
>>>> ^this is the thing we constantly keep discussing, as everyone has a slightly
>>>> different angle of view - libvirt does not implement any kind of policy,
>>>> therefore the only "configuration" would be the PCI parent placement - you say
>>>> what to do and we do it, no logic in it, that's it. Now, I see taking care of
>>>> the guesswork for the user in the simplest manner possible not as policy but
>>>> as a mere convenience, be it just for developers and testers, but even that
>>>> might apparently be perceived as a policy and therefore unacceptable.
>>>>
>>>> I still stand by the idea of having auto-creation because, unfortunately, I
>>>> still fail to understand what the negative implications of having it are - is
>>>> it that it would get so complex to maintain in the future that we would
>>>> regret it, or that we'd get a huge amount of follow-up requests for extending
>>>> the feature, or is it simply the interpretation that auto-create == policy?
>>>
>>> The increasing complexity of the qemu driver is a significant concern with
>>> adding policy-based logic to the code. Thinking about this though, if we
>>> provide the inactive node device feature, then we can avoid essentially
>>> all new code and complexity in the QEMU driver, and still support auto-create.
>>>
>>> i.e., in the domain XML we just continue to have the exact same XML that
>>> we already have today for mdevs, but with a single new attribute
>>> autocreate=yes|no
>>>
>>>  <devices>
>>>    <hostdev mode='subsystem' type='mdev' model='vfio-pci' autocreate="yes">
>>>      <source>
>>>        <address uuid='c2177883-f1bb-47f0-914d-32a22e3a8804'/>
>>
>> So, just for clarification of the concept, the device with ^this UUID will have
>> had to be defined by the nodedev API by the time we start to edit the domain
>> XML in this manner, in which case the only thing autocreate=yes would do is
>> to actually create the mdev according to the nodedev config, right? Continuing
>> with that thought, if the UUID doesn't refer to any of the inactive configs, it
>> will be an error, I suppose? What about the fact that only one vGPU type can
>> live on the GPU? Even if you can successfully identify a device using the UUID
>> in this way, you'll still face the problem that other types might be currently
>> occupying the GPU and need to be torn down first. Will this be automated as
>> well in what you suggest? I assume not.
>>
>>>      </source>
>>>    </hostdev>
>>>  </devices>
>>>
>>> In the QEMU driver, then the only change required is
>>>
>>>    if (def->autocreate)
>>>        virNodeDeviceCreate(dev);
>>
>> Aha, so if a device gets torn down on shutdown, we won't face the problem of
>> some other devices being active; all of them will have to be in the inactive
>> state because they got torn down during the last shutdown - that would work.
>
>
> I'm not familiar with how inactive devices would be defined in the
> nodedev API; would someone mind explaining or providing an example,
> please? I don't understand where the metadata is stored that describes
> the what and where of a given UUID.
> Thanks,

You don't understand it because it doesn't exist yet :-)

The idea is essentially the same as what we've talked about, except that all the information about the parent PCI address, the desired type of child, and anything else (is there anything else?) is stored in some not-yet-specified persistent node device config rather than directly in the domain XML. Maybe something like:

  <nodedevice>
    <uuid>BobLobLaw</uuid>
    <parent>
      <address type='pci' .... />
    </parent>
    <child type='MoreBlah'/>
  </nodedevice>

I haven't thought about how it would show the difference between active and inactive - didn't get enough coffee today and I have a headache.

The advantage of this is that it uncouples the specifics of the child device from the domain XML - the only thing in the domain XML is the uuid. So a device config with that uuid would need to exist on every host where you wanted to run a particular guest, but the details could differ from host to host, and you still wouldn't need to edit the domain XML. This is a similar concept to creating libvirt networks that are just an indirect pointer to a bridge device (which may have a different name on each host) or to an SR-IOV PF (yeah, I know Dan doesn't like that feature, but I find it very useful, and unobtrusive if management chooses not to use it).

So from your point of view (I'm talking to Alex here), implementing it this way would mean that you would need to create the child device definitions in the nodedev driver once (and possibly/hopefully the uuids of the devices would be autogenerated, same as we do for uuids in other parts of libvirt config), then copy each uuid to the domain config one time. But after doing that once, you would be able to start and stop domains and the host without any extra action.
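To make the proposed start/stop flow concrete, here is a minimal sketch in plain C. It is not libvirt code: `struct mdev_config`, `hostdev_autocreate()`, `hostdev_autodestroy()` and the result codes are hypothetical names, each host is assumed to hold its own table of persistent configs, and "creating" an mdev is simulated by flipping an `active` flag. The domain side supplies only the uuid; an undefined uuid is an error, and a start is refused (nothing is torn down automatically) if the parent already has an active child of a different type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a persistent <nodedevice> config on one host. */
struct mdev_config {
    const char *uuid;       /* the only thing the domain XML refers to */
    const char *parent_pci; /* host-specific parent address */
    const char *child_type; /* host-specific mdev type */
    bool active;            /* created at domain start, destroyed at stop */
};

/* Hypothetical result codes for this sketch. */
enum start_result {
    START_OK = 0,
    START_NO_SUCH_UUID, /* uuid not defined on this host */
    START_IN_USE,       /* another running domain already uses this uuid */
    START_PARENT_BUSY   /* parent has an active child of another type */
};

static struct mdev_config *find_config(struct mdev_config *cfgs, size_t n,
                                       const char *uuid)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(cfgs[i].uuid, uuid) == 0)
            return &cfgs[i];
    return NULL;
}

/* Domain start, for each mdev hostdev with autocreate="yes". */
enum start_result hostdev_autocreate(struct mdev_config *cfgs, size_t n,
                                     const char *uuid)
{
    assert(cfgs != NULL && uuid != NULL);

    struct mdev_config *cfg = find_config(cfgs, n, uuid);
    if (cfg == NULL)
        return START_NO_SUCH_UUID;
    if (cfg->active)
        return START_IN_USE;

    /* Only one child type may be active on a given parent at a time. */
    for (size_t i = 0; i < n; i++) {
        const struct mdev_config *other = &cfgs[i];
        if (other != cfg && other->active &&
            strcmp(other->parent_pci, cfg->parent_pci) == 0 &&
            strcmp(other->child_type, cfg->child_type) != 0)
            return START_PARENT_BUSY;
    }

    cfg->active = true; /* stands in for actually creating the mdev */
    return START_OK;
}

/* Domain shutdown: the mdev goes away, but its config persists. */
void hostdev_autodestroy(struct mdev_config *cfgs, size_t n, const char *uuid)
{
    struct mdev_config *cfg = find_config(cfgs, n, uuid);
    if (cfg != NULL)
        cfg->active = false;
}
```

Two configs sharing a parent but with different child types can both be defined; whichever domain starts first wins, and the other gets `START_PARENT_BUSY` until the first one shuts down. Because the table is per host, the same uuid can map to a different parent on each host without touching the domain XML.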
You could also define different nodedevices that use the same parent for different child types, and reference them from different domain definitions, as long as you never try to start more than one of them at a time. (I'm thinking about Nvidia mdevs here, where you can only have one child type active on a particular parent at any time - if you did try to do this, libvirt would of course log an error and refuse to start the domain.)

I like this idea. I think it gives both you and me what we want for small/dev/testing purposes, and may also be of use to larger management applications, but it won't get in anyone's way if they don't need/want/like it. The only downsides are:

1) It will take more effort to implement, since the nodedev driver doesn't yet understand the concept of persistent config. (But doing it is a *very good* thing, so it's worthwhile.)

2) It makes it pointless for me to finally hit send on the response to this thread that I started typing all the way back last Saturday, but haven't sent because, as usual, I changed my mind 4 or 5 times in the interim based on various discussions and "shower thoughts" :-P

... okay, another "shower thought" is coming in...

One deficiency of this comes to mind - since the domain config references the device by uuid, and an existing child device's uuid can't be changed, the unique uuid used by a particular domain must be defined on all of the hosts that the domain might be moved to. And since other domains can't share that uuid (unless you're 100% sure they'll never be active at the same time), you won't be able to implement the alternate idea of "pre-create all the devices, then assign them to domains as needed"; instead, you'll be forced to use the "create-on-demand" model. For pre-created devices to work, you really need an extra layer of indirection - a named pool of devices, and domain config that references the pool name rather than the uuid of a specific device.
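The extra layer of indirection can be sketched as follows, again in plain C and purely hypothetical: `struct mdev_pool`, `pool_acquire()`, `pool_release()` and the fixed `POOL_MAX_DEVS` capacity are assumptions, not anything libvirt defines. Devices are assumed to be pre-created; the domain config would carry only the pool name, and the host hands out whichever device is free at start time and takes it back at shutdown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_MAX_DEVS 8 /* arbitrary capacity for this sketch */

/* Hypothetical named pool of pre-created mdevs on one host. */
struct mdev_pool {
    const char *name;                 /* what the domain config references */
    const char *uuids[POOL_MAX_DEVS]; /* pre-created devices */
    const char *owner[POOL_MAX_DEVS]; /* domain using each device, or NULL */
    size_t ndevs;
};

/* Domain start: assign any free device from the named pool.
 * Returns its uuid, or NULL if the pool is unknown or exhausted. */
const char *pool_acquire(struct mdev_pool *pools, size_t npools,
                         const char *pool_name, const char *domain)
{
    assert(pools != NULL && pool_name != NULL && domain != NULL);

    for (size_t p = 0; p < npools; p++) {
        if (strcmp(pools[p].name, pool_name) != 0)
            continue;
        for (size_t i = 0; i < pools[p].ndevs; i++) {
            if (pools[p].owner[i] == NULL) {
                pools[p].owner[i] = domain;
                return pools[p].uuids[i];
            }
        }
        return NULL; /* every device in this pool is in use */
    }
    return NULL; /* no pool by that name on this host */
}

/* Domain shutdown: return its device(s) to the pool; they stay created. */
void pool_release(struct mdev_pool *pools, size_t npools, const char *domain)
{
    for (size_t p = 0; p < npools; p++)
        for (size_t i = 0; i < pools[p].ndevs; i++)
            if (pools[p].owner[i] != NULL &&
                strcmp(pools[p].owner[i], domain) == 0)
                pools[p].owner[i] = NULL;
}
```

Since the host does the bookkeeping, no management application has to rewrite domain XML or track which device is in use, and the same domain can land on any host that defines a pool with that name.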
Maybe this can be a later addition (or alternately we require management to modify the domain config each time the domain is started, and keep track themselves of which devices are currently in use. That seems a bit haphazard, especially if you consider the possibility of multiple management applications on one host).

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list