On Mon, Nov 25, 2019 at 06:14:33PM +0100, Cornelia Huck wrote:
> On Mon, 18 Nov 2019 19:00:25 +0000
> Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote:
>
> > On Mon, Nov 18, 2019 at 10:06:34AM -0700, Alex Williamson wrote:
> > > Hey folks,
> > >
> > > We had some discussions at KVM Forum around mdev live migration and
> > > what that might mean for libvirt handling of mdev devices and
> > > potential libvirt/mdevctl[1] flows. I believe the current situation is
> > > that libvirt knows nothing about an mdev beyond the UUID in the XML.
> > > It expects the mdev to exist on the system prior to starting the VM.
> > > The intention is for mdevctl to step in here by providing persistence
> > > for mdev devices such that these pre-defined mdevs are potentially not
> > > just ephemeral, for example, we can tag specific mdevs for automatic
> > > startup on each boot.
> > >
> > > It seems the next step in this journey is to figure out if libvirt can
> > > interact with mdevctl to "manage" a device. I believe we've avoided
> > > defining managed='yes' behavior for mdev hostdevs up to this point
> > > because creating an mdev device involves policy decisions. For
> > > example, which parent device hosts the mdev, are there optimal NUMA
> > > considerations, are there performance versus power considerations, what
> > > is the nature of the mdev, etc. mdevctl doesn't necessarily want to
> > > make placement decisions either, but it does understand how to create
> > > and remove an mdev, what its type is, associate it with a fixed
> > > parent, apply attributes, etc. So would it be reasonable that for a
> > > managed='yes' mdev hostdev device, libvirt might attempt to use mdevctl
> > > to start an mdev by UUID and stop it when the VM is shut down? This
> > > assumes the mdev referenced by the UUID is already defined and known to
> > > mdevctl. I'd expect semantics much like managed='yes' around vfio-pci
> > > binding, ex.
> > > start/stop if it doesn't exist, leave it alone if it
> > > already exists.
> > >
> > > If that much seems reasonable, and someone is willing to invest some
> > > development time to support it, what are then the next steps to enable
> > > migration?
> >
> > The first step is to deal with our virNodeDevice APIs.
> >
> > Currently we have
> >
> >  - Listing devices (virConnectListAllNodeDevices)
> >  - Create transient device (virNodeDeviceCreateXML)
> >  - Delete transient device (virNodeDeviceDestroy)
> >
> > The create/delete APIs only deal with NPIV HBAs right now, so we need
> > to extend them to deal with mdevs as the first step.
>
> I assume the listing function already deals with all device types
> supported by libvirt?

Yes, that's correct.

> > > So assuming we now have a VM with a managed='yes' mdev hostdev device,
> > > what do we need to do to reproduce that device at the migration target?
> > > mdevctl can dump a device in a JSON format, which libvirt could use
> > > to define and start an equivalent device on the migration target
> > > (potentially this JSON is extended by mdevctl to include the migration
> > > compatibility vendor string). Part of our discussion at the Forum was
> > > around the extent to which libvirt would want to consider this JSON
> > > opaque. For instance, while libvirt doesn't currently support localhost
> > > migration, libvirt might want to use an alternate UUID for the mdev
> > > device on the migration target so as not to introduce additional
> > > barriers to such migrations. Potentially mdevctl could accept the JSON
> > > from the source system as a template and allow parameters such as UUID
> > > to be overwritten by command-line options. This might allow libvirt to
> > > consider the JSON as opaque.
> >
> > We definitely cannot expose the JSON anywhere in libvirt public API.
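The template idea Alex describes (take the source host's JSON dump, override only a few fields such as UUID or parent, pass everything else through untouched) can be sketched as below. This is a minimal illustration under stated assumptions: the flat JSON shape with `uuid`, `parent`, `mdev_type` and `attrs` keys is hypothetical and is not mdevctl's actual dump format, and `retarget_mdev_config` is a made-up helper standing in for whatever mdevctl-side override mechanism might exist. Consistent with Daniel's point, nothing here would be exposed through libvirt's public API.

```python
import json
from typing import Optional


def retarget_mdev_config(source_json: str,
                         uuid: Optional[str] = None,
                         parent: Optional[str] = None) -> str:
    """Return a copy of a dumped mdev definition with selected fields replaced.

    Hypothetical JSON shape: {"uuid": ..., "parent": ..., "mdev_type": ..., ...}.
    Every key other than the overridden ones is copied through unchanged,
    so the caller never has to interpret the rest of the document (it stays
    opaque to the caller).
    """
    config = json.loads(source_json)
    if uuid is not None:
        config["uuid"] = uuid
    if parent is not None:
        config["parent"] = parent
    return json.dumps(config, sort_keys=True)


# Hypothetical dump taken on the source host.
source = json.dumps({
    "uuid": "4b20d080-1b54-4048-85b3-a6a62d165c01",
    "parent": "0000:00:02.0",
    "mdev_type": "hypothetical-mdev-type",
    "attrs": [{"example_attr": "1"}],
})

# On the target host only the parent (bus address) is overridden.
target = retarget_mdev_config(source, parent="0000:af:00.0")
print(target)
```

Passing `uuid=` as well would cover the alternate-UUID case Alex raises for a possible future localhost migration; the open question of who picks the target parent is untouched by this sketch.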
> > The JSON is a tool-specific format, and one of libvirt's core jobs is
> > to define a format that isolates apps from the specific tool's impl,
> > so that we can swap out backend impls without impacting apps.
> >
> > >
> > > An issue here though is that the JSON will also include the parent
> > > device, which we obviously cannot assume is the same (particularly the
> > > bus address) on the migration target. We can allow command-line
> > > overrides for the parent just as we do above for the UUID when defining
> > > the mdev device from JSON, but it's an open issue who is going to be
> > > smart enough (perhaps dumb enough) to claim this responsibility. It
> > > would be interesting to understand how libvirt handles other
> > > host-specific information during migration, for instance if node or
> > > processor affinities are part of the VM XML, how is that translated to
> > > the target? I could imagine that we could introduce a simple "first
> > > available" placement in mdevctl, but maybe there should minimally be a
> > > node allocation preference with optional enforcement (similar to
> > > numactl), or maybe something above libvirt needs to take this
> > > responsibility to prepare the target before we get ourselves into
> > > trouble.
> >
> > I don't think we need to solve placement in libvirt.
> >
> > The guest XML will just reference the mdev via a UUID that
> > was used with virNodeDeviceDefineXML.
> >
> > The virNodeDeviceDefineXML call where the mdev is first defined
> > will set the details of the mdev creation for this specific host.
> > The XML used with virNodeDeviceDefineXML can be different on the
> > source and target hosts. As long as the UUID is the same on both
> > hosts, the VM will associate with it correctly.
>
> I wonder how to sync up with different placements, but maybe I'm just
> missing something.
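Daniel's point, that the XML handed to virNodeDeviceDefineXML may differ between source and target as long as the UUID matches, can be illustrated with a small sketch that builds a per-host mdev node-device description. The element layout (`device`, `parent`, `capability type='mdev'`, `type id`, `uuid`) is loosely modeled on libvirt's node-device XML but should be treated as an assumption; `mdev_nodedev_xml`, `uuid_of`, the parent names and the mdev type are all hypothetical.

```python
import xml.etree.ElementTree as ET


def mdev_nodedev_xml(uuid: str, parent: str, mdev_type: str) -> str:
    """Build a per-host mdev node-device description (assumed schema).

    Only <parent> is expected to vary between hosts; the UUID is what the
    guest XML references, so it must be identical on every host.
    """
    device = ET.Element("device")
    ET.SubElement(device, "parent").text = parent
    cap = ET.SubElement(device, "capability", type="mdev")
    ET.SubElement(cap, "type", id=mdev_type)
    ET.SubElement(cap, "uuid").text = uuid
    return ET.tostring(device, encoding="unicode")


def uuid_of(xml_text: str) -> str:
    """Extract the UUID that the guest would use to find this mdev."""
    return ET.fromstring(xml_text).findtext("capability/uuid")


# Hypothetical values: same UUID, different parent on each host.
GUEST_MDEV_UUID = "4b20d080-1b54-4048-85b3-a6a62d165c01"
source_xml = mdev_nodedev_xml(GUEST_MDEV_UUID, "pci_0000_00_02_0",
                              "hypothetical-mdev-type")
target_xml = mdev_nodedev_xml(GUEST_MDEV_UUID, "pci_0000_af_00_0",
                              "hypothetical-mdev-type")

print(source_xml)
print(target_xml)
```

The design choice this mirrors is that placement stays outside libvirt: whichever tool generates `target_xml` decides the parent, and the guest definition only ever carries the UUID.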
>
> Looking at this from the vfio-ccw angle, we can easily have the same
> device (as identified by the device number) on different subchannels
> (parents). To find out the device number, you need to look at the child
> ccw device of the subchannel while it is *not* bound to vfio-ccw, but
> to the normal I/O subchannel driver instead. Or ask your admin for the
> system definition...

This just means that whoever/whatever is invoking "virNodeDeviceDefineXML"
or "mdevctl create" will pass different parameters on each host. When
migrating a guest, the mgmt app can indicate which device should be used
for the guest on each host.

This is a similar issue to migrating a guest that uses an ethNNN device
with a different name on each host, or a /dev/sdNNN that's different on
each host, etc.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list