* Alex Williamson (alex.williamson@xxxxxxxxxx) wrote:
> On Mon, 27 Jul 2020 15:24:40 +0800
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
>
> > > > As you indicate, the vendor driver is responsible for checking version
> > > > information embedded within the migration stream. Therefore a
> > > > migration should fail early if the devices are incompatible. Is it
> > > but as I know, currently in VFIO migration protocol, we have no way to
> > > get vendor specific compatibility checking string in migration setup stage
> > > (i.e. .save_setup stage) before the device is set to _SAVING state.
> > > In this way, for devices who does not save device data in precopy stage,
> > > the migration compatibility checking is as late as in stop-and-copy
> > > stage, which is too late.
> > > do you think we need to add the getting/checking of vendor specific
> > > compatibility string early in save_setup stage?
> > >
> > hi Alex,
> > after an offline discussion with Kevin, I realized that it may not be a
> > problem if migration compatibility check in vendor driver occurs late in
> > stop-and-copy phase for some devices, because if we report device
> > compatibility attributes clearly in an interface, the chances for
> > libvirt/openstack to make a wrong decision is little.
>
> I think it would be wise for a vendor driver to implement a pre-copy
> phase, even if only to send version information and verify it at the
> target. Deciding you have no device state to send during pre-copy does
> not mean your vendor driver needs to opt-out of the pre-copy phase
> entirely. Please also note that pre-copy is at the user's discretion,
> we've defined that we can enter stop-and-copy at any point, including
> without a pre-copy phase, so I would recommend that vendor drivers
> validate compatibility at the start of both the pre-copy and the
> stop-and-copy phases.

That's quite curious; from a migration point of view I'd expect if you
did want to skip pre-copy, that you'd go through the motions of entering
it and then not saving any data and then going to stop-and-copy, rather
than having two flows.

Note that failing at a late stage of stop-and-copy is a pain; if you've
just spent an hour migrating your huge busy VM over, you're going to be
pretty annoyed when it goes pop near the end.

Dave

> > so, do you think we are now arriving at an agreement that we'll give up
> > the read-and-test scheme and start to defining one interface (perhaps in
> > json format), from which libvirt/openstack is able to parse and find out
> > compatibility list of a source mdev/physical device?
>
> Based on the feedback we've received, the previously proposed interface
> is not viable. I think there's agreement that the user needs to be
> able to parse and interpret the version information. Using json seems
> viable, but I don't know if it's the best option. Is there any
> precedent of markup strings returned via sysfs we could follow?
>
> Your idea of having both a "self" object and an array of "compatible"
> objects is perhaps something we can build on, but we must not assume
> PCI devices at the root level of the object. Providing both the
> mdev-type and the driver is a bit redundant, since the former includes
> the latter. We can't have vendor specific versioning schemes though,
> ie. gvt-version. We need to agree on a common scheme and decide which
> fields the version is relative to, ex. just the mdev type?
>
> I had also proposed fields that provide information to create a
> compatible type, for example to create a type_x2 device from a type_x1
> mdev type, they need to know to apply an aggregation attribute. If we
> need to explicitly list every aggregation value and the resulting type,
> I think we run aground of what aggregation was trying to avoid anyway,
> so we might need to pick a language that defines variable substitution
> or some kind of tagging. For example if we could define ${aggr} as an
> integer within a specified range, then we might be able to define a type
> relative to that value (type_x${aggr}) which requires an aggregation
> attribute using the same value. I dunno, just spit balling. Thanks,
>
> Alex
-- 
Dr. David Alan Gilbert / dgilbert@xxxxxxxxxx / Manchester, UK
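
Editorial sketch of the ${aggr} idea quoted above, in user-space Python
rather than anything kernel-side. It assumes a type template such as
type_x${aggr} where ${aggr} is an integer in a stated range, and returns
the aggregation value a management tool would have to apply to create a
matching type. The function name, its parameters, and the 1..8 range in
the usage note are hypothetical; nothing here is a proposed or existing
VFIO/mdev interface.

```python
import re
from typing import Optional


def match_templated_type(template: str, var: str, lo: int, hi: int,
                         candidate: str) -> Optional[int]:
    """Return the integer bound to ${var} if candidate fits template.

    Hypothetical helper: the template syntax and the range check are
    assumptions made for illustration only.
    """
    placeholder = "${" + var + "}"
    # Keep the sketch simple: exactly one occurrence of the variable.
    if template.count(placeholder) != 1:
        raise ValueError("template must contain the placeholder exactly once")

    prefix, suffix = template.split(placeholder)
    # Literal prefix and suffix, with a run of decimal digits in between.
    pattern = re.escape(prefix) + r"([0-9]+)" + re.escape(suffix)
    m = re.fullmatch(pattern, candidate)
    if m is None:
        return None  # candidate is not an instance of this template

    value = int(m.group(1))
    # The value must also fall inside the advertised range.
    return value if lo <= value <= hi else None
```

With an assumed range of 1..8, match_templated_type("type_x${aggr}",
"aggr", 1, 8, "type_x2") yields 2, meaning "create the type with an
aggregation attribute of 2", while "type_x9" and "other_x2" yield None.
This avoids listing every aggregation value and resulting type explicitly.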