Re: Matching the type of mediated devices in the migration

Zhi Wang <zhi.a.wang@xxxxxxxxx> · Sun, 19 Aug 2018 22:25:19 +0800

Sharing some recent updates on my work on this topic:

Thanks to Erik for his guidance and advice. My PoC patches almost work 
now. I will send the RFC soon.

The ideas are mostly based on Alex's idea: a match between a device 
state version and a minimum required version.

"Match of versions" in Libvirt

Initialization stage:

- Libvirt would detect whether the "mdev_type" of a mediated device has 
a device state version when it creates the mdev node in the node device 
tree.
	- If the "mdev_type" of a mediated device *has* a device state version, 
then this mediated device supports migration.
	- If not (the compatibility case, mostly for old vendor drivers which 
don't support migration), this mediated device doesn't support migration.
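
A minimal sketch of the detection step above, assuming the version is 
exposed as a small text file inside the mdev type directory. The 
attribute name "device_state_version" is hypothetical; the thread does 
not say what the vendor driver would call it.

```python
from pathlib import Path
from typing import Optional

# Hypothetical attribute name, chosen only for illustration.
VERSION_ATTR = "device_state_version"


def read_state_version(type_dir: Path, attr: str = VERSION_ATTR) -> Optional[str]:
    """Return the device state version of an mdev type, or None if absent."""
    try:
        text = (type_dir / attr).read_text().strip()
    except FileNotFoundError:
        # Compatibility case: an old vendor driver exposes no version.
        return None
    return text or None


def supports_migration(type_dir: Path) -> bool:
    """An mdev type is treated as migratable only if it reports a version."""
    return read_state_version(type_dir) is not None
```

A type directory without the attribute is simply reported as not 
migratable, which matches the compatibility case for old vendor drivers.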

Migration stage:

- Libvirt would put the mdev information inside cookies and send them 
between the src machine and the dst machine. So a new type of cookie 
would be added here.

There are different versions of the migration protocol in libvirt. Each 
of them starts sending cookies at a different point in the sequence. The 
idea here is to let the match happen as early as possible. It looks like 
the QEMU driver in libvirt only supports the V2/V3 protocols.
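
As an illustration of what such a cookie could carry, here is a small 
sketch that serializes and parses the mdev information as XML. The 
element and attribute names ("mdevs", "mdev", "type", "stateVersion") 
are assumptions made up for this example, not an existing libvirt 
cookie format.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_mdev_cookie(mdevs: List[Tuple[str, str]]) -> str:
    """Serialize (mdev_type, state_version) pairs into a cookie fragment."""
    root = ET.Element("mdevs")  # hypothetical element name
    for mdev_type, version in mdevs:
        ET.SubElement(root, "mdev", {"type": mdev_type, "stateVersion": version})
    return ET.tostring(root, encoding="unicode")


def parse_mdev_cookie(xml_text: str) -> List[Tuple[str, str]]:
    """Recover the (mdev_type, state_version) pairs on the receiving side."""
    root = ET.fromstring(xml_text)
    return [(e.get("type"), e.get("stateVersion")) for e in root.findall("mdev")]
```

The receiving side only needs the parsed pairs to run the match, so the 
check can be done without touching any device state.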

V2 proto:

- The match would happen on the SRC machine after the DST machine 
transfers the cookies with mdev information back to the SRC machine 
during the "preparation" stage. The disadvantage is that the DST virtual 
machine has already been created in the "preparation" stage. If the 
match fails, the virtual machine on the DST machine has to be killed as 
well, which would waste some time.

V3 proto:

- The match would happen on the DST machine after the SRC machine 
transfers the cookies to the DST machine during the "begin" stage. As 
the DST machine hasn't entered the "preparation" stage at this time, the 
virtual machine hasn't been created on the DST machine yet. No extra VM 
destroy is needed if the match fails. This would be the ideal place for 
a match.
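
The difference between the two protocols can be sketched as a toy event 
trace. This is not libvirt code: the function and the event strings are 
made up for illustration, and the match itself is simplified to an 
identical-version comparison.

```python
from typing import List


def migration_trace(proto: str, src_version: str, dst_version: str) -> List[str]:
    """Return the ordered events of a hypothetical V2 or V3 migration."""
    matches = src_version == dst_version  # simplified match rule
    events: List[str] = []
    if proto == "v3":
        events.append("begin: SRC sends mdev cookie to DST")
        if not matches:
            # Mismatch is caught before any VM exists on DST.
            events.append("DST rejects: version mismatch")
            return events
        events.append("prepare: DST creates VM")
    elif proto == "v2":
        events.append("prepare: DST creates VM and returns mdev cookie")
        if not matches:
            # Mismatch is only seen on SRC after DST already built the VM.
            events.append("SRC rejects: version mismatch")
            events.append("DST destroys VM")
            return events
    else:
        raise ValueError(f"unknown protocol: {proto}")
    events.append("perform: migration proceeds")
    return events
```

With mismatching versions, the V3 trace never contains a VM creation on 
DST, while the V2 trace creates the VM and then has to destroy it.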

"Match of version" in QEMU level

There are several different types of migration in libvirt. In a 
migration with hypervisor native transport, the target machine might 
not even have libvirtd; the migration happens between device models 
directly. So we need a match at the QEMU level as well. We might still 
need Kirti's approach as the last-level match.

Thanks,
Zhi.

On 08/11/18 05:28, Zhi Wang wrote:
Hi Alex and Kirti:

Thanks for your reply and discussion. :)  Sorry for my late reply; 
there is quite some work and email to catch up on after my vacation.

From my point of view, failing the migration because of a version 
mismatch at different levels has different pros/cons.

- Match version in userspace toolkit level, like in QEMU and Libvirt:

Pros: Better responsiveness, since the version mismatch would be 
detected before the devices are actually suspended/resumed. All the 
userspace toolkits could provide this information to a UI or other 
management tools, like virsh and virt-manager, so it would help the 
administrator know what's happening through the management interface.

Cons: The vendor driver has to expose the version information. Some 
vendor drivers might not wish to expose that explicitly. Considering 
that mdev could be closely tied to different vendors and different 
devices, this might happen in the future as well.

- Match version in device state level (vendor-specific)

Pros: The vendor driver doesn't need to explain and expose an explicit 
version of the device state.

Cons: Waste of bandwidth. Poor responsiveness and little information for 
the user.

How about we combine the two ideas? The vendor driver could decide 
whether to use the device state or not. But still, the error information 
could be a problem, since it could be hard for a management tool like 
virsh or virt-manager to get an error message from a remote node.

Let me cook up an RFC patch next week.

Have a great weekend. :)

Thanks,
Zhi.

-----Original Message-----
From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
Sent: Monday, August 6, 2018 10:22 PM
To: Kirti Wankhede <kwankhede@xxxxxxxxxx>
Cc: Wang, Zhi A <zhi.a.wang@xxxxxxxxx>; libvir-list@xxxxxxxxxx
Subject: Re: Matching the type of mediated devices in the migration

On Mon, 6 Aug 2018 23:45:21 +0530
Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:

On 8/3/2018 11:26 PM, Alex Williamson wrote:
> On Fri, 3 Aug 2018 12:07:58 +0000
> "Wang, Zhi A" <zhi.a.wang@xxxxxxxxx> wrote:
>
>> Hi:
>>
>> Thanks for unfolding your idea. The picture is clearer to me now. I
>> didn't realize that you also want to support cross hardware
>> migration. Well, I thought for a while, the cross hardware migration
>> might be not popular in vGPU case but could be quite popular in
>> other mdev cases.
>
> Exactly, we need to think beyond the implementation for a specific
> vendor or class of device.
>
>> Let me continue my summary:
>>
>> Mdev dev type has already included a parent driver name/a group
>> name/physical device version/configuration type. For example
>> i915-GVTg_V5_4. The driver name and the group name could already
>> distinguish the vendor and the product between different mdevs, e.g.
>> between Intel and Nvidia, between vGPU or vOther.
>
> Note that there are only two identifiers here, a vendor driver and a
> type.  We included the vendor driver to avoid namespace collisions
> between vendors.  The type itself should be considered opaque
> regardless of how a specific vendor makes use of it.
>
>> Each device provides a collection of device state versions for the
>> data stream, in preferred order, in an mdev type, as a newer version
>> of the device state might contain more information, which might help
>> performance.
>>
>> Let's say there is a new device N and an old device O, and they both
>> support mdev_type M.
>>
>> For example:
>> Device N is newer and supports the versions of device state: [ 6.3
>> 6.2 6.1 ] in mdev type M
>> Device O is older and supports the versions of device state: [ 5.3
>> 5.2 5.1 ] in mdev type M
>>
>> - Version scheme of device state in backwards compatibility case:
>> Migrate a VM from a VM with device O to a VM with device N, the mdev
>> type is M.
>>
>> Device N: [ 6.3 6.2 6.1 5.3 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.3
>> The new device directly supports mdev_type M with the preferred
>> version on Device O. Good, best situation.
>>
>> Device N: [ 6.3 6.2 6.1 5.2 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: 5.2
>> The new device supports mdev_type M, but not the preferred version.
>> After the migration, the vendor driver might have to disable some
>> features which are not mentioned in the 5.2 device state. But this
>> totally depends on the vendor driver. If the user wishes to achieve
>> the best experience, he should update the vendor driver on device N
>> so that it supports the preferred version of device O.
>>
>> Device N: [ 6.3 6.2 6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M
>> Version used in migration: None
>> No version is matched. Migration would fail. User should update the
>> vendor driver on device N and device O.
>>
>> - Version scheme of device state in forwards compatibility case:
>> Migrate a VM from a VM with N to a VM with device O, the mdev type is
>> M.
>>
>> Device N: [ 6.3 6.2 6.1 ] in M
>> Device O: [ 5.3 5.2 5.1 ] in M, but the user updates the vendor
>> driver on device O. Now device O could support [ 5.3 5.2 5.1 6.1 ]
>> (As an old device, the Device O still prefers version 5.3)
>> Version used in migration: 6.1
>> As the new device state is going to migrate to an old device, the
>> vendor driver on the old device might have to deal specially with
>> the new version of the device state. It depends on the vendor
>> driver.
>>
>> - QEMU has to figure out and choose the version of device states
>> before reading device state from the region. (Perhaps we can put
>> the option of selection in the control part of the region as well)
>> - Libvirt will check if there is any match of the version in the
>> collection in device O and device N before migration.
>> - Each mdev_type has its own collection of versions. (Device can
>> support different versions in different types)
>> - Better the collection is not a range, better they could be a
>> collection of the version strings. (The vendor driver might drop
>> some versions during the upgrade since they are not ideal)
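
The version-list scheme summarized above can be sketched as follows. 
This is one reading of the examples, not a confirmed design: it assumes 
that versions are opaque strings and that the source device's preferred 
order decides which common version is used. The function name is made 
up for illustration.

```python
from typing import Optional, Sequence


def pick_version(source: Sequence[str], target: Sequence[str]) -> Optional[str]:
    """Pick the first version in the source's preferred order that the
    target also lists, or None if the two collections do not overlap."""
    target_set = set(target)
    for version in source:
        if version in target_set:
            return version
    return None


# Backwards compatibility cases from the thread (source O, target N).
device_o = ["5.3", "5.2", "5.1"]
print(pick_version(device_o, ["6.3", "6.2", "6.1", "5.3"]))  # 5.3
print(pick_version(device_o, ["6.3", "6.2", "6.1", "5.2"]))  # 5.2
print(pick_version(device_o, ["6.3", "6.2", "6.1"]))         # None

# Forwards compatibility case (source N, target O with updated driver).
print(pick_version(["6.3", "6.2", "6.1"], ["5.3", "5.2", "5.1", "6.1"]))  # 6.1
```

Treating the versions as plain strings rather than a numeric range 
follows the last bullet: a vendor driver may drop individual versions, 
so only exact membership is checked.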
>
> I believe that QEMU has always avoided trying to negotiate a
> migration version.  We can only negotiate if the target is online
> and since a save/restore is essentially an offline migration,
> there's no opportunity for negotiation.  Therefore I think we need
> to assume the source version is fixed.  If we need to expose an
> older migration interface, I think we'd need to consider
> instantiating the mdev with that specification or configuring it via
> attributes before usage, just like QEMU does with specifying a
> machine type version.
>
> Providing an explicit list of compatible versions also seems like it
> could quickly get out of hand, imagine a driver with regular
> releases that maintains compatibility for years.  The list could get
> unmanageable.
>
> To be honest, I'm pretty dubious whether vendors will actually
> implement cross version migration, or really consider migration
> compatibility at all, which is why I think we need to impose
> migration compatibility with this sort of interface.

Vendor driver can implement cross version migration support, may not 
be cross major version but cross minor version migration support can 
be implemented.

Of course, but I think we need to consider this an opt-in for the 
vendor, the default should be identical version only unless the vendor 
driver states otherwise.
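
A minimal sketch of the opt-in rule described here, combined with the 
"version plus minimum required version" idea mentioned at the top of 
the thread. It assumes versions are comparable integers and that a 
target never accepts a stream newer than its own version; both are 
assumptions for illustration, as the thread does not fix a version 
format.

```python
from typing import Optional


def can_restore(stream_version: int,
                driver_version: int,
                driver_min_version: Optional[int] = None) -> bool:
    """Decide whether a target driver accepts a saved device state.

    Without a minimum version the vendor has not opted in to cross
    version migration, so only an identical version is accepted.
    """
    if driver_min_version is None:
        return stream_version == driver_version
    return driver_min_version <= stream_version <= driver_version


print(can_restore(5, 5))        # True: identical version, no opt-in needed
print(can_restore(4, 5))        # False: vendor did not declare a minimum
print(can_restore(4, 5, 3))     # True: within the declared range
```

Because the source version is fixed and only the target's declared 
range is consulted, the same check works for save/restore, where no 
negotiation with the source is possible.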

In case of live migration, if the vendor driver returns failure at the 
destination during its resume phase, then the VM at the source is 
resumed and continues to run at the source, right? Please correct me if 
my understanding is wrong. Then in case of live migration, the vendor 
driver can add a binary blob of compatibility details, which the vendor 
driver understands, as the first binary blob, and at the destination the 
first step while resuming is to check compatibility and return 
accordingly. If the vendor driver finds it's not compatible, then it 
fails resume at the destination with a proper error message in syslog.

While this is true, the device state is the final component of 
migration, so you're basically asking your users to try it to see if it 
works, and if it doesn't work, apparently it's not supported, or maybe 
something else is broken.  Not only is that a poor user experience, but 
it potentially consumes massive amounts of bandwidth, resources, incurs 
downtime in the VM, and it makes it difficult for management tools to 
predict where a VM can be successfully migrated.

In case of save/restore same logic can be applied and resume can fail 
if vendor version is not compatible with the version when VM was saved.

So again, the user and management tool experience is to hope for the 
best and assume unsupported if it doesn't work?  We can do better.
Rather than embedding version information into the binary blob part of 
the migration stream, shouldn't it be exposed as a standard parsed field 
such that it can be included in the migration stream and introspected 
later for compatibility with the host driver?
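
To make the contrast concrete, here is a sketch of a device state 
stream with a small parsed header in front of the opaque vendor blob. 
The layout (a 4-byte magic, a 16-bit version, a 32-bit blob length) and 
the names are hypothetical, chosen only to show that the version can be 
read without understanding the blob.

```python
import struct

# Hypothetical header: magic, version, blob length, all big-endian.
HEADER = struct.Struct(">4sHI")
MAGIC = b"MDEV"


def pack_state(version: int, blob: bytes) -> bytes:
    """Prefix the opaque vendor blob with a standard, parseable header."""
    return HEADER.pack(MAGIC, version, len(blob)) + blob


def peek_version(stream: bytes) -> int:
    """Read the version from the header without touching the blob."""
    magic, version, _length = HEADER.unpack_from(stream)
    if magic != MAGIC:
        raise ValueError("not a device state stream")
    return version


saved = pack_state(7, b"\x00" * 16)
print(peek_version(saved))  # 7
```

A management tool could run a check like peek_version() on a saved 
image long after the source host is gone and compare the result with 
what the target driver reports, instead of discovering a mismatch when 
the restore fails.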

> A vendor that doesn't want to support cross version migration can
> simply increment the version and provide no minimum version, without
> at least that, I think we're gambling for breaking devices and
> systems in interesting and unpredictable ways.

If the vendor driver doesn't want to support cross version migration, 
then it can just put a version string in the first binary blob and 
check whether it's equal or not.

Then libvirt doesn't have to worry about the vendor driver version. 
Libvirt only needs to verify that the mdev type at the source is 
creatable at the destination.

As outlined above, failing at device restore is a poor solution, it's a 
last resort.  We need to think about supportability.  Assuming that a 
vendor driver has taken migration compatibility into account is not 
supportable.  Embedding version information into the binary blob part of 
the device migration stream is not supportable.  I want to be able to 
file bugs with vendors with meaningful information about the source 
stream and target driver with clear expectations of what should and 
should not work, not shrug my shoulders and randomly try another host.

When Libvirt creates mdev type at destination, will mdev's UUID at 
source and destination be same?

There's no reason it needs to be from an mdev or QEMU perspective.
Thanks,

Alex
