On 2024/08/01 11:28, Jason Wang wrote:
On Wed, Jul 31, 2024 at 8:58 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
On Wed, Jul 31, 2024 at 03:41:00AM -0400, Michael S. Tsirkin wrote:
On Wed, Jul 31, 2024 at 08:04:24AM +0100, Daniel P. Berrangé wrote:
On Tue, Jul 30, 2024 at 05:32:48PM -0400, Michael S. Tsirkin wrote:
On Tue, Jul 30, 2024 at 04:03:53PM -0400, Peter Xu wrote:
On Tue, Jul 30, 2024 at 03:22:50PM -0400, Michael S. Tsirkin wrote:
This is not what we did historically. Why should we start now?
It's a matter of whether we still want migration to randomly fail, like
what this patch does.
Or any better suggestions? I'm definitely open to that.
Thanks,
--
Peter Xu
Randomly is an overstatement. You need to switch between kernels
where this feature differs. We did it with a ton of features
in the past, don't know why we single out USO now.
This has been a problem with a ton of features in the past. We've
ignored the problem, but that doesn't make it the right solution.
With regards,
Daniel
Pushing it to domain xml does not really help,
migration will still fail unexpectedly (after wasting
a ton of resources copying memory, and getting
a downtime bump, I might add).
Could you elaborate on why it would fail with what I proposed?
Note that if this is a generic comment about "any migration can fail if we
found a device mismatch", we have a plan to fix that to some degree. It's
just that we don't have enough people working on these topics yet. See:
https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
It includes:
"Check device tree on both sides, etc., to make sure the migration is
applicable. E.g., we should fail early and clearly on any device
mismatch."
However I don't think it'll cover all checks, e.g. I _think_ even if we
verify VMSDs then post_load() hooks can still fail, and there can be some
corner cases to think about. And of course, this may not even apply to virtio
since virtio manages migration itself, without providing a top-level vmsd.
The right solution is to have a tool that can query
backends, and that given the results from all of the cluster,
generate a set of parameters that will ensure migration works.
This seems to be very hard for vhost-users.
Can you elaborate more? I was thinking of something like the following:
1. Prepare a QEMU command line.
2. Run the command line appended with -dump-platform on all hosts, which
dumps the platform features that are automatically enabled. For virtio
devices, we can dump the "host_features" variable.
3. Run the command line appended with -merge-platform with all dumps.
For most virtio devices, this would be an AND operation on the
"host_features" variable.
4. Run the command line appended with -use-platform with the merged
dump. This will run VMs with features available on all hosts.
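
The merge in step 3 can be sketched roughly as below. This is only an
illustration of the AND idea, not QEMU code: -dump-platform, -merge-platform
and -use-platform are options proposed in this message, and the dump layout
(one dict per host, mapping a device id to an integer bitmask), the function
name merge_host_features and the example bit values are all assumptions made
for the sketch.

```python
from functools import reduce
from operator import and_


def merge_host_features(dumps):
    """Return, per device, the feature bits that every host reported.

    `dumps` is a list with one dict per host. Each dict maps a device id
    to the integer bitmask that host would enable automatically. This
    layout is hypothetical; it is not a format QEMU produces.
    """
    if not dumps:
        raise ValueError("at least one host dump is required")
    devices = set(dumps[0])
    for dump in dumps[1:]:
        if set(dump) != devices:
            raise ValueError("hosts reported different device sets")
    # A feature bit survives only if it is set on every host.
    return {
        dev: reduce(and_, (dump[dev] for dump in dumps))
        for dev in sorted(devices)
    }


# Made-up bitmasks for two hosts; bit 3 is missing on the second host.
host_a = {"net0": 0b1111}
host_b = {"net0": 0b0111}
merged = merge_host_features([host_a, host_b])
print({dev: bin(bits) for dev, bits in merged.items()})
```

A plain AND only fits features that are independent bits; anything with
dependencies between features or non-boolean parameters would need
device-specific merge logic, which is why step 3 says "most" virtio devices.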
I may have missed something but this seems good enough for me. Of course
this requires changes throughout the stack (QEMU common and
device-specific code, libvirt, and even higher layers like OpenStack).
Regards,
Akihiko Odaki