On Mon, Jul 13, 2020 at 14:04:25 +0200, Jiri Denemark wrote:
> On Sat, Jul 11, 2020 at 13:44:19 -0400, Mark Mielke wrote:
> > On Sat, Jul 11, 2020 at 6:04 AM Mark Mielke <mark.mielke@xxxxxxxxx> wrote:
> >
> > > On Fri, Jul 10, 2020 at 7:48 AM Mark Mielke <mark.mielke@xxxxxxxxx> wrote:
> > >
> > >> On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark <jdenemar@xxxxxxxxxx>
> > >> wrote:
> > >>
> > >>> The implementation seems to be doing exactly what the commit message
> > >>> says. The migratable=off default should be used only when QEMU does not
> > >>> support -cpu host,migratable=on|off, that is only when QEMU is very old.
> > >>> Every non-ancient version of libvirt should have the
> > >>> QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
> > >>> migratable=on default.
> > >>>
> > >> QEMU_CAPS_CPU_MIGRATABLE only from the <cpu> element? If so, doesn't this
> > >> mean that it is not explicitly listed for host-passthrough, and this means
> > >> the check is not detecting whether it is enabled or not properly?
> > >>
> > > Trying to understand what is going on more - I see "migratable" seems to
> > > be ok when launching a new machine, but the failure scenario was live
> > > migration from 6.4.0 to 6.5.0.
> > >
> > > Is this because the QEMU_CAPS_CPU_MIGRATABLE was not filled in for 6.4.0,
> > > and live migration grabs the capabilities from the source, where the
> > > absence of this capability makes it presume an older Qemu in the above code?
> > >
> >
> > Sorry all - I am having trouble reproducing now. The expected use cases are
> > now working.
> >
> > Is it possible that the "migratable" flag might have been missing on some
> > of the instances, although migration worked fine, and despite having used
> > Qemu 4.2 and Qemu 5.0?
>
> When an updated libvirtd which knows about this new capability starts,
> it would reprobe all QEMU capabilities (lazily, i.e., once they are
> needed). However, if there is a running domain, libvirt will use cached
> capabilities probed when the domain was started. I suspect migrating
> such domain could be a problem. I'll try to reproduce locally.

OK, I did not reproduce the failure, because in my setup migratable=off
doesn't enable anything more than migratable=on (likely because the L1 VM
in my nested environment does not have any non-migratable features
enabled). But I was able to reproduce the issue itself, and the migration
could clearly fail if migratable=off enabled some non-migratable features.

The reproducer is actually easy and one doesn't even need to migrate to
see libvirt did something wrong:

1. run libvirtd older than 6.5.0
2. start a domain with host-passthrough CPU (QEMU would default to
   migratable=on)
3. upgrade libvirt to 6.5.0 and restart libvirtd
4. virsh dumpxml $DOMAIN_STARTED_IN_STEP_2

Now you would see

    <cpu mode='host-passthrough' check='none' migratable='off'/>

which differs from the default used by QEMU. Migrating such a domain
would succeed anyway, because it was actually started with
migratable='on'. But when such a domain is migrated to libvirt 6.5.0, we
would honor the migratable attribute and start QEMU with
-cpu host,migratable=off, which could cause failures when trying to
migrate this domain again.

The problem is exactly where I was afraid it could be. When libvirtd
starts, it reads the QEMU capabilities probed by older libvirt
(QEMU_CAPS_CPU_MIGRATABLE would be off) and wrongly updates the XML of
the running domain.

I'll prepare a patch to fix this.

Jirka
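
For readers following the thread, the mismatch described above can be modelled in a few lines. This is only an illustrative sketch, not libvirt code: the function name, the set-based capability model, and the "cpu-migratable" string (standing in for QEMU_CAPS_CPU_MIGRATABLE) are all assumptions made for the example. It shows how picking the default from capabilities cached by an older libvirtd yields migratable='off' in the XML even though QEMU was started with its own default of migratable=on.

```python
# Hypothetical model of the behaviour described in this thread.
# None of these names come from the libvirt source tree.

CPU_MIGRATABLE = "cpu-migratable"  # stands in for QEMU_CAPS_CPU_MIGRATABLE


def default_migratable(caps):
    """Pick the migratable default for a host-passthrough CPU.

    Mirrors the rule quoted in the thread: use 'on' when the capability
    is present, 'off' only when it is missing (assumed very old QEMU).
    """
    return "on" if CPU_MIGRATABLE in caps else "off"


# Capabilities cached when the domain was started by libvirtd < 6.5.0:
# the flag did not exist yet, so it is absent from the cache.
cached_caps_old_libvirt = {"kvm"}

# Capabilities as libvirtd 6.5.0 would reprobe them for the same QEMU.
reprobed_caps = {"kvm", CPU_MIGRATABLE}

# What QEMU actually used when the domain was started (its own default).
qemu_actual = "on"

xml_after_restart = default_migratable(cached_caps_old_libvirt)
xml_if_reprobed = default_migratable(reprobed_caps)

print("XML after libvirtd restart:", xml_after_restart)
print("XML with fresh capabilities:", xml_if_reprobed)
print("matches running QEMU:", xml_after_restart == qemu_actual)
```

In this model the stale cache is the only reason the XML and the running QEMU disagree; the same rule applied to freshly probed capabilities gives the value QEMU is really using.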