On Fri, Jul 10, 2020 at 7:14 AM Jiri Denemark <jdenemar@xxxxxxxxxx> wrote:
On Sun, Jul 05, 2020 at 12:45:55 -0400, Mark Mielke wrote:
> With 6.4.0, live migration was working fine with Qemu 5.0. After trying out
> 6.5.0, migration broke with the following error:
>
> libvirt.libvirtError: internal error: unable to execute QEMU command
> 'migrate': State blocked by non-migratable CPU device (invtsc flag)
Could you please describe the reproducer steps? For example, was the
domain you're trying to migrate already running when you upgrade libvirt
or is it freshly started by the new libvirt?
The original case was:
1) Machine X running libvirt 6.4.0 + qemu 5.0
2) Machine Y running libvirt 6.5.0 + qemu 5.0
3) Live migration from X to Y works. Guest appears fine.
4) Upgrade Machine X from libvirt 6.4.0 to 6.5.0 and reboot.
5) Live migration from Y to X fails with the message shown.
In each case, live migration was done with OpenStack Train directing libvirt + qemu.
And it would be helpful to see the <cpu> element as shown by virsh
dumpxml before you try to start the domain as well as the QEMU command
line libvirt used to start the domain (in
/var/log/libvirt/qemu/$VM.log).
The <cpu> element looks like this:
<cpu mode='host-passthrough' check='none'>
<topology sockets='1' dies='1' cores='4' threads='2'/>
</cpu>
The QEMU command line is very long and includes details I would rather not publish unless you need them. The "-cpu" portion is just:
-cpu host
The QEMU command line itself is generated from libvirt, which is directed by OpenStack Train.
> commit 201bd5db639c063862b0c1b1abfab9a9a7c92591
> Author: Jiri Denemark <jdenemar@xxxxxxxxxx>
> Date: Tue Jun 2 15:34:07 2020 +0200
>
> qemu: Fill default value in //cpu/@migratable attribute
>
> Before QEMU introduced migratable CPU property, "-cpu host" included all
> features that could be enabled on the host, even those which would block
> migration. In other words, the default was equivalent to migratable=off.
> When the migratable property was introduced, the default changed to
> migratable=on. Let's record the default in domain XML.
>
> Signed-off-by: Jiri Denemark <jdenemar@xxxxxxxxxx>
> Reviewed-by: Michal Privoznik <mprivozn@xxxxxxxxxx>
>
> Before this change, qemu was still being launched with "-cpu host", which
> for any somewhat modern version of qemu, defaults to migratable=on. The
> above comment acknowledges this, however, the implementation chooses the
> pessimistic and ancient (and no longer applicable!) value of migratable=off:
>
> + if (qemuCaps &&
> + def->cpu->mode == VIR_CPU_MODE_HOST_PASSTHROUGH &&
> + !def->cpu->migratable) {
> + if (virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE))
> + def->cpu->migratable = VIR_TRISTATE_SWITCH_ON;
>
> + else if (ARCH_IS_X86(def->os.arch))
> + def->cpu->migratable = VIR_TRISTATE_SWITCH_OFF;
> + }
The implementation seems to be doing exactly what the commit message
says. The migratable=off default should be used only when QEMU does not
support -cpu host,migratable=on|off, that is only when QEMU is very old.
Every non-ancient version of libvirt should have the
QEMU_CAPS_CPU_MIGRATABLE set and thus this code should choose
migratable=on default.
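For reference, the decision described above can be sketched in isolation. This is a minimal sketch, not libvirt's code: the `tristate` enum, the `default_migratable` function, and its boolean parameters are hypothetical stand-ins for `def->cpu->migratable`, the `qemuCaps` pointer check, the host-passthrough mode check, `virQEMUCapsGet(qemuCaps, QEMU_CAPS_CPU_MIGRATABLE)`, and `ARCH_IS_X86(def->os.arch)` from the quoted hunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for libvirt's tristate switch; ABSENT means
 * the domain XML did not set //cpu/@migratable. */
typedef enum {
    SWITCH_ABSENT = 0,
    SWITCH_ON,
    SWITCH_OFF
} tristate;

/* Mirrors the quoted hunk: a default is filled in only when QEMU
 * capabilities are available, the CPU mode is host-passthrough, and
 * the attribute is not already set. */
static tristate
default_migratable(bool have_caps,
                   bool host_passthrough,
                   tristate current,
                   bool caps_cpu_migratable,
                   bool arch_is_x86)
{
    if (!have_caps || !host_passthrough || current != SWITCH_ABSENT)
        return current;

    /* QEMU understands -cpu host,migratable=on|off: record "on". */
    if (caps_cpu_migratable)
        return SWITCH_ON;

    /* QEMU predates the property on x86: record the old "off" behavior. */
    if (arch_is_x86)
        return SWITCH_OFF;

    /* Any other case: leave the attribute unset. */
    return SWITCH_ABSENT;
}
```

Under these assumptions, "off" is recorded only when the capability is reported as missing on x86, so an unexpected migratable=off would point at how the capability is being detected rather than at the branch itself.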
I wasn't sure what QEMU_CAPS_CPU_MIGRATABLE represents. I initially suspected what you describe, but since migration did not behave the way I expected, I presumed the capability does not mean what I thought. :-)
Is QEMU_CAPS_CPU_MIGRATABLE derived only from the <cpu> element? If so, the attribute is not explicitly listed for host-passthrough in my case, so could the check be failing to detect whether the property is actually supported?
> I think it is not a requirement for "migratable=XXX" to be explicit in
> libvirt. However, if there is some reason I am unaware of, and it is
> important for libvirt to know, then I think it is important for libvirt to
> find out the authoritative state rather than guessing.
Explicit defaults are always better for two reasons: they are visible to
users and they don't silently change.
I think it can go either way. Convention over configuration is a competing principle. However, I also prefer explicit. It just needs to be correct; an explicit value that is wrong can be very bad, as it seems to be in my case. :-)
Thanks,