Hi all,
I've posted the mail below to the qemu-devel mailing list, but got no
response there. That's why I'm re-posting it here as well; besides, I
think this could be a KVM-specific issue.
Some additional things to note:
I can reproduce the issue on my Debian 7 with the 3.16.0-0.bpo.4-amd64
kernel as well.
I typically run with max_downtime adjusted to 1 second instead of the
default 30 ms.
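For reference, I adjust this at runtime through the HMP monitor (the
value is in seconds), something like:

migrate_set_downtime 1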
I also noticed that the issue happens much more rarely if I increase
the migration bandwidth, e.g.:
diff --git a/migration.c b/migration.c
index 26f4b65..d2e3b39 100644
--- a/migration.c
+++ b/migration.c
@@ -36,7 +36,7 @@ enum {
MIG_STATE_COMPLETED,
};
-#define MAX_THROTTLE (32 << 20) /* Migration speed throttling */
+#define MAX_THROTTLE (90 << 20) /* Migration speed throttling */
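The same bandwidth increase can also be applied at runtime instead of
patching; as far as I know, the HMP migrate_set_speed command takes
bytes per second and accepts the usual size suffixes, so the rough
equivalent would be:

migrate_set_speed 90m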
As I said below, I'd be glad to provide you with any additional
information.
Thanks,
Mikhail
On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
Hi all,
I'm running a slightly modified migration-over-TCP test in virt-test,
which migrates from one "smp=2" VM to another on the same host over
TCP while generating some dummy CPU load inside the GUEST. After a
series of runs I always get a CLOCK_WATCHDOG_TIMEOUT BSOD inside the
guest, which happens when:
"
An expected clock interrupt was not received on a secondary processor
in an
MP system within the allocated interval. This indicates that the
specified
processor is hung and not processing interrupts.
"
This seems to happen with every qemu version I've tested (1.2 and
above, including upstream). I tested with the 3.13.0-44-generic kernel
on my Ubuntu 14.04.1 LTS host with 4 CPUs, as well as with the
3.12.26-1 kernel on a Debian 6 host with 6 CPUs.
One thing I noticed is that generating a dummy CPU load on the HOST
(like running multiple instances of the "while true; do false; done"
loop) in parallel with the migration makes the issue quite easy to
reproduce; see the sketch below.
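To be concrete, the load generator is just that one-liner forked a few
times; a minimal sketch (the count of 4 is only an example, adjust to
the number of host CPUs):

# spawn 4 busy loops in the background
for i in 1 2 3 4; do
    while true; do false; done &
done
# ... run the migration test ...
# then stop the busy loops again
kill $(jobs -p)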
Looking inside the Windows crash dump, the second CPU is just running
at IRQL 0, and it is apparently not hung, as Windows is able to save
its state in the crash dump correctly, which implies it is still able
to execute code.
So this apparently is some timing issue (e.g. the host scheduler does
not schedule the thread executing the secondary CPU's code in time).
Could you give me some insight on this, i.e. is there a way to tune
QEMU/KVM to avoid this issue?
If you think this might be a qemu/kvm issue, I can provide you with
any info, like Windows crash dumps or the test case to reproduce this.
qemu is started as:
from-VM:
qemu-system-x86_64 \
-S \
-name 'virt-tests-vm1' \
-sandbox off \
-M pc-1.0 \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
-mon chardev=qmp_id_qmp1,mode=control \
-chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
-device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
-netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \
-m 2G \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
-cpu phenom \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :0 \
-rtc base=localtime,clock=host,driftfix=none \
-boot order=cdn,once=c,menu=off \
-enable-kvm
to-VM:
qemu-system-x86_64 \
-S \
-name 'virt-tests-vm1' \
-sandbox off \
-M pc-1.0 \
-nodefaults \
-vga std \
-chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
-mon chardev=qmp_id_qmp1,mode=control \
-chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
-device isa-serial,chardev=serial_id_serial0 \
-chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
-device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
-device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
-drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
-device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
-device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
-netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023 \
-m 2G \
-smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
-cpu phenom \
-device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
-vnc :1 \
-rtc base=localtime,clock=host,driftfix=none \
-boot order=cdn,once=c,menu=off \
-enable-kvm \
-incoming tcp:0:5200
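The migration itself is then started from the from-VM's HMP monitor,
pointing at the to-VM's incoming port (both VMs run on the same host),
roughly:

migrate -d tcp:localhost:5200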
Thanks,
Mikhail