Re: Windows 2008 Guest BSODS with CLOCK_WATCHDOG_TIMEOUT on VM migration

Jidong Xiao <jidong.xiao@xxxxxxxxx> · Tue, 27 Jan 2015 11:09:44 -0800

On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
<mikhail.sennikovskii@xxxxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I've posted the bolow mail to the qemu-dev mailing list, but I've got no
> response there.
> That's why I decided to re-post it here as well, and besides that I think
> this could be a kvm-specific issue as well.
>
> Some additional thing to note:
> I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
> well.
> I would typically use a max_downtime adjusted to 1 second instead of default
> 30 ms.
> I also noticed that the issue happens much more rarelly if I increase the
> migration bandwidth, i.e. like
>
> diff --git a/migration.c b/migration.c
> index 26f4b65..d2e3b39 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,7 +36,7 @@ enum {
>      MIG_STATE_COMPLETED,
>  };
>
> -#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
> +#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */
>
> Like I said below, I would be glad to provide you with any additional
> information.
>
> Thanks,
> Mikhail
>
Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not
happen, right?

-Jidong

> On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
>>
>> Hi all,
>>
>> I'm running a slitely modified migration over tcp test in virt-test, which
>> does a migration from one "smp=2" VM to another on the same host over TCP,
>> and exposes some dummy CPU load inside the GUEST while migration, and
>> after a series of runs I'm alwais getting a CLOCK_WATCHDOG_TIMEOUT BSOD
>> inside the guest,
>> which happens when
>> "
>> An expected clock interrupt was not received on a secondary processor in
>> an
>> MP system within the allocated interval. This indicates that the specified
>> processor is hung and not processing interrupts.
>> "
>>
>> This seems to happen with any qemu version I've tested (1.2 and above,
>> including upstream),
>> and I was testing it with 3.13.0-44-generic kernel on my Ubuntu 14.04.1
>> LTS with SMP4 host, as well as on 3.12.26-1 kernel with Debian 6 with SMP6
>> host.
>>
>> One thing I noticed is that exposing a dummy CPU load on the HOST (like
>> running multiple instances of the "while true; do false; done" script) in
>> parallel with doing migration makes the issue to be quite easily
>> reproducible.
>>
>>
>> Looking inside the windows crash dump, the second CPU is just running at
>> IRQL 0, and it aparently not hung, as Windows is able to save its state in
>> the crash dump correctly, which assumes running some code on it.
>> So this aparently seems to be some timing issue (like host scheduler does
>> not schedule the thread executing secondary CPU's code in time).
>>
>> Could you give me some insight on this, i.e. is there a way to customize
>> QEMU/KVM to avoid such issue?
>>
>> If you think this might be a qemu/kvm issue, I can provide you any info,
>> like windows crash dumps, or the test-case to reproduce this.
>>
>>
>> qemu is started as:
>>
>> from-VM:
>>
>> qemu-system-x86_64 \
>>     -S  \
>>     -name 'virt-tests-vm1'  \
>>     -sandbox off  \
>>     -M pc-1.0  \
>>     -nodefaults  \
>>     -vga std  \
>>     -chardev
>> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait
>> \
>>     -mon chardev=qmp_id_qmp1,mode=control  \
>>     -chardev
>> socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait
>> \
>>     -device isa-serial,chardev=serial_id_serial0  \
>>     -chardev
>> socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait
>> \
>>     -device
>> isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>>     -device
>> virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>>     -device
>> virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05
>> \
>>     -netdev
>> user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023  \
>>     -m 2G  \
>>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
>>     -cpu phenom \
>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
>>     -vnc :0  \
>>     -rtc base=localtime,clock=host,driftfix=none  \
>>     -boot order=cdn,once=c,menu=off \
>>     -enable-kvm
>>
>> to-VM:
>>
>> qemu-system-x86_64 \
>>     -S  \
>>     -name 'virt-tests-vm1'  \
>>     -sandbox off  \
>>     -M pc-1.0  \
>>     -nodefaults  \
>>     -vga std  \
>>     -chardev
>> socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait
>> \
>>     -mon chardev=qmp_id_qmp1,mode=control  \
>>     -chardev
>> socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait
>> \
>>     -device isa-serial,chardev=serial_id_serial0  \
>>     -chardev
>> socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait
>> \
>>     -device
>> isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
>>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>>     -device
>> virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>>     -device
>> virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05
>> \
>>     -netdev
>> user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023  \
>>     -m 2G  \
>>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2  \
>>     -cpu phenom \
>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
>>     -vnc :1  \
>>     -rtc base=localtime,clock=host,driftfix=none  \
>>     -boot order=cdn,once=c,menu=off \
>>     -enable-kvm \
>>     -incoming tcp:0:5200
>>
>>
>> Thanks,
>> Mikhail
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html