On Tue, Jan 27, 2015 at 5:55 AM, Mikhail Sennikovskii
<mikhail.sennikovskii@xxxxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I've posted the mail below to the qemu-dev mailing list, but I've got no
> response there.
> That's why I decided to re-post it here as well, and besides that I think
> this could be a kvm-specific issue as well.
>
> Some additional things to note:
> I can reproduce the issue on my Debian 7 with 3.16.0-0.bpo.4-amd64 kernel as
> well.
> I would typically use a max_downtime adjusted to 1 second instead of the
> default 30 ms.
> I also noticed that the issue happens much more rarely if I increase the
> migration bandwidth, e.g.
>
> diff --git a/migration.c b/migration.c
> index 26f4b65..d2e3b39 100644
> --- a/migration.c
> +++ b/migration.c
> @@ -36,7 +36,7 @@ enum {
>      MIG_STATE_COMPLETED,
>  };
>
> -#define MAX_THROTTLE  (32 << 20)      /* Migration speed throttling */
> +#define MAX_THROTTLE  (90 << 20)      /* Migration speed throttling */
>
> Like I said below, I would be glad to provide you with any additional
> information.
>
> Thanks,
> Mikhail
>

Hi, Mikhail,

So if you choose to use one vcpu, instead of smp, this issue would not
happen, right?

-Jidong

> On 23.01.2015 15:03, Mikhail Sennikovskii wrote:
>>
>> Hi all,
>>
>> I'm running a slightly modified migration-over-TCP test in virt-test, which
>> does a migration from one "smp=2" VM to another on the same host over TCP,
>> and generates some dummy CPU load inside the GUEST during migration, and
>> after a series of runs I'm always getting a CLOCK_WATCHDOG_TIMEOUT BSOD
>> inside the guest,
>> which happens when
>> "
>> An expected clock interrupt was not received on a secondary processor in
>> an
>> MP system within the allocated interval. This indicates that the specified
>> processor is hung and not processing interrupts.
>> "
>>
>> This seems to happen with any qemu version I've tested (1.2 and above,
>> including upstream),
>> and I was testing it with 3.13.0-44-generic kernel on my Ubuntu 14.04.1
>> LTS with SMP4 host, as well as on 3.12.26-1 kernel with Debian 6 with SMP6
>> host.
>>
>> One thing I noticed is that putting a dummy CPU load on the HOST (like
>> running multiple instances of the "while true; do false; done" script) in
>> parallel with doing migration makes the issue quite easily
>> reproducible.
>>
>>
>> Looking inside the Windows crash dump, the second CPU is just running at
>> IRQL 0, and it is apparently not hung, as Windows is able to save its state
>> in the crash dump correctly, which implies running some code on it.
>> So this seems to be some timing issue (like the host scheduler does
>> not schedule the thread executing the secondary CPU's code in time).
>>
>> Could you give me some insight on this, i.e. is there a way to customize
>> QEMU/KVM to avoid such an issue?
>>
>> If you think this might be a qemu/kvm issue, I can provide you any info,
>> like Windows crash dumps, or the test-case to reproduce this.
>>
>>
>> qemu is started as:
>>
>> from-VM:
>>
>> qemu-system-x86_64 \
>>     -S \
>>     -name 'virt-tests-vm1' \
>>     -sandbox off \
>>     -M pc-1.0 \
>>     -nodefaults \
>>     -vga std \
>>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112624-aFZmIkNT,server,nowait \
>>     -mon chardev=qmp_id_qmp1,mode=control \
>>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112624-aFZmIkNT,server,nowait \
>>     -device isa-serial,chardev=serial_id_serial0 \
>>     -chardev socket,id=seabioslog_id_20150123-112624-aFZmIkNT,path=/tmp/seabios-20150123-112624-aFZmIkNT,server,nowait \
>>     -device isa-debugcon,chardev=seabioslog_id_20150123-112624-aFZmIkNT,iobase=0x402 \
>>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idFdaC4M,vectors=4,netdev=idKFZNXH,bus=pci.0,addr=05 \
>>     -netdev user,id=idKFZNXH,hostfwd=tcp::5000-:22,hostfwd=tcp::5001-:10023 \
>>     -m 2G \
>>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>>     -cpu phenom \
>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>     -vnc :0 \
>>     -rtc base=localtime,clock=host,driftfix=none \
>>     -boot order=cdn,once=c,menu=off \
>>     -enable-kvm
>>
>> to-VM:
>>
>> qemu-system-x86_64 \
>>     -S \
>>     -name 'virt-tests-vm1' \
>>     -sandbox off \
>>     -M pc-1.0 \
>>     -nodefaults \
>>     -vga std \
>>     -chardev socket,id=qmp_id_qmp1,path=/tmp/monitor-qmp1-20150123-112750-VehjvEqK,server,nowait \
>>     -mon chardev=qmp_id_qmp1,mode=control \
>>     -chardev socket,id=serial_id_serial0,path=/tmp/serial-serial0-20150123-112750-VehjvEqK,server,nowait \
>>     -device isa-serial,chardev=serial_id_serial0 \
>>     -chardev socket,id=seabioslog_id_20150123-112750-VehjvEqK,path=/tmp/seabios-20150123-112750-VehjvEqK,server,nowait \
>>     -device isa-debugcon,chardev=seabioslog_id_20150123-112750-VehjvEqK,iobase=0x402 \
>>     -device ich9-usb-uhci1,id=usb1,bus=pci.0,addr=03 \
>>     -drive id=drive_image1,if=none,file=/path/to/image.qcow2 \
>>     -device virtio-blk-pci,id=image1,drive=drive_image1,bootindex=0,bus=pci.0,addr=04 \
>>     -device virtio-net-pci,mac=9a:74:75:76:77:78,id=idI46M9C,vectors=4,netdev=idl9vRQt,bus=pci.0,addr=05 \
>>     -netdev user,id=idl9vRQt,hostfwd=tcp::5002-:22,hostfwd=tcp::5003-:10023 \
>>     -m 2G \
>>     -smp 2,maxcpus=2,cores=1,threads=1,sockets=2 \
>>     -cpu phenom \
>>     -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1 \
>>     -vnc :1 \
>>     -rtc base=localtime,clock=host,driftfix=none \
>>     -boot order=cdn,once=c,menu=off \
>>     -enable-kvm \
>>     -incoming tcp:0:5200
>>
>>
>> Thanks,
>> Mikhail
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
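
The two tunables mentioned in the thread (the 1 second max_downtime and the higher migration bandwidth) can probably be set at run time through the QMP monitor socket that both VMs already expose, rather than by patching MAX_THROTTLE. What follows is only a minimal sketch under assumptions, not the procedure used in the thread: it assumes the `migrate_set_speed` (bytes per second) and `migrate_set_downtime` (seconds) QMP commands of QEMU versions from that period (newer releases are understood to use `migrate-set-parameters` instead), and the helper names `migration_tuning_commands` and `send_qmp` as well as the socket path in the usage comment are hypothetical.

```python
import json
import socket

MIB = 1 << 20  # same unit as the (N << 20) values in the MAX_THROTTLE patch


def migration_tuning_commands(speed_mib_per_s=90, downtime_s=1.0):
    """Build QMP commands for the migration speed limit and maximum downtime.

    Assumes the older migrate_set_speed / migrate_set_downtime commands.
    """
    return [
        # QMP requires capability negotiation before any other command.
        {"execute": "qmp_capabilities"},
        {"execute": "migrate_set_speed",
         "arguments": {"value": speed_mib_per_s * MIB}},
        {"execute": "migrate_set_downtime",
         "arguments": {"value": downtime_s}},
    ]


def send_qmp(sock_path, commands):
    """Send commands to a QMP unix socket and return one reply per command."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rw")
        json.loads(stream.readline())  # discard the QMP greeting banner
        for command in commands:
            stream.write(json.dumps(command) + "\n")
            stream.flush()
            reply = json.loads(stream.readline())
            # Skip asynchronous events, which carry an "event" key.
            while "event" in reply:
                reply = json.loads(stream.readline())
            replies.append(reply)
    return replies


# Usage against the source VM's monitor socket (hypothetical path):
#   send_qmp("/tmp/monitor-qmp1", migration_tuning_commands())
```

With the defaults above, the speed limit sent is 90 MiB/s, matching the patched MAX_THROTTLE value, and the downtime limit is 1 second.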