Somewhere between kernel 3.2 and 3.11 on my VM hosts (yes, I know that
narrows it down a /whole lot/ ...), live migration started killing my
Ubuntu precise (kernel 3.2.x) guests, causing all of their vcpus to go
into a busy loop. Once (and only once) I observed the guest eventually
becoming responsive again, with a clock nearly 600 years in the future
and a negative uptime.

I haven't been able to dig up any previous threads about this problem,
so my gut instinct is that I've configured something wonky. Any pointers
toward /what/ I may have done wrong are appreciated.

It only seems to happen if I've given the guests Nehalem-class CPU
features. My longest-running VMs, from before I started passing the CPU
capabilities through into the guest, seem to migrate without issue. It
also seems to happen reliably when the guest has been running for a
while; it's easily reproducible with guests that have been up ~1 day,
and I've reproduced it in VMs with an uptime of ~20 hours. I haven't yet
figured out a lower bound, which makes the testing cycle a little longer
for me.

The guests that I reliably reproduce this on are Ubuntu 12.04 guests
running the current 3.2 kernel that Canonical distributes. Recent Fedora
kernels (3.14+, IIRC) don't seem to busy-spin this way, though I haven't
tested this case exhaustively, and I haven't kept very good notes for
the tests I have done with Fedora.

The hosts are dual-socket Nehalem Xeons (L5520), currently running
Ubuntu 14.04 and the associated 3.13 kernel. I had previously reproduced
this with 12.04 hosts running a raring-backport 3.11 kernel as well, but
I (seemingly erroneously) assumed it was a qemu userspace discrepancy.
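As a rough sanity check on the "nearly 600 years" figure, here is some
back-of-the-envelope arithmetic of my own (Python, nothing from the
kernel). It only shows that an unsigned 64-bit nanosecond counter wraps
after roughly 584 years, and that a value just below the wrap point reads
as a small negative number when interpreted as signed. Whether that is
actually what happened in the guest is an assumption I have not verified;
the helper names are made up for illustration.

```python
# Hypothetical helpers for illustration only; not taken from any kernel code.
NSEC_PER_SEC = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def years_until_wrap(bits=64):
    """Years representable by a `bits`-wide counter of nanoseconds."""
    return (2**bits) / NSEC_PER_SEC / SECONDS_PER_YEAR


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit one."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    print("u64 ns wraps after ~%.1f years" % years_until_wrap(64))
    print("s64 ns overflows after ~%.1f years" % years_until_wrap(63))
    # A counter that underflowed by one second, read back as signed:
    print(as_signed64(2**64 - NSEC_PER_SEC))
```

If the numbers line up, that would hint at a single 64-bit
underflow/overflow in a nanosecond value explaining both the far-future
clock and the negative uptime, but it may equally be a coincidence.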
I have been poring through a debugger attached to the guest via qemu's
gdbserver after it goes into a busy-spin, and the stack trace is:

(gdb) bt
#0  second_overflow (secs=<optimized out>)
    at /build/buildd/linux-3.2.0/kernel/time/ntp.c:407
#1  0xffffffff81095c75 in logarithmic_accumulation (offset=3831765322649889943, shift=9)
    at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:987
#2  0xffffffff81096042 in update_wall_time ()
    at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1056
#3  0xffffffff81096e8d in do_timer (ticks=549606)
    at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:1246
#4  0xffffffff8109d825 in tick_do_update_jiffies64 (now=...)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:77
#5  0xffffffff8109dda6 in tick_nohz_update_jiffies (now=...)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:145
#6  0xffffffff8109e378 in tick_check_nohz (cpu=0)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:713
#7  tick_check_idle (cpu=0)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#8  0xffffffff8106ff91 in irq_enter ()
    at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#9  0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>)
    at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#10 <signal handler called>
#11 0xffffffffffffff10 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 2)]
#0  read_seqbegin (sl=<optimized out>)
    at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
89      /build/buildd/linux-3.2.0/include/linux/seqlock.h: No such file or directory.
(gdb) bt
#0  read_seqbegin (sl=<optimized out>)
    at /build/buildd/linux-3.2.0/include/linux/seqlock.h:89
#1  ktime_get ()
    at /build/buildd/linux-3.2.0/kernel/time/timekeeping.c:268
#2  0xffffffff8109e355 in tick_check_nohz (cpu=1)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:709
#3  tick_check_idle (cpu=1)
    at /build/buildd/linux-3.2.0/kernel/time/tick-sched.c:731
#4  0xffffffff8106ff91 in irq_enter ()
    at /build/buildd/linux-3.2.0/kernel/softirq.c:306
#5  0xffffffff8166cef3 in smp_apic_timer_interrupt (regs=<optimized out>)
    at /build/buildd/linux-3.2.0/arch/x86/kernel/apic/apic.c:880
#6  <signal handler called>
#7  0xffffffffffffff10 in ?? ()

If I continue, then re-stop the guest, logarithmic_accumulation() is
still in the stack trace, with the same offset and shift; the line
numbers indicate it's stuck in the following loop:

	while (timekeeper.xtime_nsec >= nsecps) {
		int leap;
		timekeeper.xtime_nsec -= nsecps;
		xtime.tv_sec++;
		leap = second_overflow(xtime.tv_sec);
		xtime.tv_sec += leap;
		wall_to_monotonic.tv_sec -= leap;
		if (leap)
			clock_was_set_delayed();
	}

Live migration is initiated through libvirt by virDomainMigrate with
flags=VIR_MIGRATE_LIVE, uri="tcp://$recv_hostname".
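Going back to the offset and shift in frame #1 of the first backtrace,
here is a rough estimate of how much work that accumulation would be.
This is only a sketch under assumptions I have not confirmed: that the
guest's clocksource counts in nanoseconds (as I believe kvm-clock does),
that the guest kernel runs at HZ=250, and that each call to
logarithmic_accumulation() consumes (cycles per tick) << shift cycles,
which is how I read the 3.2 source. The function name and parameters
below are mine, not the kernel's.

```python
# Rough model only; hz and cycles_per_sec are assumptions, not measured values.
NSEC_PER_SEC = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year


def estimate_accumulation(offset_cycles, shift, hz=250,
                          cycles_per_sec=NSEC_PER_SEC):
    """Estimate the backlog implied by `offset_cycles`.

    Returns (calls, seconds): the number of accumulation calls needed if
    each one consumes (cycles per tick) << shift cycles, and the backlog
    expressed in seconds. The quoted while loop advances xtime.tv_sec by
    one per iteration, so `seconds` also approximates its total
    iteration count.
    """
    cycle_interval = cycles_per_sec // hz   # cycles per tick
    chunk = cycle_interval << shift         # cycles consumed per call
    calls = offset_cycles // chunk
    seconds = offset_cycles / cycles_per_sec
    return calls, seconds


if __name__ == "__main__":
    # Values copied from frame #1 of the backtrace.
    calls, seconds = estimate_accumulation(3831765322649889943, 9)
    print("accumulation calls: ~%.2e" % calls)
    print("backlog: ~%.2e s (~%.0f years)" % (seconds, seconds / SECONDS_PER_YEAR))
```

Under those assumptions the offset corresponds to a backlog of over a
century of wall time, to be worked off a couple of seconds per call,
which would at least be consistent with a guest that spins for a very
long time and only occasionally comes back.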
The guest is spawned by libvirtd with:

qemu-system-x86_64 -enable-kvm -name dog -S \
  -machine pc-i440fx-trusty,accel=kvm,usb=off \
  -cpu Nehalem,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme \
  -m 512 -realtime mlock=off \
  -smp 2,sockets=2,cores=1,threads=1 \
  -uuid 55fd4c19-2477-40a5-988f-aaccd60b20dc \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/dog.monitor,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc -no-shutdown \
  -boot menu=on,strict=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 \
  -drive file=rbd:rbd/dog:id=libvirt:key=________________________________________:auth_supported=cephx\;none,if=none,id=drive-virtio-disk0,format=raw,cache=none \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2 \
  -netdev tap,ifname=vm9_0,script=no,id=hostnet0,vhost=on,vhostfd=26 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:16:3e:62:7a:9d,bus=pci.0,addr=0x3 \
  -vnc 0.0.0.0:9,password \
  -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
  -incoming tcp:[::]:49152 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

The libvirt domain XML is:

<domain type='kvm' id='12'>
  <name>dog</name>
  <uuid>55fd4c19-2477-40a5-988f-aaccd60b20dc</uuid>
  <memory unit='KiB'>524288</memory>
  <currentMemory unit='KiB'>524288</currentMemory>
  <vcpu placement='static'>2</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Nehalem</model>
    <feature policy='require' name='dca'/>
    <feature policy='require' name='xtpr'/>
    <feature policy='require' name='tm2'/>
    <feature policy='require' name='est'/>
    <feature policy='require' name='vmx'/>
    <feature policy='require' name='ds_cpl'/>
    <feature policy='require' name='monitor'/>
    <feature policy='require' name='pbe'/>
    <feature policy='require' name='tm'/>
    <feature policy='require' name='ht'/>
    <feature policy='require' name='ss'/>
    <feature policy='require' name='acpi'/>
    <feature policy='require' name='ds'/>
    <feature policy='require' name='vme'/>
  </cpu>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <boot order='1'/>
      <alias name='ide0-1-0'/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <disk type='network' device='disk' snapshot='no'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='libvirt'>
        <secret type='ceph' uuid='e04aa789-0bd7-07ac-cf10-78d8f52a4162'/>
      </auth>
      <source protocol='rbd' name='rbd/dog'/>
      <target dev='vda' bus='virtio'/>
      <boot order='2'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <alias name='ide0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <interface type='ethernet'>
      <mac address='00:16:3e:62:7a:9d'/>
      <script path='no'/>
      <target dev='vm9_0'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='5909' autoport='no' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='none'/>
</domain>