[BUG] VM stuck in interrupt loop after suspend to / resume from file, or no interrupts at all

Hello,

libvirt implements a managed save, which suspends a VM to a file from which it
can be resumed later. This uses the "migrate exec:<file>" feature of QEMU/KVM.
This doesn't work reliably for me: in many cases the resumed VM seems to be
stuck. Its VNC console is restored, but no key presses or network packets
are accepted. This happens with Windows XP, 7 and 2008 as well as with
Linux 2.6.32 systems.

Using the debugging cycle described in more detail below, I was able to track
the problem down to interrupt handling: either the Linux guest kernel
constantly receives an interrupt for the 8139cp network adapter, or it receives
no interrupts at all (neither network nor keyboard nor timer). Only sending an
NMI works, which shows that at least the Linux kernel is still alive.

If I add the -no-kvm-irqchip option, it seems to work; I was not able to
reproduce a hang.

    * What cpu model (examples: Intel Core Duo, Intel Core 2 Duo, AMD Opteron 
2210). See /proc/cpuinfo if you're not sure.
Intel(R) Core(TM)2 Duo CPU     L9400  @ 1.86GHz
AMD Athlon(tm) II X2 250 Processor

    * What kvm version you are using. If you're using git directly, provide 
the output of 'git describe'. 
qemu-kvm_0.12.4+dfsg-1~bpo50+1.2.201007160916
qemu-kvm_0.13.0+dfsg-2
±<https://patchwork.kernel.org/patch/96650/>

    * The host kernel version 
linux-2.6.32.23
linux-2.6.37

    * What host kernel arch you are using (i386 or x86_64)
both i386 (Intel) and x86_64 (AMD)

    * What guest you are using, including OS type (Linux, Windows, Solaris, 
etc.), bitness (32 or 64), kernel version 
Linux 2.6.32 i686
Windows 2003 R2 i686
Windows 7 Ultimate amd64
Windows XP Professional SP3 i686

    * The qemu command line you are using to start the guest 
see below

    * Whether the problem goes away if using the -no-kvm-irqchip 
or -no-kvm-pit switch. 
Yes, with -no-kvm-irqchip I was not able to reproduce the problem.

    * Whether the problem also appears with the -no-kvm switch. 
Did not test.


Not knowing much about how KVM/QEMU works internally, I would guess that the
state of the PIC is not saved or restored correctly, since I either get an
interrupt storm or no interrupts at all.
Is there anything more I can do to diagnose the problem?


There are similar reports, which might be related:
<https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/555981>
<http://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg49381.html>


Basically I was doing the following cycle:

DEV=tap0
# Create a persistent tap device owned by the current user and bring it up.
sudo /usr/sbin/openvpn --mktun --dev "$DEV" --user "$USER"
sudo /etc/kvm/kvm-ifup "$DEV"
# Each iteration resumes the VM from the save file, then saves it again
# using the monitor commands fed on stdin, and quits.
while true; do
	/usr/bin/kvm \
        -d int \
        -gdb tcp::1234 \
        -M pc-0.12 \
        -enable-kvm \
        -m 512 \
        -smp 1,sockets=1,cores=1,threads=1 \
        -name ucs-fv-qcow \
        -uuid 7a373b2a-89c1-6dfb-5fa5-c438230ebde1 \
        -nodefaults \
        -chardev stdio,id=monitor \
        -mon chardev=monitor,mode=readline \
        -rtc base=utc \
        -boot cd \
        -drive file=/var/lib/libvirt/images/ucs-fv-qcow.qcow2,if=none,id=drive-ide0-0-0,boot=on,format=qcow2 \
        -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
        -drive file=/var/lib/libvirt/images/ucs_2.4-0-100829-dvd-i386.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
        -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
        -device rtl8139,vlan=0,id=net0,mac=52:54:00:68:f3:25,bus=pci.0,addr=0x3 \
        -net tap,vlan=0,name=hostnet0,ifname="$DEV",script=no \
        -usb \
        -sdl \
        -k de \
        -vga cirrus \
        -incoming exec:"dd if=/var/lib/libvirt/qemu/save/ucs-fv-qcow.save bs=4K skip=0 status=noxfer" \
        -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 <<__QEMU__
# set_link hostnet0 down
migrate_set_speed 4095M
migrate "exec:dd of=/var/lib/libvirt/qemu/save/ucs-fv-qcow.save bs=1M"
quit
__QEMU__
done

In a second console I attached gdb as a remote debugger to investigate the VM:
gdb --eval-command="target remote :1234" --eval-command="display/i \$pc" --eval-command="break *0xe0cf2102"
(the address is that of cp_interrupt() in the guest kernel)

To resolve the addresses, I used a clone of the instance and looked up the
symbols manually in its /proc/kallsyms.
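The manual lookup in /proc/kallsyms could be automated with a small Bash helper like the sketch below. It is only an illustration: `resolve_addr` is a hypothetical name, not part of any tool. It assumes a copy of the clone's /proc/kallsyms saved to a file (read as root, since unprivileged readers may see zeroed addresses) and a 32-bit guest like the i686 one above, because Bash arithmetic is signed 64-bit and would mis-order 64-bit kernel addresses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve_addr KALLSYMS_FILE ADDRESS
# KALLSYMS_FILE: a copy of /proc/kallsyms taken (as root) from a clone of
#   the guest; each line is "<hex address> <type> <symbol> [module]".
# ADDRESS: hex address, with or without a leading "0x".
# Prints "symbol+0xOFFSET" for the closest symbol starting at or below
# ADDRESS. Assumes a 32-bit guest: Bash arithmetic is signed 64-bit.
resolve_addr() {
    local file=$1 target=$((16#${2#0x}))
    local best_name="" best_addr=-1 addr kind name

    while read -r addr kind name _; do
        [[ -n $addr && -n $name ]] || continue   # skip malformed lines
        addr=$((16#$addr))
        # Remember the highest symbol start that does not exceed the target.
        if (( addr <= target && addr > best_addr )); then
            best_addr=$addr
            best_name=$name
        fi
    done < "$file"

    if [[ -z $best_name ]]; then
        echo "resolve_addr: no symbol at or below $2" >&2
        return 1
    fi
    printf '%s+0x%x\n' "$best_name" "$((target - best_addr))"
}
```

Usage would be something like `resolve_addr kallsyms-clone.txt 0xe0cf2102`, which only gives a meaningful answer if the clone runs the identical kernel with the same module load addresses. Since kallsyms lists start addresses but no sizes, an address past the end of a function is attributed to the preceding symbol.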

More info (in German) is in our bug tracker at
<https://forge.univention.org/bugzilla/show_bug.cgi?id=21130>.

BYtE
Philipp
-- 
Philipp Hahn           Open Source Software Engineer      hahn@xxxxxxxxxxxxx   
Univention GmbH        Linux for Your Business        fon: +49 421 22 232- 0
Mary-Somerville-Str.1  28359 Bremen                   fax: +49 421 22 232-99
                                                    http://www.univention.de/
