Bugs item #2351676, was opened at 2008-11-26 17:59
Message generated for change (Comment added) made by jlokier
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2351676&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Chris Jones (c_jones)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests hang periodically on Ubuntu-8.10

Initial Comment:
I'm seeing periodic hangs on my guests. I've been unable so far to find a trigger - they always boot fine, but after anywhere from 10 minutes to 24 hours they eventually hang completely.

My setup:
  * AMD Athlon X2 4850e (2500 MHz dual core)
  * 4Gig memory
  * Ubuntu 8.10 server, 64-bit
  * KVMs tried:
    : kvm-72 (shipped with ubuntu)
    : kvm-79 (built myself, --patched-kernel option)
  * Kernels tried:
    : 2.6.27.7 (kernel.org, self built)
    : 2.6.27-7-server from Ubuntu 8.10 distribution

In guests
  * Ubuntu 8.10 server, 64-bit (virtual machine install)
  * kernel 2.6.27-7-server from Ubuntu 8.10

I'm running the guests like:

  sudo /usr/local/bin/qemu-system-x86_64 \
    -daemonize \
    -no-kvm-irqchip \
    -hda Imgs/ndev_root.img \
    -m 1024 \
    -cdrom ISOs/ubuntu-8.10-server-amd64.iso \
    -vnc :4 \
    -net nic,macaddr=DE:AD:BE:EF:04:04,model=e1000 \
    -net tap,ifname=tap4,script=/home/chris/kvm/qemu-ifup.sh

The problem does not happen if I use -no-kvm.

I've tried some other options that have no effect:
  -no-kvm-pit
  -no-acpi

The disk images are raw format.

When the guests hang, I cannot ping them, and the vnc console is hung. The qemu monitor is still accessible, and the guests recover if I issue a system_reset command from the monitor. However, often, the console will not take keyboard input after doing so.

When the guest is hung, kvm_stat shows all 0s for the counters:

  efer_relo exits fpu_reloa halt_exit halt_wake host_stat hypercall
  insn_emul insn_emul invlpg io_exits irq_exits irq_windo largepage
  mmio_exit mmu_cache mmu_flood mmu_pde_z mmu_pte_u mmu_pte_w mmu_recyc
  mmu_shado nmi_windo pf_fixed pf_guest remote_tl request_i signal_ex
  tlb_flush
  > 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

gdb shows two threads - both waiting:

(gdb) info threads
  2 Thread 0x414f1950 (LWP 422)  0x00007f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
  1 Thread 0x7f36f1f306e0 (LWP 414)  0x00007f36f084b482 in select () from /lib/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f36f1f306e0 (LWP 414))]#0  0x00007f36f084b482 in select () from /lib/libc.so.6
(gdb) bt
#0  0x00007f36f084b482 in select () from /lib/libc.so.6
#1  0x00000000004094cb in main_loop_wait (timeout=0) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4719
#2  0x000000000050a7ea in kvm_main_loop () at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:619
#3  0x000000000040fafc in main (argc=<value optimized out>, argv=0x7ffff9f41948) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4871
(gdb) thread 2
[Switching to thread 2 (Thread 0x414f1950 (LWP 422))]#0  0x00007f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
(gdb) bt
#0  0x00007f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
#1  0x000000000050a560 in kvm_main_loop_wait (env=0xc319e0, timeout=0) at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:284
#2  0x000000000050aaf7 in ap_main_loop (_env=<value optimized out>) at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:425
#3  0x00007f36f11ba3ea in start_thread () from /lib/libpthread.so.0
#4  0x00007f36f0852c6d in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Any clues to help me resolve this would be much appreciated.
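
The two observations above (idle counters, all threads waiting) could also be collected from a script instead of by hand. The following is only a sketch under assumptions: it assumes debugfs is mounted at /sys/kernel/debug and that the counters kvm_stat displays are exposed as one file each under /sys/kernel/debug/kvm. The function names kvm_snapshot, kvm_is_idle and dump_qemu_threads are made up for illustration and are not part of KVM or QEMU.

```shell
#!/bin/sh
# Assumption: KVM exposes one file per counter in this directory
# (debugfs must be mounted); pass another path to override it.
KVM_DEBUGFS="/sys/kernel/debug/kvm"

# Print "name value" for every counter file in a directory.
kvm_snapshot() {
    snap_dir="${1:-$KVM_DEBUGFS}"
    for snap_file in "$snap_dir"/*; do
        if [ -f "$snap_file" ]; then
            printf '%s %s\n' "${snap_file##*/}" "$(cat "$snap_file")"
        fi
    done
}

# Succeed (exit status 0) when no counter changed between two snapshots
# taken interval seconds apart, i.e. no guest exits were recorded.
kvm_is_idle() {
    idle_dir="${1:-$KVM_DEBUGFS}"
    idle_interval="${2:-1}"
    idle_before="$(kvm_snapshot "$idle_dir")"
    sleep "$idle_interval"
    idle_after="$(kvm_snapshot "$idle_dir")"
    [ "$idle_before" = "$idle_after" ]
}

# Dump a backtrace of every thread of a running process without an
# interactive gdb session (attaching needs sufficient privileges).
dump_qemu_threads() {
    gdb -p "$1" -batch -ex "thread apply all bt"
}
```

With a hung guest, something like "kvm_is_idle && dump_qemu_threads PID" (PID being the process ID of the qemu process) would record the same information as the transcript above. Note that the counters are host-wide, so other running guests will make kvm_is_idle report activity.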

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 11:09

Message:
I should clarify that last comment, because some text went missing.

When I wrote about lockups "within 20 minutes", X non-responsive, SSH non-responsive etc., that was using kvm-88 userspace, but the kvm modules which shipped with the distro kernel.

When I was able to unload those modules (which had to wait until other users weren't using KVM), I tried the "out of tree" compiled modules which come with the kvm-88 source tree, and that's when I found I couldn't even manage a simple bit of writing to one of my disks without SCSI SENSE errors and I/O errors reaching the application (if using IDE), or with virtio-blk I got stuck processes very quickly, which have some similarity to the stuckness that took much longer to arrive with the older modules.

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 11:05

Message:
Now I tried with the kvm-88 modules too, and it had one of these "freezes" within seconds - as soon as I do "dd if=/dev/zero of=/dev/vda bs=512 count=1" to a virtio-blk device... (Root device is IDE).

ps shows the dd process stuck in sync_page. SSH, console and X are still working though. This isn't the same as lockups using the older modules, but it has an interesting similarity: the older modules and the current ones both get stuck in sync_page.

I tried the same with an IDE device instead of virtio-blk, and the "dd" succeeded, but then an attempt to create a partition on the disk resulted in lots of SCSI (ATA) I/O errors in the guest's kernel log and an I/O error in the partitioning application.

Something is quite amiss.
It looks like either the "out of tree" modules that come shipped with kvm-88 aren't as backward-compatible as I'd hoped, or there was a latent bug (perhaps the reason for those firmer freezes after many minutes) which the new modules expose much more quickly.

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 10:12

Message:
I'm seeing similar lockups, with kvm-88 on a quad 64-bit Xeon host which is running Ubuntu Server 9.04, which is Ubuntu's 2.6.27-14-server kernel.

I've been doing some installs into VMs and each time it freezes a few minutes in. The console is still visible over VNC (I can disconnect and reconnect, and get the picture back), but it doesn't respond to keypresses _except_ it does respond to control-alt-Sysreq in the guest. X is similarly non-responsive to keyboard and mouse input. SSH sessions hang, but ping is still responsive.

As it's always the same guest kernel in my tests (I have a particular task to do...) that might be relevant. It's Ubuntu's i386 (32-bit) linux-image-2.6.28-15-generic.

Suggestions to pin the host clock frequency aren't helping: this thing doesn't seem to have cpufreq at all.

As this seems to happen mostly when I have lots of disk activity (software installs), I've tried all of virtio-blk, scsi and ide block drivers; the ide one seemed to survive longer, but still failed within 20 minutes.

I've tried increasing the memory of the guest substantially (in case it's a guest bug driven by the amount of I/O), and I've tried "nolapic" in the guest kernel's command line to disable using the lapic as clock source (due to suggestions that it may be related to reading the time). The guest doesn't show any clock sources other than pit and lapic in /proc/timer_list.

z-image, when you say 2.6.31 appears to fix it, do you mean 2.6.31 as the host or as the guest?
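
The "ps shows the dd process stuck in sync_page" check from the comments above can be sketched as a small filter run inside the guest. This is an illustration only: filter_dstate and list_blocked are hypothetical names, and the column layout assumes a procps-style ps that accepts the pid,stat,wchan,comm output fields.

```shell
#!/bin/sh
# Keep the header line plus every row whose second column (process
# state) starts with "D", i.e. uninterruptible sleep.
# Expects "PID STAT WCHAN COMMAND" rows on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List blocked processes together with the kernel function they are
# waiting in (the WCHAN column), e.g. sync_page.
list_blocked() {
    ps -eo pid,stat,wchan:32,comm | filter_dstate
}
```

Running list_blocked a few times in a row helps tell a process that is permanently stuck from one that is merely waiting briefly for I/O.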

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-24 08:45

Message:
With 2.6.31-rc6 it is running fine for almost 72 hours. Looks like the problem is gone in 2.6.31.

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-21 09:53

Message:
With -no-kvm-pit it is running fine for almost 20 hours. Didn't survive that long without -no-kvm-pit.

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-08-20 16:20

Message:
I'm still investigating, but I have some new information so far. There seem to be different issues that cause different crashes.

- dynamic cpu throttling on the host
- oops due to the paravirt kvm support in the guest.

I got hit by http://bugzilla.kernel.org/show_bug.cgi?id=12405 and I'm now investigating whether disabling highmem helps, as someone suggested. I don't know if this also affects 64bit guests, which seem to run more stably on other machines here.

It helps to set up netconsole and let syslog-ng write it to a log file, so oopses can be logged nicely.

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-20 14:12

Message:
On a closer look qemu actually exited, but it was virt manager that held its monitoring console.

Here's the full transcript of what happened in a shell session:

gdb --args /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp 3 -name kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor stdio -boot c -drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -serial none -parallel none -vnc 213.145.98.164:1 -k en-us
GNU gdb (GDB) 6.8.50.20090628-cvs-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>
(gdb) run
Starting program: /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp 3 -name kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor stdio -boot c -drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -serial none -parallel none -vnc 213.145.98.164:1 -k en-us
[Thread debugging using libthread_db enabled]
[New Thread 0xb7dafb90 (LWP 19769)]
[New Thread 0xb75aab90 (LWP 19770)]
[New Thread 0xb6da6b90 (LWP 19771)]
QEMU 0.10.50 monitor - type 'help' for more information
(qemu) c
[New Thread 0x35a2db90 (LWP 19772)]
[New Thread 0x3522cb90 (LWP 19773)]
[New Thread 0x348ffb90 (LWP 19774)]
[New Thread 0x340feb90 (LWP 19775)]
[New Thread 0x338fdb90 (LWP 19776)]
[New Thread 0x330fcb90 (LWP 19777)]
[New Thread 0x328fbb90 (LWP 19778)]
[New Thread 0x320fab90 (LWP 19779)]
[New Thread 0x318f9b90 (LWP 19780)]
[New Thread 0x310f8b90 (LWP 19781)]
[New Thread 0x308f7b90 (LWP 19782)]
[New Thread 0x300f6b90 (LWP 19783)]
[New Thread 0x2f8f5b90 (LWP 19784)]
[New Thread 0x2f0f4b90 (LWP 19785)]
[New Thread 0x2e8f3b90 (LWP 19786)]
[New Thread 0x2e0f2b90 (LWP 19787)]
[New Thread 0x2d8f1b90 (LWP 19788)]
[New Thread 0x2d0f0b90 (LWP 19789)]
[New Thread 0x2c8efb90 (LWP 19790)]
[New Thread 0x2c0eeb90 (LWP 19791)]
[New Thread 0x2b8edb90 (LWP 19792)]
[New Thread 0x2b0ecb90 (LWP 19793)]
[New Thread 0x2a8ebb90 (LWP 19794)]
[New Thread 0x2a0eab90 (LWP 19795)]
[New Thread 0x298e9b90 (LWP 19796)]
[New Thread 0x290e8b90 (LWP 19797)]
[New Thread 0x288e7b90 (LWP 19798)]
[New Thread 0x280e6b90 (LWP 19799)]
[New Thread 0x278e5b90 (LWP 19800)]
[New Thread 0x270e4b90 (LWP 19801)]
[New Thread 0x268e3b90 (LWP 19802)]
[Thread 0x2e0f2b90 (LWP 19787) exited]
[Thread 0x338fdb90 (LWP 19776) exited]
[Thread 0x2b8edb90 (LWP 19792) exited]
[Thread 0x2e8f3b90 (LWP 19786) exited]
[Thread 0x2f8f5b90 (LWP 19784) exited]
[Thread 0x308f7b90 (LWP 19782) exited]
[Thread 0x300f6b90 (LWP 19783) exited]
[New Thread 0x300f6b90 (LWP 19808)]
[New Thread 0x308f7b90 (LWP 19813)]
[New Thread 0x2f8f5b90 (LWP 19814)]
[New Thread 0x2e8f3b90 (LWP 19815)]
[New Thread 0x2b8edb90 (LWP 19816)]
[New Thread 0x260e2b90 (LWP 19817)]
[New Thread 0x258e1b90 (LWP 19818)]
[New Thread 0x250e0b90 (LWP 19819)]
[New Thread 0x248dfb90 (LWP 19820)]
[New Thread 0x240deb90 (LWP 19821)]
[New Thread 0x236ffb90 (LWP 19822)]
[New Thread 0x22cffb90 (LWP 19823)]
[New Thread 0x224feb90 (LWP 19824)]
[New Thread 0x21affb90 (LWP 19825)]
[New Thread 0x212feb90 (LWP 19828)]
[New Thread 0x20afdb90 (LWP 19829)]
[New Thread 0x202fcb90 (LWP 19830)]
[New Thread 0x1fafbb90 (LWP 19831)]
kvm: unhandled exit 31
kvm_run returned -22
[Thread 0x2c0eeb90 (LWP 19791) exited]
[Thread 0x270e4b90 (LWP 19801) exited]
[Thread 0x2f8f5b90 (LWP 19814) exited]
[Thread 0x320fab90 (LWP 19779) exited]
[Thread 0x310f8b90 (LWP 19781) exited]
[Thread 0x2e8f3b90 (LWP 19815) exited]
[Thread 0x21affb90 (LWP 19825) exited]
[Thread 0x300f6b90 (LWP 19808) exited]
[Thread 0x328fbb90 (LWP 19778) exited]
[Thread 0x2c8efb90 (LWP 19790) exited]
[Thread 0x2d0f0b90 (LWP 19789) exited]
[Thread 0x260e2b90 (LWP 19817) exited]
[Thread 0x268e3b90 (LWP 19802) exited]
[Thread 0x240deb90 (LWP 19821) exited]
[Thread 0x290e8b90 (LWP 19797) exited]
[Thread 0x280e6b90 (LWP 19799) exited]
[Thread 0x2a8ebb90 (LWP 19794) exited]
[Thread 0x20afdb90 (LWP 19829) exited]
[Thread 0x2a0eab90 (LWP 19795) exited]
[Thread 0x2d8f1b90 (LWP 19788) exited]
[Thread 0x248dfb90 (LWP 19820) exited]
[Thread 0x2b8edb90 (LWP 19816) exited]
[Thread 0x278e5b90 (LWP 19800) exited]
[Thread 0x2b0ecb90 (LWP 19793) exited]
[Thread 0x2f0f4b90 (LWP 19785) exited]
[Thread 0x298e9b90 (LWP 19796) exited]
[Thread 0x35a2db90 (LWP 19772) exited]
[Thread 0x318f9b90 (LWP 19780) exited]
[Thread 0x236ffb90 (LWP 19822) exited]
[Thread 0x258e1b90 (LWP 19818) exited]
[Thread 0x348ffb90 (LWP 19774) exited]
[Thread 0x308f7b90 (LWP 19813) exited]
[Thread 0x288e7b90 (LWP 19798) exited]
[Thread 0x202fcb90 (LWP 19830) exited]
[Thread 0x340feb90 (LWP 19775) exited]
[Thread 0x3522cb90 (LWP 19773) exited]
[Thread 0x330fcb90 (LWP 19777) exited]
[Thread 0x1fafbb90 (LWP 19831) exited]
[Thread 0x250e0b90 (LWP 19819) exited]
[Thread 0x22cffb90 (LWP 19823) exited]
[Thread 0x224feb90 (LWP 19824) exited]
[Thread 0x212feb90 (LWP 19828) exited]
(qemu)

I'm going to try it with -no-kvm-pit now...

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-20 10:48

Message:
I believe I may have hit the same bug.

* CPU is 2x 8 core + SMT (so it looks like 16 cores) Nehalem (Intel(R) Xeon(R) CPU E5520 @ 2.27GHz)
* Host kernel is i386 and not x86_64: Debian sid package linux-image-2.6.30-1-686-bigmem 2.6.30-5
* QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)
* Guests:
  * Debian Etch with backports 32 bit kernel 2.6.26-bpo.2-686-bigmem
  * Debian Etch with custom compiled 32 bit kernel 2.6.30.4

Load testing with stress (http://weather.ou.edu/~apw/projects/stress/).

Guests are configured to use 2047MB memory and 3 VCPUs (tried with 2 VCPUs as well).

After some time - anywhere from 30 minutes to several hours - the virtual machine hangs. It doesn't crash, just doesn't respond anymore to keyboard, vnc, ping or anything else.

I tried to run a gdb session on the two guests and the results are more or less equal:

gdb --args /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp 3 -name kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor pty -boot c -drive file=/var/lib/libvirt/images/iso/debian-40r8-etchnhalf-i386-netinst.iso,if=ide,media=cdrom,index=2 -drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:31:be:e3,vlan=0,model=virtio -net tap,fd=29,vlan=0 -serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us
GNU gdb (GDB) 6.8.50.20090628-cvs-debian
...
^C
Program received signal SIGINT, Interrupt.
0xb8036424 in __kernel_vsyscall ()
(gdb) info threads
  27 Thread 0xb7e10b90 (LWP 19064)  0xb8036424 in __kernel_vsyscall ()
  26 Thread 0xb760bb90 (LWP 19065)  0xb8036424 in __kernel_vsyscall ()
  25 Thread 0xb6e07b90 (LWP 19066)  0xb8036424 in __kernel_vsyscall ()
* 1 Thread 0xb7e11a70 (LWP 19060)  0xb8036424 in __kernel_vsyscall ()
(gdb) thread 1
[Switching to thread 1 (Thread 0xb7e11a70 (LWP 19060))]#0  0xb8036424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7f06fe1 in select () from /lib/i686/cmov/libc.so.6
#2  0x0804c3c6 in qemu_select (max_fd=30, rfds=0xbfd46f00, wfds=0xbfd46e80, xfds=0xbfd46e00, tv=0xbfd46df4) at /home/zimage/kvm/qemu-kvm-devel-88/vl.c:313
#3  0x08052958 in main_loop_wait (timeout=1000) at /home/zimage/kvm/qemu-kvm-devel-88/vl.c:4339
#4  0x0818777e in kvm_main_loop () at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2194
#5  0x080530c9 in main_loop () at /home/zimage/kvm/qemu-kvm-devel-88/vl.c:4550
#6  0x08056799 in main (argc=33, argv=0xbfd47424, envp=0xbfd474ac) at /home/zimage/kvm/qemu-kvm-devel-88/vl.c:6416
(gdb) thread 25
[Switching to thread 25 (Thread 0xb6e07b90 (LWP 19066))]#0  0xb8036424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9aad960, timeout=1000) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9aad960) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9aad960) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6
(gdb) thread 26
[Switching to thread 26 (Thread 0xb760bb90 (LWP 19065))]#0  0xb8036424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9aa4028, timeout=1000) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9aa4028) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9aa4028) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6
(gdb) thread 27
[Switching to thread 27 (Thread 0xb7e10b90 (LWP 19064))]#0  0xb8036424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9a93df0, timeout=1000) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9a93df0) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9a93df0) at /home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6

----------------------------------------------------------------------

Comment By: Bryan Cameron Lesiuk (clesiuk)
Date: 2009-03-25 17:35

Message:
I have a similar problem as the original poster.

I've discovered a possible workaround: disable CPU frequency scaling in the host:
  # apt-get remove powernowd

I'm running with disabled frequency scaling and so far my system is stable.
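
Removing powernowd only stops the daemon from changing the frequency; a fixed governor still has to be chosen on each core. A minimal sketch of doing that for every CPU at once, assuming each CPU exposes cpufreq/scaling_governor under /sys/devices/system/cpu (set_governor is a hypothetical helper name, and it has to run as root on a real host):

```shell
#!/bin/sh
# Write one governor name to every CPU that exposes a cpufreq directory.
# Assumption: the standard sysfs layout; the optional second argument
# overrides the root directory, which is mainly useful for dry runs.
set_governor() {
    gov_name="$1"
    gov_root="${2:-/sys/devices/system/cpu}"
    for gov_file in "$gov_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -w "$gov_file" ] || continue   # skip CPUs without cpufreq
        echo "$gov_name" > "$gov_file"
    done
}
```

For example, "set_governor performance" would pin all cores to the maximum frequency. The setting does not persist across reboots.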

I set the host frequency manually:
  # cd /sys/devices/system/cpu/cpu0/cpufreq
  # cat scaling_available_frequencies
  > 2500000 2400000 2200000 2000000 1800000 1000000
  # cat scaling_available_governors
  > conservative ondemand userspace powersave performance
  # echo powersave > scaling_governor      (minimum frequency)
  # echo performance > scaling_governor    (maximum frequency)

Here's my rig:
  * AMD Athlon X2 4850e (2500 MHz dual core)
  * 4Gig memory, 800MHz, dual channel
  * 780G chipset (Jetway NC81-LF motherboard)

I tried combinations of Host/Guest using:
  * Ubuntu 8.10 server, i686, KVM-72
  * Ubuntu 8.10 server, amd64, KVM-72
  * Ubuntu 9.04 server, amd64, KVM-84 (22 March 2009 beta)

Stuff I've tried which had no discernible effect:
  * clock source: kvm-clock, acpi_pm
  * block device: ide, virtual
  * network device: e1000, virtual

----------------------------------------------------------------------

Comment By: Michael Tokarev (mjtsf)
Date: 2009-02-09 13:52

Message:
Ok, I have a very similar issue here as well.

Host - 4-core Phenom CPU and AMD 780G chipset, running 2.6.28.4-x86-64 (from kernel.org). kvm-83 32bits

Guest - 2.6.27.13-i686smp, also from kernel.org.

The guest is running with KVM_GUEST stuff enabled, using kvm timer and virtio network and block.

The system is Debian (lenny-to-be) on both, but I don't think it matters since both use custom-compiled kernels.

Guest - at least one of them - hangs, especially when many guests are running in parallel (we've 4 windows machines and 4 linux machines, mostly idle). When it hangs, nothing really works - console, ping, etc. It usually continues working after 1..2 minutes or more.

During the hang, the host is either silent or is spewing tons of "vcpu not ready for apic_round_robin" messages (several 1000s of them) -- but I can't be sure that message is directly related to the hangs.

Nothing is logged on the guest.

The so-far-only-affected guest is assigned 2 virtual CPUs -- I'll try to reboot it with a single cpu only to see if it changes anything.

I wasn't able to check gdb/trace/etc so far, because the guest that hangs is my main working machine, which is a terminal server, so I have to run to another room to the server's console and check there.

----------------------------------------------------------------------

Comment By: Dustin Kirkland (dustin_kirkland)
Date: 2009-02-09 12:38

Message:
In the Ubuntu 8.10 guest, can you try the linux-image-virtual kernel? The current one points to linux-image-2.6.27-11-virtual.

:-Dustin

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-01-18 06:18

Message:
New stability info from my side.

Host: Linux dirus-dom 2.6.28-2-server #3-Ubuntu SMP Thu Dec 4 22:35:12 UTC 2008 x86_64 GNU/Linux
Guest: 2.6.28 x86_64
- disabled all kvm guest options (with kvm_clock disabled)
- enabled virtio_block
- started with -smp 1 and -smp 2

They didn't crash yet, with 1 or 2 smp. I think disabling kvm guest support did the trick.

However, using nfs out of the guest is quite slow and not very stable, it seems. I have the feeling the guest lags quite often, but it survives even loads up to 11. Running crashme, a high -j kernel build and file transfers didn't crash the machine.

----------------------------------------------------------------------

Comment By: James Thomason (james_thomason)
Date: 2009-01-15 07:30

Message:
Update: I installed Ubuntu 8.10 server and upgraded to 2.6.29-rc1 and KVM-83. I am still able to reproduce when kvm -smp > 1. New behavior in this configuration is the printing of the message "Stuck??" to the console, followed shortly by a kernel panic.

KVM Host:
  Ubuntu Server 8.10
  Linux 2.6.29-RC1
  KVM-83

KVM Guest:
  Ubuntu Server 8.10
  2.6.27-9-server

----------------------------------------------------------------------

Comment By: James Thomason (james_thomason)
Date: 2009-01-15 07:20

Message:
Hello,

I am able to reliably reproduce a condition where a guest goes into a tight loop or spinlock on all running cores. The scenario is exactly as described in bug 2351676, though my environment differs as detailed below. My observation is that the issue is correlated to the number of VCPUs assigned to the guest and CPU load. The higher the number of VCPUs and CPU utilization, the more easily it is triggered. If a KVM developer is interested in debugging live, I might be able to arrange getting the system in question into a DMZ.

A review of the kvm tracker leads me to believe that the following bugs are possibly related:

  [ 2351676 ] Guests hang periodically on Ubuntu-8.10
  [ 2353811 ] Solaris 10 guest unstable
  [ 2494730 ] Guests "stalling" on kvm-82
  [ 2138079 ] kvm locks up system
  [ 2113643 ] guests AND host still getting stuck under CPU load

KVM Host Configuration:
  4 x Quad-Core AMD Opteron Processors (8346 HE @ 1.8Ghz)
  64GB DDR2 667Mhz
  Fedora 10 x64
  Kernel 2.6.28
  KVM-82

KVM Guest Configuration:
  32GB Memory
  1 to 16 VCPUs
  Centos 5.2 x64
  Kernel 2.6.28
  IDE disk
  e1000 NIC

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-01-13 19:11

Message:
I have a very similar setup.

Host: Ubuntu 8.10. Kernel 2.6.28-2-server
KVM: 72, 80, 81, 82, 83 tried (using the up to date kvm module, too)
Guests:
  Endian Firewall (centos based.) Kernel 2.6.22.19-72.endian15
    Is stable so far. Sometimes loses usb devices.
  Ubuntu 8.10 Kernel 2.6.27, 2.6.28-2-server, 2.6.28 vanilla home brew
    Very unstable.

As the Ubuntu 8.10 is also unstable when using the 2.6.28 vanilla kernel, I'm not so sure it's a guest problem. I will now compile a 2.6.28 kernel not having any kvm guest support.
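
Before rebuilding, it may help to confirm which paravirtual guest options a given guest kernel was actually built with. The following is a sketch under assumptions, not a definitive check: it assumes the distribution installs a plain-text config as /boot/config-<version>, and that CONFIG_PARAVIRT, CONFIG_KVM_GUEST and CONFIG_KVM_CLOCK are the relevant option names for kernels of this era. kvm_guest_options is a hypothetical helper name.

```shell
#!/bin/sh
# Report whether each KVM guest (paravirt) option is enabled in a
# kernel config file. Defaults to the config of the running kernel.
# A missing or unreadable file is reported as "disabled" for all options.
kvm_guest_options() {
    cfg_file="${1:-/boot/config-$(uname -r)}"
    for cfg_opt in CONFIG_PARAVIRT CONFIG_KVM_GUEST CONFIG_KVM_CLOCK; do
        if grep -q "^${cfg_opt}=[ym]" "$cfg_file"; then
            echo "$cfg_opt enabled"
        else
            echo "$cfg_opt disabled"
        fi
    done
}
```

Run inside the guest, this makes it easy to compare a kernel that hangs with one that is stable.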

Things that don't seem to have an effect:
- using ide instead of virtio
- using e1000 instead of virtio

However, it seems that it may be caused by io access, but it is not easily reproducible.

Last thing I tried: using kernel parameters "clocksource=acpi_pm notsc" in the guest. Still investigating if it makes the guest stable.

Btw. with kvm-82 I saw around 100 io_exits when only the crashed ubuntu 8.10 is running, nothing else.

----------------------------------------------------------------------

Comment By: Chris Jones (c_jones)
Date: 2008-12-10 20:29

Message:
Actually, I was too quick to say that a Fedora 8 guest is stable. Even there, I'm seeing hangs once I get my application fully installed (basically, once I introduce some load).

I also did an update to kvm-80 and the problem still exists (on all the guests I've tried). That's with kvm-80 kernel modules and the kvm-80 userspace, running on linux-2.6.27.8.

Thanks,
Chris

----------------------------------------------------------------------

Comment By: Chris Jones (c_jones)
Date: 2008-12-01 19:09

Message:
Alexey,

Thanks for the response. As you advised, I tried a Fedora 8 guest, and it does seem to be much more stable. However, I really need a Debian base system for my application. Not necessarily Ubuntu 8.10, but I haven't had much luck with others either. Do you have any recommendations on one that is particularly stable? Over the weekend I tried:

  Fedora 8        : Seems very stable, but I really need a debian base.
  Ubuntu 8.04LTS  : Same periodic hangs I was seeing on 8.10
  Debian 4.0 Etch : Seems stable on the guest, but on the host, qemu process is running 100% busy while the guest is idle.

Any chance you know a workaround for the issue I'm seeing on etch, or can recommend a Debian base distribution which works well with KVM?

Thanks much,
Chris

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-11-27 12:54

Message:
In my opinion it is not the Ubuntu host that is problematic - but the guest on KVM.

I mean that Ubuntu 8.10 guest is unstable on KVM. I have not found out why.

If you try some better tested guest (Fedora 7/8 or Windows XP guest), it should be lots more stable. And if you try some other host (i.e. Fedora host) and run an Ubuntu 8.10 guest, it will be unstable.

In short - in my opinion - the problem is not host OS, but either KVM or its connection with guest OS.

-Alexey E. "Technologov", 27.11.2008.

----------------------------------------------------------------------