[ kvm-Bugs-2351676 ] Guests hang periodically on Ubuntu-8.10

Bugs item #2351676, was opened at 2008-11-26 17:59
Message generated for change (Comment added) made by jlokier
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2351676&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Chris Jones (c_jones)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guests hang periodically on Ubuntu-8.10

Initial Comment:
I'm seeing periodic hangs on my guests.  I've been unable so far to find a trigger - they always boot fine, but after anywhere from 10 minutes to 24 hours they eventually hang completely.

My setup:
  * AMD Athlon X2 4850e (2500 MHz dual core)
  * 4Gig memory
  * Ubuntu 8.10 server, 64-bit
  * KVMs tried:
    : kvm-72 (shipped with ubuntu)
    : kvm-79 (built myself, --patched-kernel option)
  * Kernels tried:
    : 2.6.27.7 (kernel.org, self built)
    : 2.6.27-7-server from Ubuntu 8.10 distribution

  In guests
  * Ubuntu 8.10 server, 64-bit (virtual machine install)
  * kernel 2.6.27-7-server from Ubuntu 8.10

I'm running the guests like:
  sudo /usr/local/bin/qemu-system-x86_64        \
     -daemonize                                 \
     -no-kvm-irqchip                            \
     -hda Imgs/ndev_root.img                    \
     -m 1024                                    \
     -cdrom ISOs/ubuntu-8.10-server-amd64.iso   \
     -vnc :4                                    \
     -net nic,macaddr=DE:AD:BE:EF:04:04,model=e1000 \
     -net tap,ifname=tap4,script=/home/chris/kvm/qemu-ifup.sh 

The problem does not happen if I use -no-kvm.

I've tried some other options that have no effect:
  -no-kvm-pit
  -no-acpi

The disk images are raw format.

When the guests hang, I cannot ping them, and the VNC console is hung.  The qemu monitor is still accessible, and the guests recover if I issue a system_reset command from the monitor.  However, the console often will not accept keyboard input after I do so.

When the guest is hung, kvm_stat shows all 0s for the counters:

efer_relo      exits  fpu_reloa  halt_exit  halt_wake  host_stat  hypercall
+insn_emul  insn_emul     invlpg   io_exits  irq_exits  irq_windo  largepage
+mmio_exit  mmu_cache  mmu_flood  mmu_pde_z  mmu_pte_u  mmu_pte_w  mmu_recyc
+mmu_shado  nmi_windo   pf_fixed   pf_guest  remote_tl  request_i  signal_ex
+tlb_flush
>          0          0          0          0          0          0          0
+0          0          0          0          0          0          0          0
+0          0          0          0          0          0          0          0
+0          0          0          0          0          0
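
A rough way to script the same "all counters are 0" check, assuming debugfs is mounted at /sys/kernel/debug and that the KVM counters appear as cumulative per-counter files under /sys/kernel/debug/kvm. The function names are made up for illustration, and the counters are host-wide, so this only tells you something when the hung guest is the only one running:

```shell
#!/bin/sh
# Print "name value" for every counter file in the given directory.
# The default path is an assumption (debugfs mounted at /sys/kernel/debug).
kvm_snapshot() {
    dir=${1:-/sys/kernel/debug/kvm}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and empty globs
        printf '%s %s\n' "${f##*/}" "$(cat "$f")"
    done
}

# Exit status 0 if no counter changed across the interval (in seconds).
kvm_counters_stalled() {
    dir=${1:-/sys/kernel/debug/kvm}
    interval=${2:-5}
    before=$(kvm_snapshot "$dir")
    sleep "$interval"
    after=$(kvm_snapshot "$dir")
    [ "$before" = "$after" ]
}
```

Run as root on the host, e.g. `kvm_counters_stalled && echo "no KVM activity in the last 5 seconds"`.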

gdb shows two threads - both waiting:

(gdb) info threads
  2 Thread 0x414f1950 (LWP 422)  0x00007f36f07a03e1 in sigtimedwait ()
   from /lib/libc.so.6
  1 Thread 0x7f36f1f306e0 (LWP 414)  0x00007f36f084b482 in select ()
   from /lib/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f36f1f306e0 (LWP 414))]#0  0x00007f36f084b482
+in select () from /lib/libc.so.6
(gdb) bt
#0  0x00007f36f084b482 in select () from /lib/libc.so.6
#1  0x00000000004094cb in main_loop_wait (timeout=0)
    at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4719
#2  0x000000000050a7ea in kvm_main_loop ()
    at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:619
#3  0x000000000040fafc in main (argc=<value optimized out>,
    argv=0x7ffff9f41948) at /home/chris/pkgs/kvm/kvm-79/qemu/vl.c:4871
(gdb) thread 2
[Switching to thread 2 (Thread 0x414f1950 (LWP 422))]#0  0x00007f36f07a03e1 in
+sigtimedwait () from /lib/libc.so.6
(gdb) bt
#0  0x00007f36f07a03e1 in sigtimedwait () from /lib/libc.so.6
#1  0x000000000050a560 in kvm_main_loop_wait (env=0xc319e0, timeout=0)
    at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:284
#2  0x000000000050aaf7 in ap_main_loop (_env=<value optimized out>)
    at /home/chris/pkgs/kvm/kvm-79/qemu/qemu-kvm.c:425
#3  0x00007f36f11ba3ea in start_thread () from /lib/libpthread.so.0
#4  0x00007f36f0852c6d in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()
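
For anyone who wants to collect the same backtraces without an interactive session, here is a small sketch using gdb's batch mode. The function name and the GDB override variable are hypothetical; the gdb options used (-p, -batch, -ex) are standard ones:

```shell
#!/bin/sh
# Attach to a running process, print a backtrace for every thread, detach.
# GDB may be set to point at a different gdb binary (illustrative knob only).
dump_all_backtraces() {
    pid=$1
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt'
}
```

Example: `dump_all_backtraces 414 > qemu-hang.txt`, where 414 is the PID of the hung qemu process. Attaching normally requires running as the same user as qemu, or as root.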


Any clues to help me resolve this would be much appreciated.


----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 19:07

Message:
Following up from my last comments: It turns out the I/O errors were due to
mundane permissions issues.  Qemu doesn't report an attempt to write to a
read-only disk image to the guest as such, and I had accidentally made an
image file read-only (just by recreating it as root) at the same time as I
started using the external kvm-88 kernel modules.
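
A pre-flight check along these lines would have caught the mistake before the guest was started. This is only a sketch: the function name is made up, and it must be run as the same user that will run qemu (root can write to almost anything, so running it under sudo proves nothing):

```shell
#!/bin/sh
# Return 0 if the image exists and is writable by the current user,
# 1 if it exists but is not writable, 2 if it does not exist.
check_image_writable() {
    img=$1
    if [ ! -e "$img" ]; then
        echo "missing: $img" >&2
        return 2
    fi
    if [ ! -w "$img" ]; then
        echo "not writable by $(id -un): $img" >&2
        return 1
    fi
    return 0
}
```

Example: `check_image_writable Imgs/ndev_root.img || exit 1` placed before the qemu command line in a start script.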

IDE (ata in the guest) gave I/O errors with obscure messages in the guest.
 SCSI and virtio-blk fared worse: I don't have any explanation of why both
of them resulted in a guest process seemingly perpetually stuck in
sync_page (the same one, too, not making progress), when trying to write to
that read-only image.

Secondly, I fixed the permissions problem, updated the host kernel from
2.6.27-14 to 2.6.28-15 (Ubuntu 8.10 -> 9.04), recompiled the external
kvm-88 kernel modules against the new kernel and loaded them, and
recompiled kvm-88 itself too (it might be affected by new things in
Glibc).  Since then, the exact same guest VM, which used to get stuck
either after some minutes (old kvm modules) or quickly with virtio-blk
(later kvm modules), seems to be stable.  The non-responsive or partly
responsive consoles and SSH sessions haven't occurred since the kernel
upgrade and kvm recompile, and that's with about 30 interactive
connections open at once to the guest, shifting a lot of data.

So my anecdotal advice is to upgrade the kernel; kvm-88 looks pretty good
so far.

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 11:09

Message:
I should clarify that last comment, because some text went missing.

When I wrote about lockups "within 20 minutes", X non-responsive, SSH
non-responsive etc., that was using kvm-88 userspace, but the kvm modules
which shipped with the distro kernel.

When I was able to unload those modules (which had to wait until other
users weren't using KVM), I tried the "out of tree" compiled modules which
come with the kvm-88 source tree, and that's when I found I couldn't even
manage a simple bit of writing to one of my disks without SCSI SENSE errors
and I/O errors reaching the application (if using IDE), or with virtio-blk I
got stuck processes very quickly, which have some similarity to the
stuckness that took much longer to arrive with the older modules.

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 11:05

Message:
Now I tried with the kvm-88 modules too, and it had one of these "freezes"
within seconds - as soon as I do "dd if=/dev/zero of=/dev/vda bs=512
count=1" to a virtio-blk device...  (Root device is IDE).  ps shows the dd
process stuck in sync_page.
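
To spot processes stuck like this inside the guest, something like the following can help. It is a sketch assuming a procps-style ps that supports the wchan output column; the function names are made up:

```shell
#!/bin/sh
# Keep the header line plus rows whose STAT column starts with D
# (uninterruptible sleep, which is where a process waiting on I/O sits).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List blocked processes together with the kernel function they wait in.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}
```

If a dd like the one above shows up on every run with WCHAN sync_page, it is not making progress.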

SSH, console and X are still working though.  This isn't the same as
lockups using the older modules, but it has an interesting similarity: the
older modules and the current ones both get stuck in sync_page.

I tried the same with an IDE device instead of virtio-blk, and the "dd"
succeeded, but then an attempt to create a partition on the disk resulted
in lots of SCSI (ATA) I/O errors in the guest's kernel log and an I/O error
in the partitioning application.

Something is quite amiss.  It looks like either the "out of tree" modules
that come shipped with kvm-88 aren't as backward-compatible as I'd hoped,
or there was a latent bug (perhaps the reason for those firmer freezes
after many minutes) which the new modules expose much more quickly.

----------------------------------------------------------------------

Comment By: Jamie Lokier (jlokier)
Date: 2009-09-09 10:12

Message:
I'm seeing similar lockups, with kvm-88 on a quad 64-bit Xeon host which is
running Ubuntu Server 9.04, which is Ubuntu's 2.6.27-14-server kernel.

I've been doing some installs into VMs and each time it freezes a few
minutes in.  The console is still visible over VNC (I can disconnect and
reconnect, and get the picture back), but it doesn't respond to keypresses
_except_ it does respond to control-alt-Sysreq in the guest.  X is
similarly non-responsive to keyboard and mouse input.  SSH sessions hang,
but ping is still responsive.

As it's always the same guest kernel in my tests (I have a particular task
to do...) that might be relevant.  It's Ubuntu's i386 (32-bit)
linux-image-2.6.28-15-generic.

Suggestions to pin the host clock frequency aren't helping: this thing
doesn't seem to have cpufreq at all.

As this seems to happen mostly when I have lots of disk activity (software
installs), I've tried all of virtio-blk, scsi and ide block drivers; the
ide one seemed to survive longer, but still failed within 20 minutes.  I've
tried increasing the memory of the guest substantially (in case it's a
guest bug driven by the amount of I/O), and I've tried "nolapic" in the
guest kernel's command line to disable using the lapic as clock source (due
to suggestions that it may be related to reading the time).  The guest
doesn't show any clock sources other than pit and lapic in
/proc/timer_list.

z-image, when you say 2.6.31 appears to fix it, do you mean 2.6.31 as the
host or as the guest?


----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-24 08:45

Message:
With 2.6.31-rc6 it is running fine for almost 72 hours. Looks like the
problem is gone in 2.6.31.

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-21 09:53

Message:
With -no-kvm-pit it is running fine for almost 20 hours. Didn't survive
that long without -no-kvm-pit.

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-08-20 16:20

Message:
I'm still investigating, but I have some new information so far. There seem
to be different issues that cause different crashes:

- dynamic CPU throttling on the host
- an oops due to the paravirt KVM support in the guest. I got hit by
http://bugzilla.kernel.org/show_bug.cgi?id=12405 and I'm now investigating
whether disabling highmem helps, as someone suggested. I don't know if this
also affects 64-bit guests, which seem to run more stably on other machines
here.

It helps to set up netconsole and let syslog-ng write its output to a log
file, so oopses can be logged nicely.
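
For reference, a sketch of how the netconsole module parameter can be assembled. The helper name and all addresses are placeholders; the parameter layout (source port@source IP/device, target port@target IP/target MAC) and the default ports 6665 and 6666 follow the kernel's netconsole documentation:

```shell
#!/bin/sh
# Build the netconsole= module parameter.
# Arguments: source-ip interface target-ip target-mac [source-port] [target-port]
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "${5:-6665}" "$1" "$2" "${6:-6666}" "$3" "$4"
}
```

In the guest, something like `modprobe netconsole "$(netconsole_param 192.0.2.10 eth0 192.0.2.1 00:11:22:33:44:55 6665 514)"` would send kernel messages to UDP port 514 on the log host, where syslog-ng needs a UDP source listening on that port.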

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-20 14:12

Message:
On a closer look, qemu had actually exited, but virt manager was still
holding its monitoring console. Here's the full transcript of what happened
in a shell session:

gdb --args /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp 3 -name
kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor stdio -boot c
-drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -serial none -parallel
none -vnc 213.145.98.164:1 -k en-us

GNU gdb (GDB) 6.8.50.20090628-cvs-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>

(gdb) run
Starting program: /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp
3 -name kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor stdio
-boot c -drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -serial none
-parallel none -vnc 213.145.98.164:1 -k en-us
[Thread debugging using libthread_db enabled]
[New Thread 0xb7dafb90 (LWP 19769)]
[New Thread 0xb75aab90 (LWP 19770)]
[New Thread 0xb6da6b90 (LWP 19771)]

QEMU 0.10.50 monitor - type 'help' for more information
(qemu) c

[New Thread 0x35a2db90 (LWP 19772)]
[New Thread 0x3522cb90 (LWP 19773)]
[New Thread 0x348ffb90 (LWP 19774)]
[New Thread 0x340feb90 (LWP 19775)]
[New Thread 0x338fdb90 (LWP 19776)]
[New Thread 0x330fcb90 (LWP 19777)]
[New Thread 0x328fbb90 (LWP 19778)]
[New Thread 0x320fab90 (LWP 19779)]
[New Thread 0x318f9b90 (LWP 19780)]
[New Thread 0x310f8b90 (LWP 19781)]
[New Thread 0x308f7b90 (LWP 19782)]
[New Thread 0x300f6b90 (LWP 19783)]
[New Thread 0x2f8f5b90 (LWP 19784)]
[New Thread 0x2f0f4b90 (LWP 19785)]
[New Thread 0x2e8f3b90 (LWP 19786)]
[New Thread 0x2e0f2b90 (LWP 19787)]
[New Thread 0x2d8f1b90 (LWP 19788)]
[New Thread 0x2d0f0b90 (LWP 19789)]
[New Thread 0x2c8efb90 (LWP 19790)]
[New Thread 0x2c0eeb90 (LWP 19791)]
[New Thread 0x2b8edb90 (LWP 19792)]
[New Thread 0x2b0ecb90 (LWP 19793)]
[New Thread 0x2a8ebb90 (LWP 19794)]
[New Thread 0x2a0eab90 (LWP 19795)]
[New Thread 0x298e9b90 (LWP 19796)]
[New Thread 0x290e8b90 (LWP 19797)]
[New Thread 0x288e7b90 (LWP 19798)]
[New Thread 0x280e6b90 (LWP 19799)]
[New Thread 0x278e5b90 (LWP 19800)]
[New Thread 0x270e4b90 (LWP 19801)]
[New Thread 0x268e3b90 (LWP 19802)]
[Thread 0x2e0f2b90 (LWP 19787) exited]
[Thread 0x338fdb90 (LWP 19776) exited]
[Thread 0x2b8edb90 (LWP 19792) exited]
[Thread 0x2e8f3b90 (LWP 19786) exited]
[Thread 0x2f8f5b90 (LWP 19784) exited]
[Thread 0x308f7b90 (LWP 19782) exited]
[Thread 0x300f6b90 (LWP 19783) exited]
[New Thread 0x300f6b90 (LWP 19808)]
[New Thread 0x308f7b90 (LWP 19813)]
[New Thread 0x2f8f5b90 (LWP 19814)]
[New Thread 0x2e8f3b90 (LWP 19815)]
[New Thread 0x2b8edb90 (LWP 19816)]
[New Thread 0x260e2b90 (LWP 19817)]
[New Thread 0x258e1b90 (LWP 19818)]
[New Thread 0x250e0b90 (LWP 19819)]
[New Thread 0x248dfb90 (LWP 19820)]
[New Thread 0x240deb90 (LWP 19821)]
[New Thread 0x236ffb90 (LWP 19822)]
[New Thread 0x22cffb90 (LWP 19823)]
[New Thread 0x224feb90 (LWP 19824)]
[New Thread 0x21affb90 (LWP 19825)]
[New Thread 0x212feb90 (LWP 19828)]
[New Thread 0x20afdb90 (LWP 19829)]
[New Thread 0x202fcb90 (LWP 19830)]
[New Thread 0x1fafbb90 (LWP 19831)]
kvm: unhandled exit 31
kvm_run returned -22
[Thread 0x2c0eeb90 (LWP 19791) exited]
[Thread 0x270e4b90 (LWP 19801) exited]
[Thread 0x2f8f5b90 (LWP 19814) exited]
[Thread 0x320fab90 (LWP 19779) exited]
[Thread 0x310f8b90 (LWP 19781) exited]
[Thread 0x2e8f3b90 (LWP 19815) exited]
[Thread 0x21affb90 (LWP 19825) exited]
[Thread 0x300f6b90 (LWP 19808) exited]
[Thread 0x328fbb90 (LWP 19778) exited]
[Thread 0x2c8efb90 (LWP 19790) exited]
[Thread 0x2d0f0b90 (LWP 19789) exited]
[Thread 0x260e2b90 (LWP 19817) exited]
[Thread 0x268e3b90 (LWP 19802) exited]
[Thread 0x240deb90 (LWP 19821) exited]
[Thread 0x290e8b90 (LWP 19797) exited]
[Thread 0x280e6b90 (LWP 19799) exited]
[Thread 0x2a8ebb90 (LWP 19794) exited]
[Thread 0x20afdb90 (LWP 19829) exited]
[Thread 0x2a0eab90 (LWP 19795) exited]
[Thread 0x2d8f1b90 (LWP 19788) exited]
[Thread 0x248dfb90 (LWP 19820) exited]
[Thread 0x2b8edb90 (LWP 19816) exited]
[Thread 0x278e5b90 (LWP 19800) exited]
[Thread 0x2b0ecb90 (LWP 19793) exited]
[Thread 0x2f0f4b90 (LWP 19785) exited]
[Thread 0x298e9b90 (LWP 19796) exited]
[Thread 0x35a2db90 (LWP 19772) exited]
[Thread 0x318f9b90 (LWP 19780) exited]
[Thread 0x236ffb90 (LWP 19822) exited]
[Thread 0x258e1b90 (LWP 19818) exited]
[Thread 0x348ffb90 (LWP 19774) exited]
[Thread 0x308f7b90 (LWP 19813) exited]
[Thread 0x288e7b90 (LWP 19798) exited]
[Thread 0x202fcb90 (LWP 19830) exited]
[Thread 0x340feb90 (LWP 19775) exited]
[Thread 0x3522cb90 (LWP 19773) exited]
[Thread 0x330fcb90 (LWP 19777) exited]
[Thread 0x1fafbb90 (LWP 19831) exited]
[Thread 0x250e0b90 (LWP 19819) exited]
[Thread 0x22cffb90 (LWP 19823) exited]
[Thread 0x224feb90 (LWP 19824) exited]
[Thread 0x212feb90 (LWP 19828) exited]

(qemu)

I'm going to try it with -no-kvm-pit now...

----------------------------------------------------------------------

Comment By: Teodor Milkov (z-image)
Date: 2009-08-20 10:48

Message:
I believe I may have hit the same bug.

* CPU is 2x 4 core + SMT (so it looks like 16 cores) Nehalem (Intel(R)
Xeon(R) CPU E5520  @ 2.27GHz)
* Host kernel is i386 and not x86_64: Debian sid package
linux-image-2.6.30-1-686-bigmem 2.6.30-5
* QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)
* Guests:
   * Debian Etch with backports 32 bit kernel 2.6.26-bpo.2-686-bigmem
   * Debian Etch with custom compiled 32 bit kernel 2.6.30.4

Load testing with stress (http://weather.ou.edu/~apw/projects/stress/).
Guests are configured to use 2047MB memory and 3 VCPUs (tried with 2 VCPUs
as well).

After some time - anywhere from 30 minutes to several hours - the virtual
machine hangs. It doesn't crash, just doesn't respond anymore to keyboard,
vnc, ping or anything else. I tried to run a gdb session on the two guests
and the results are more or less equal:

gdb --args /usr/local/bin/qemu-system-x86_64 -S -M pc -m 2047 -smp 3 -name
kvm2 -uuid 4f484293-7e31-2fb9-f2c8-246b5f87f301 -monitor pty -boot c -drive
file=/var/lib/libvirt/images/iso/debian-40r8-etchnhalf-i386-netinst.iso,if=ide,media=cdrom,index=2
-drive file=/dev/vg0/kvm2,if=virtio,index=0,boot=on -net
nic,macaddr=54:52:00:31:be:e3,vlan=0,model=virtio -net tap,fd=29,vlan=0
-serial pty -parallel none -usb -vnc 127.0.0.1:1 -k en-us    
GNU gdb (GDB) 6.8.50.20090628-cvs-debian

...

^C
Program received signal SIGINT, Interrupt.
0xb8036424 in __kernel_vsyscall ()

(gdb) info threads
  27 Thread 0xb7e10b90 (LWP 19064)  0xb8036424 in __kernel_vsyscall ()
  26 Thread 0xb760bb90 (LWP 19065)  0xb8036424 in __kernel_vsyscall ()
  25 Thread 0xb6e07b90 (LWP 19066)  0xb8036424 in __kernel_vsyscall ()
* 1 Thread 0xb7e11a70 (LWP 19060)  0xb8036424 in __kernel_vsyscall ()

(gdb) thread 1
[Switching to thread 1 (Thread 0xb7e11a70 (LWP 19060))]#0  0xb8036424 in
__kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7f06fe1 in select () from /lib/i686/cmov/libc.so.6
#2  0x0804c3c6 in qemu_select (max_fd=30, rfds=0xbfd46f00,
wfds=0xbfd46e80, xfds=0xbfd46e00, tv=0xbfd46df4) at
/home/zimage/kvm/qemu-kvm-devel-88/vl.c:313
#3  0x08052958 in main_loop_wait (timeout=1000) at
/home/zimage/kvm/qemu-kvm-devel-88/vl.c:4339
#4  0x0818777e in kvm_main_loop () at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2194
#5  0x080530c9 in main_loop () at
/home/zimage/kvm/qemu-kvm-devel-88/vl.c:4550
#6  0x08056799 in main (argc=33, argv=0xbfd47424, envp=0xbfd474ac) at
/home/zimage/kvm/qemu-kvm-devel-88/vl.c:6416

(gdb) thread 25
[Switching to thread 25 (Thread 0xb6e07b90 (LWP 19066))]#0  0xb8036424 in
__kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9aad960, timeout=1000) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9aad960) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9aad960) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6

(gdb) thread 26
[Switching to thread 26 (Thread 0xb760bb90 (LWP 19065))]#0  0xb8036424 in
__kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9aa4028, timeout=1000) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9aa4028) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9aa4028) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6

(gdb) thread 27
[Switching to thread 27 (Thread 0xb7e10b90 (LWP 19064))]#0  0xb8036424 in
__kernel_vsyscall ()
(gdb) bt
#0  0xb8036424 in __kernel_vsyscall ()
#1  0xb7e59551 in sigtimedwait () from /lib/i686/cmov/libc.so.6
#2  0x08186e1b in kvm_main_loop_wait (env=0x9a93df0, timeout=1000) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:1869
#3  0x08187231 in kvm_main_loop_cpu (env=0x9a93df0) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2009
#4  0x08187340 in ap_main_loop (_env=0x9a93df0) at
/home/zimage/kvm/qemu-kvm-devel-88/qemu-kvm.c:2044
#5  0xb7fd74b5 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb7f0ea5e in clone () from /lib/i686/cmov/libc.so.6


----------------------------------------------------------------------

Comment By: Bryan Cameron Lesiuk (clesiuk)
Date: 2009-03-25 17:35

Message:
I have a problem similar to the original poster's.

I've discovered a possible workaround: disable CPU frequency scaling in
the host:
# apt-get remove powernowd

I'm running with disabled frequency scaling and so far my system is
stable.

I set the host frequency manually: 
# cd /sys/devices/system/cpu/cpu0/cpufreq
# cat scaling_available_frequencies
>     2500000 2400000 2200000 2000000 1800000 1000000 
# cat scaling_available_governors
>     conservative ondemand userspace powersave performance 
# echo powersave > scaling_governor    (minimum frequency)
# echo performance > scaling_governor  (maximum frequency)
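
The commands above only touch cpu0. A sketch that applies one governor to every CPU that exposes cpufreq, assuming the usual sysfs layout (the function name is made up, and the second argument exists only so the loop can be pointed at a different directory tree):

```shell
#!/bin/sh
# Write the given governor to every cpuN/cpufreq/scaling_governor file.
set_governor() {
    gov=$1
    root=${2:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue      # CPU without cpufreq, or empty glob
        echo "$gov" > "$f"
    done
}
```

Run as root, e.g. `set_governor performance`. The setting does not survive a reboot.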

Here's my rig: 
* AMD Athlon X2 4850e (2500 MHz dual core)
* 4Gig memory, 800MHz, dual channel
* 780G chipset (Jetway NC81-LF motherboard)

I tried combinations of Host/Guest using:
* Ubuntu 8.10 server, i686, KVM-72 
* Ubuntu 8.10 server, amd64, KVM-72
* Ubuntu 9.04 server, amd64, KVM-84 (22 March 2009 beta)

Stuff I've tried which had no discernible effect: 
* clock source: kvm-clock, acpi_pm
* block device: ide, virtual
* network device: e1000, virtual

----------------------------------------------------------------------

Comment By: Michael Tokarev (mjtsf)
Date: 2009-02-09 13:52

Message:
OK, I have a very similar issue here as well.
Host - 4-core Phenom CPU and AMD 780G chipset, running 2.6.28.4-x86-64
(from kernel.org).
kvm-83 32bits
Guest - 2.6.27.13-i686smp, also from kernel.org.

The guest is running with KVM_GUEST stuff enabled, using kvm timer and
virtio network and block.  The system is Debian (lenny-to-be) on both, but
I don't think it matters since both use custom-compiled kernels.

Guest - at least one of them - hangs, especially when many guests are
running in parallel (we have 4 Windows machines and 4 Linux machines, mostly
idle).  When it hangs, nothing really works - console, ping, etc.  It
usually continues working after 1..2 minutes or more.  During the hang, the
host is either silent or is spewing tons of "vcpu not ready for
apic_round_robin" messages (several 1000s of them) -- but I can't be sure
that message is directly related to the hangs.

Nothing is logged on guest.

The so-far-only-affected guest is assigned 2 virtual CPUs, -- I'll try to
reboot it with single cpu only to see if it will change anything.

I wasn't able to check gdb/trace/etc so far, because the guest that hangs
is my main working machine, which is a terminal server, so I have to run to
another room to server's console and check there.

----------------------------------------------------------------------

Comment By: Dustin Kirkland (dustin_kirkland)
Date: 2009-02-09 12:38

Message:
In the Ubuntu 8.10 guest, can you try the linux-image-virtual kernel?  The
current one points to linux-image-2.6.27-11-virtual.

:-Dustin

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-01-18 06:18

Message:
New stability information from my side.

Host:
Linux dirus-dom 2.6.28-2-server #3-Ubuntu SMP Thu Dec 4 22:35:12 UTC 2008
x86_64 GNU/Linux


Guest:
2.6.28 x86_64 
- disabled all kvm guest options (with kvm_clock disabled)
- enabled virtio_block 
- started with -smp 1 and -smp 2

They haven't crashed yet, with either -smp 1 or -smp 2. I think disabling
kvm guest support did the trick.
However, using NFS from the guest is quite slow and doesn't seem very
stable. I have the feeling the guest lags quite often, but even loads up
to 11, running crashme, a high -j kernel build and file transfers didn't
crash the machine.

----------------------------------------------------------------------

Comment By: James Thomason (james_thomason)
Date: 2009-01-15 07:30

Message:
Update: 

I installed Ubuntu 8.10 server and upgraded to 2.6.29-rc1 and KVM-83. I am
still able to reproduce when kvm -smp > 1.  New behavior in this
configuration is the printing of the message "Stuck??" to the console,
followed shortly by a kernel panic.   

KVM Host:

Ubuntu Server 8.10
Linux 2.6.29-RC1
KVM-83 

KVM Guest: 

Ubuntu Server 8.10
2.6.27-9-server



----------------------------------------------------------------------

Comment By: James Thomason (james_thomason)
Date: 2009-01-15 07:20

Message:
Hello, 

I am able to reliably reproduce a condition where a guest goes into a tight
loop or spinlock on all running cores.  The scenario is exactly as described
in bug 2351676, though my environment differs as detailed below.  My
observation is that the issue is correlated to the number of VCPUs assigned
to the guest and CPU load. The higher the number of VCPUs and CPU
utilization, the more easily it is triggered.  If a KVM developer is
interested in debugging live, I might be able to arrange getting the system
in question into a DMZ.  A review of the kvm tracker leads me to believe
that the following bugs are possibly related:

[ 2351676 ] Guests hang periodically on Ubuntu-8.10
[ 2353811 ] Solaris 10 guest unstable
[ 2494730 ] Guests "stalling" on kvm-82
[ 2138079 ] kvm locks up system
[ 2113643 ] guests AND host still getting stuck under CPU load

KVM Host Configuration:

4 x Quad-Core AMD Opteron Processors (8346 HE @ 1.8GHz)
64GB DDR2 667MHz
Fedora 10 x64
Kernel 2.6.28
KVM-82 

KVM Guest Configuration:
32GB Memory
1 to 16 VCPUs
Centos 5.2 x64
Kernel 2.6.28
IDE disk
e1000 NIC

----------------------------------------------------------------------

Comment By: Daniel Poelzleithner (poelzi)
Date: 2009-01-13 19:11

Message:
I have a very similar setup.

Host: 
Ubuntu 8.10. 
Kernel 2.6.28-2-server
KVM: 72, 80, 81, 82, 83 tried (using the up to date kvm module, too)

Guests:
Endian Firewall (centos based.) 
Kernel 2.6.22.19-72.endian15
Is stable so far. Sometimes loses USB devices.

Ubuntu 8.10
Kernel 2.6.27, 2.6.28-2-server, 2.6.28 vanilla home brew
Very unstable.

As the Ubuntu 8.10 guest is also unstable when using the 2.6.28 vanilla
kernel, I'm not so sure it's a guest problem.
I will now compile a 2.6.28 kernel without any kvm guest support.

Things that don't seem to have an effect:
- using ide instead of virtio
- using e1000 instead of virtio

However, it seems that it may be caused by I/O access, but it is not
easily reproducible.

The last thing I tried: using the kernel parameters "clocksource=acpi_pm
notsc" in the guest. I'm still investigating whether it makes the guest
stable.
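
To confirm inside the guest that a clocksource= parameter actually took effect, the clocksource sysfs files can be read. A sketch, assuming the standard sysfs path (the function name is made up, and the optional argument only exists to point it at a different directory):

```shell
#!/bin/sh
# Print the clocksource in use and the ones the kernel offers.
show_clocksource() {
    base=${1:-/sys/devices/system/clocksource/clocksource0}
    printf 'current:   %s\n' "$(cat "$base/current_clocksource")"
    printf 'available: %s\n' "$(cat "$base/available_clocksource")"
}
```

If "current" is not acpi_pm after booting with clocksource=acpi_pm, the guest kernel did not honor the parameter, and any stability result says nothing about that clock source.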

Btw, with kvm-82 I saw around 100 io_exits when only the crashed Ubuntu
8.10 guest is running, nothing else.

----------------------------------------------------------------------

Comment By: Chris Jones (c_jones)
Date: 2008-12-10 20:29

Message:
Actually, I was too quick to say that a Fedora 8 guest is stable.  Even
there, I'm seeing hangs once I get my application fully installed
(basically, once I introduce some load).

I also did an update to kvm-80 and the problem still exists (on all the
guests I've tried).  That's with the kvm-80 kernel modules and the kvm-80
userspace, running on linux-2.6.27.8.

Thanks,
Chris

----------------------------------------------------------------------

Comment By: Chris Jones (c_jones)
Date: 2008-12-01 19:09

Message:
Alexey,

Thanks for the response.  As you advised, I tried a Fedora 8 guest, and it
does seem to be much more stable.  However, I really need a Debian base
system for my application.  Not necessarily Ubuntu 8.10, but I haven't had
much luck with others either.  Do you have any recommendations on one that
is particularly stable?

Over the weekend I tried:
  Fedora 8       : Seems very stable, but I really need a Debian base.
  Ubuntu 8.04LTS : Same periodic hangs I was seeing on 8.10
  Debian 4.0 Etch: Seems stable on the guest, but on the host, the qemu
                   process is running 100% busy while the guest is idle.

Any chance you know a workaround for the issue I'm seeing on etch, or can
recommend a Debian base distribution which works well with KVM?

Thanks much,
Chris

----------------------------------------------------------------------

Comment By: Technologov (technologov)
Date: 2008-11-27 12:54

Message:
In my opinion it is not the Ubuntu host that is problematic - but the guest
on KVM.

I mean that Ubuntu 8.10 guest is unstable on KVM. I have not found out
why.

If you try some better tested guest (Fedora 7/8 or Windows XP), it
should be lots more stable.

And if you try some other host (i.e. a Fedora host) and run an Ubuntu 8.10
guest, it will be unstable.

In short - in my opinion - the problem is not the host OS, but either KVM or
its connection with the guest OS.

-Alexey E. "Technologov", 27.11.2008.

----------------------------------------------------------------------

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
