Hi Peter,

On 12.05.2010 16:01, Peter Lieven wrote:
> Hi Kevin,
>
> here we go. I created a blocking multipath device (interrupted all
> paths). qemu-kvm hangs with 100% cpu.
> also monitor is not responding.
>
> If I restore at least one path, the vm is continuing.
>
> BR,
> Peter

This seems to be the backtrace of only one thread, and likely not the
interesting one. Can you please use "thread apply all bt" to get the
backtrace of all threads?

Kevin

>
>
> ^C
> Program received signal SIGINT, Interrupt.
> 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> (gdb) bt
> #0 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> #1 0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
> #2 0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
> #3 0x000000000042e739 in kvm_mutex_lock () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
> #4 0x000000000042e76e in qemu_mutex_lock_iothread () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
> #5 0x000000000040c262 in main_loop_wait (timeout=1000) at
> /usr/src/qemu-kvm-0.12.4/vl.c:3995
> #6 0x000000000042dcf1 in kvm_main_loop () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
> #7 0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
> #8 0x000000000041054b in main (argc=30, argv=0x7fff266a77e8,
> envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
> (gdb) bt full
> #0 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> No symbol table info available.
> #1 0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
> No symbol table info available.
> #2 0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
> No symbol table info available.
> #3 0x000000000042e739 in kvm_mutex_lock () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
> No locals.
> #4 0x000000000042e76e in qemu_mutex_lock_iothread () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
> No locals.
> #5 0x000000000040c262 in main_loop_wait (timeout=1000) at
> /usr/src/qemu-kvm-0.12.4/vl.c:3995
>         ioh = (IOHandlerRecord *) 0x0
>         rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
>         wfds = {fds_bits = {0 <repeats 16 times>}}
>         xfds = {fds_bits = {0 <repeats 16 times>}}
>         ret = 1
>         nfds = 21
>         tv = {tv_sec = 0, tv_usec = 999761}
> #6 0x000000000042dcf1 in kvm_main_loop () at
> /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
>         fds = {18, 19}
>         mask = {__val = {268443712, 0 <repeats 15 times>}}
>         sigfd = 20
> #7 0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
>         r = 0
> #8 0x000000000041054b in main (argc=30, argv=0x7fff266a77e8,
> envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
>         gdbstub_dev = 0x0
>         boot_devices_bitmap = 12
>         i = 0
>         snapshot = 0
>         linux_boot = 0
>         initrd_filename = 0x0
>         kernel_filename = 0x0
>         kernel_cmdline = 0x588fac ""
>         boot_devices = "dc", '\0' <repeats 30 times>
>         ds = (DisplayState *) 0x198bf00
>         dcl = (DisplayChangeListener *) 0x0
>         cyls = 0
>         heads = 0
>         secs = 0
>         translation = 0
>         hda_opts = (QemuOpts *) 0x0
>         opts = (QemuOpts *) 0x1957390
>         optind = 30
> ---Type <return> to continue, or q <return> to quit---
>         r = 0x7fff266a8a23 "-usbdevice"
>         optarg = 0x7fff266a8a2e "tablet"
>         loadvm = 0x0
>         machine = (QEMUMachine *) 0x861720
>         cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU",
>         ' ' <repeats 11 times>, "E5520 @ 2.27GHz"
>         fds = {644511720, 32767}
>         tb_size = 0
>         pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
>         incoming = 0x0
>         fd = 0
>         pwd = (struct passwd *) 0x0
>         chroot_dir = 0x0
>         run_as = 0x0
>         env = (struct CPUX86State *) 0x0
>         show_vnc_port = 0
>         params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}
>
> Kevin Wolf wrote:
>> On 04.05.2010 15:42, Peter Lieven wrote:
>>
>>> hi kevin,
>>>
>>> you did it *g*
>>>
>>> looks promising.
>>> applied this patch and was not able to reproduce yet :-)
>>>
>>> secure way to reproduce was to shut down all multipath paths, then
>>> initiate i/o in the vm (e.g. start an application). of course,
>>> everything hangs at this point.
>>>
>>> after reenabling one path, vm crashed. now it seems to behave correctly
>>> and just reports a DMA timeout and continues normally afterwards.
>>>
>>
>> Great, I'm going to submit it as a proper patch then.
>>
>> Christoph, by now I'm pretty sure it's right, but can you have another
>> look if this is correct, anyway?
>>
>>
>>> can you think of any way to prevent the vm from consuming 100% cpu in
>>> that waiting state?
>>> my current approach is to run all vms with nice 1, which helped to keep
>>> the machine responsive if all vms (in my test case 64 on a box) have
>>> hanging i/o at the same time.
>>>
>>
>> I don't have anything particular in mind, but you could just attach gdb
>> and get another backtrace while it consumes 100% CPU (you'll need to use
>> "thread apply all bt" to catch everything). Then we should see where
>> it's hanging.
>>
>> Kevin
>>
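
For reference, the all-threads backtrace asked for above can also be
collected without an interactive session. What follows is only a minimal
sketch, assuming gdb is installed on the host and that the qemu-kvm PID is
known (for example from the pid file visible in the trace,
/var/run/qemu/vm-150.pid). The function names build_gdb_command and
dump_all_backtraces are hypothetical helpers for illustration, not part of
qemu-kvm or gdb.

```python
import subprocess
import sys


def build_gdb_command(pid):
    """Return the gdb argv that prints a backtrace of every thread of `pid`.

    -p attaches to the running process, -batch makes gdb exit (and detach)
    once the commands have run, and -ex supplies the command to execute.
    """
    return [
        "gdb",
        "-p", str(pid),
        "-batch",
        "-ex", "thread apply all bt",
    ]


def dump_all_backtraces(pid):
    """Run gdb against `pid` and return its standard output as text."""
    result = subprocess.run(
        build_gdb_command(pid),
        capture_output=True,
        text=True,
        check=False,  # still return partial output if gdb exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: dump_bt.py PID")
    print(dump_all_backtraces(int(sys.argv[1])))
```

Attaching stops the guest for a moment while gdb walks the threads, so this
is best run while the VM is already hanging. Replacing "bt" with "bt full"
in the command would also print the local variables of each frame.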