On Wed, Mar 2, 2011 at 10:39 AM, ya su <suya94335@xxxxxxxxx> wrote:
> io_thread bt as the following:
> #0  0x00007f3086eaa034 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f3086ea5345 in _L_lock_870 () from /lib64/libpthread.so.0
> #2  0x00007f3086ea5217 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000436018 in kvm_mutex_lock () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1730
> #4  qemu_mutex_lock_iothread () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1744
> #5  0x000000000041ca67 in main_loop_wait (nonblocking=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1377
> #6  0x00000000004363e7 in kvm_main_loop () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1589
> #7  0x000000000041dc3a in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1429
> #8  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:3201
>
> cpu thread bt as the following:
> #0  0x00007f3084dff093 in select () from /lib64/libc.so.6
> #1  0x00000000004453ea in qemu_aio_wait () at aio.c:193
> #2  0x0000000000444175 in bdrv_write_em (bs=0x1ec3090, sector_num=2009871, buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b", nb_sectors=16) at block.c:2577
> #3  0x000000000059ca13 in ide_sector_write (s=0x215f508) at /root/rpmbuild/BUILD/qemu-kvm-0.14/hw/ide/core.c:574
> #4  0x0000000000438ced in kvm_handle_io (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/kvm-all.c:821
> #5  kvm_run (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:617
> #6  0x0000000000438e09 in kvm_cpu_exec (env=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1233
> #7  0x000000000043a0f7 in kvm_main_loop_cpu (_env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1419
> #8  ap_main_loop (_env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1466
> #9  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f3084e0653d in clone () from /lib64/libc.so.6
>
> aio_thread bt as the following:
> #0  0x00007f3086eaae83 in pwrite64 () from /lib64/libpthread.so.0
> #1  0x0000000000447501 in handle_aiocb_rw_linear (aiocb=0x21cff10, buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b") at posix-aio-compat.c:212
> #2  0x0000000000447d48 in handle_aiocb_rw (unused=<value optimized out>) at posix-aio-compat.c:247
> #3  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:341
> #4  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
> #5  0x00007f3084e0653d in clone () from /lib64/libc.so.6
>
> I think the io_thread is blocked by the cpu thread, which takes
> qemu_mutex first. The cpu thread is in turn waiting for the
> aio_thread's result in qemu_aio_wait(); the aio_thread spends a long
> time in pwrite64, about 5-10s, and then returns an error (it looks
> like a non-blocking timeout call). Only after that does the io_thread
> get a chance to receive monitor input, so the monitor appears blocked
> most of the time. In this situation, if I stop the VM, the monitor
> responds faster.
>
> The problem is triggered by the unavailability of the backing storage,
> but the block layer itself handles the I/O error in the normal way: it
> reports the error to the IDE device, and the error is handled in
> ide_sector_write. The root cause is that monitor input and the I/O
> operation (the pwrite call) must execute serialized, under the
> qemu_mutex semaphore, so a long blocking pwrite stalls monitor input.
>
> As Stefan says, it seems difficult to take monitor input out of that
> protection; for now I will stop the VM if the disk image cannot be
> reached.

If you switch to -drive if=virtio instead of IDE then the problem
should be greatly reduced.
Virtio-blk uses aio instead of synchronous calls, which means that the
vcpu thread does not run qemu_aio_wait().

Kevin and I have been looking into the limitations imposed by
synchronous calls. Today there is unfortunately synchronous code in
QEMU and we can hit these NFS hang situations.

qemu_aio_wait() runs a nested event loop that does a subset of what
the full event loop does. This is why the monitor does not respond.
If all code were asynchronous then only a top-level event loop would
be necessary and the monitor would continue to function.

In the immediate term I suggest using virtio-blk instead of IDE.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html