Re: problem about blocked monitor when disk image on NFS can not be reached.

On Wed, Mar 2, 2011 at 10:39 AM, ya su <suya94335@xxxxxxxxx> wrote:
> The io_thread backtrace is as follows:
> #0  0x00007f3086eaa034 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x00007f3086ea5345 in _L_lock_870 () from /lib64/libpthread.so.0
> #2  0x00007f3086ea5217 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x0000000000436018 in kvm_mutex_lock () at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1730
> #4  qemu_mutex_lock_iothread () at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1744
> #5  0x000000000041ca67 in main_loop_wait (nonblocking=<value optimized out>)
>    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1377
> #6  0x00000000004363e7 in kvm_main_loop () at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1589
> #7  0x000000000041dc3a in main_loop (argc=<value optimized out>,
> argv=<value optimized out>,
>    envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1429
> #8  main (argc=<value optimized out>, argv=<value optimized out>,
> envp=<value optimized out>)
>    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:3201
>
> The cpu thread backtrace is as follows:
> #0  0x00007f3084dff093 in select () from /lib64/libc.so.6
> #1  0x00000000004453ea in qemu_aio_wait () at aio.c:193
> #2  0x0000000000444175 in bdrv_write_em (bs=0x1ec3090, sector_num=2009871,
>    buf=0x7f3087532800
> "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b",
> nb_sectors=16) at block.c:2577
> #3  0x000000000059ca13 in ide_sector_write (s=0x215f508) at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/hw/ide/core.c:574
> #4  0x0000000000438ced in kvm_handle_io (env=0x202ef60) at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/kvm-all.c:821
> #5  kvm_run (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:617
> #6  0x0000000000438e09 in kvm_cpu_exec (env=<value optimized out>)
>    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1233
> #7  0x000000000043a0f7 in kvm_main_loop_cpu (_env=0x202ef60)
>    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1419
> #8  ap_main_loop (_env=0x202ef60) at
> /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1466
> #9  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f3084e0653d in clone () from /lib64/libc.so.6
>
> The aio_thread backtrace is as follows:
> #0  0x00007f3086eaae83 in pwrite64 () from /lib64/libpthread.so.0
> #1  0x0000000000447501 in handle_aiocb_rw_linear (aiocb=0x21cff10,
>    buf=0x7f3087532800
> "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b")
> at posix-aio-compat.c:212
> #2  0x0000000000447d48 in handle_aiocb_rw (unused=<value optimized
> out>) at posix-aio-compat.c:247
> #3  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:341
> #4  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
> #5  0x00007f3084e0653d in clone () from /lib64/libc.so.6
>
> I think the io_thread is blocked by the cpu thread, which takes the
> qemu_mutex first. The cpu thread is waiting for the aio_thread's result
> in qemu_aio_wait(), and the aio_thread spends a long time in pwrite64():
> about 5-10s before it returns an error (it looks like a non-blocking
> call that times out). Only after that does the io thread get a chance
> to receive monitor input, so the monitor appears to block frequently.
> In this situation, if I stop the vm, the monitor responds faster.
>
> The problem is caused by the block layer being unavailable. The block
> layer processes the I/O error in the normal way: it reports the error
> to the IDE device, and the error is handled in ide_sector_write. The
> root cause is that monitor input and the I/O operation (the pwrite
> call) must execute serially (under the qemu_mutex), so a pwrite that
> blocks for a long time holds up monitor input.
>
> As Stefan says, it seems difficult to take monitor input out from
> under that protection, so for now I will stop the vm if the disk image
> cannot be reached.

If you switch to -drive if=virtio instead of IDE then the problem
should be greatly reduced.  Virtio-blk uses aio instead of synchronous
calls, which means that the vcpu thread does not run qemu_aio_wait().
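
To illustrate the difference, here is a toy Python model. It is not QEMU
code: global_lock only stands in for the qemu mutex, slow_write() stands in
for a pwrite64() that hangs on NFS, and all function names are made up for
this sketch. It measures how long a "monitor" thread has to wait for the
lock when the "vcpu" waits for the write while holding the lock, and when it
only submits the write under the lock.

```python
import threading
import time

# Hypothetical stand-in for the global qemu mutex.
global_lock = threading.Lock()


def slow_write(delay):
    """Stand-in for a pwrite64() that stalls on an unreachable NFS server."""
    time.sleep(delay)


def sync_vcpu(delay, started):
    # IDE-like path in this model: the lock is held until the write finishes.
    with global_lock:
        started.set()
        slow_write(delay)


def async_vcpu(delay, started):
    # virtio-blk-like path in this model: submit under the lock,
    # then wait for completion without holding it.
    with global_lock:
        worker = threading.Thread(target=slow_write, args=(delay,))
        worker.start()
        started.set()
    worker.join()


def monitor_latency(vcpu_body, delay=0.3):
    """Return the seconds a 'monitor' waits for the lock during one write."""
    started = threading.Event()
    vcpu = threading.Thread(target=vcpu_body, args=(delay, started))
    vcpu.start()
    started.wait()               # the write is now in flight
    t0 = time.monotonic()
    with global_lock:            # what the monitor needs before it can run
        waited = time.monotonic() - t0
    vcpu.join()
    return waited


if __name__ == "__main__":
    print("synchronous :", round(monitor_latency(sync_vcpu), 3), "s")
    print("asynchronous:", round(monitor_latency(async_vcpu), 3), "s")
```

In the synchronous case the monitor's wait is roughly the full write delay;
in the asynchronous case it is close to zero, even though the write itself
takes just as long.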

Kevin and I have been looking into the limitations imposed by
synchronous calls.  Unfortunately there is still synchronous code in
QEMU today, and that code can hit these NFS hang situations.
qemu_aio_wait() runs a nested event loop that does only a subset of
what the full event loop does; monitor input is not part of that
subset.  This is why the monitor does not respond.

If all code were asynchronous then only a top-level event loop would be
necessary and the monitor would continue to function.
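
A small sketch of the nested versus top-level event loop, again as a toy
Python model rather than QEMU code. The event names ("write", "monitor",
"aio_done") and the run() function are invented for illustration. Events are
listed in arrival order; in synchronous mode a "write" enters a nested loop
that services only "aio_done" and leaves everything else waiting.

```python
from collections import deque


def run(events, synchronous):
    """Return the order in which events are handled.

    events: arrival-ordered list of "write", "monitor" or "aio_done".
    synchronous: if True, a "write" waits in a nested loop for its
    completion, and that nested loop ignores all other event types.
    """
    pending = deque(events)
    handled = []
    while pending:                      # top-level event loop
        ev = pending.popleft()
        handled.append(ev)
        if ev == "write" and synchronous:
            skipped = deque()
            while pending:              # nested loop: AIO completions only
                nxt = pending.popleft()
                if nxt == "aio_done":
                    handled.append(nxt)
                    break
                skipped.append(nxt)     # e.g. monitor input has to wait
            # Events the nested loop ignored go back to the front, in order.
            pending.extendleft(reversed(skipped))
    return handled


if __name__ == "__main__":
    arrivals = ["write", "monitor", "aio_done"]
    print("nested loop   :", run(arrivals, synchronous=True))
    print("top-level only:", run(arrivals, synchronous=False))
```

With the nested loop the monitor event is handled only after the write
completes, however long that takes; with a single top-level loop it is
handled as soon as it arrives.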

In the immediate term I suggest using virtio-blk instead of IDE.

Stefan