Re: problem about blocked monitor when disk image on NFS can not be reached.

Hi all,

The io_thread backtrace is as follows:
#0  0x00007f3086eaa034 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3086ea5345 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007f3086ea5217 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000436018 in kvm_mutex_lock () at
/root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1730
#4  qemu_mutex_lock_iothread () at
/root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1744
#5  0x000000000041ca67 in main_loop_wait (nonblocking=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1377
#6  0x00000000004363e7 in kvm_main_loop () at
/root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1589
#7  0x000000000041dc3a in main_loop (argc=<value optimized out>,
argv=<value optimized out>,
    envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1429
#8  main (argc=<value optimized out>, argv=<value optimized out>,
envp=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:3201

The cpu thread backtrace is as follows:
#0  0x00007f3084dff093 in select () from /lib64/libc.so.6
#1  0x00000000004453ea in qemu_aio_wait () at aio.c:193
#2  0x0000000000444175 in bdrv_write_em (bs=0x1ec3090, sector_num=2009871,
    buf=0x7f3087532800
"F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b",
nb_sectors=16) at block.c:2577
#3  0x000000000059ca13 in ide_sector_write (s=0x215f508) at
/root/rpmbuild/BUILD/qemu-kvm-0.14/hw/ide/core.c:574
#4  0x0000000000438ced in kvm_handle_io (env=0x202ef60) at
/root/rpmbuild/BUILD/qemu-kvm-0.14/kvm-all.c:821
#5  kvm_run (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:617
#6  0x0000000000438e09 in kvm_cpu_exec (env=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1233
#7  0x000000000043a0f7 in kvm_main_loop_cpu (_env=0x202ef60)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1419
#8  ap_main_loop (_env=0x202ef60) at
/root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1466
#9  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3084e0653d in clone () from /lib64/libc.so.6

The aio_thread backtrace is as follows:
#0  0x00007f3086eaae83 in pwrite64 () from /lib64/libpthread.so.0
#1  0x0000000000447501 in handle_aiocb_rw_linear (aiocb=0x21cff10,
    buf=0x7f3087532800
"F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b")
at posix-aio-compat.c:212
#2  0x0000000000447d48 in handle_aiocb_rw (unused=<value optimized
out>) at posix-aio-compat.c:247
#3  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:341
#4  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3084e0653d in clone () from /lib64/libc.so.6

I think the io_thread is blocked by the cpu thread, which took
qemu_mutex first. The cpu thread is waiting for the aio_thread's
result in qemu_aio_wait, and the aio_thread spends a long time in
pwrite64: about 5-10s before it returns an error (it behaves like a
call with a timeout). Only after that does the io thread get a chance
to read monitor input, so the monitor appears to block frequently. In
this situation, if I stop the VM, the monitor responds faster.

The problem is caused by the block layer being unavailable. The block
layer handles the I/O error in the normal way: it reports the error to
the IDE device, and the error is handled in ide_sector_write. The root
cause is that monitor input and the I/O operation (the pwrite call)
must execute serially under qemu_mutex, so a pwrite that blocks for a
long time holds up monitor input.
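
To make the serialization concrete, here is a toy model in Python. It
is not QEMU code: global_lock merely stands in for qemu_mutex, the
sleep stands in for a pwrite64 stuck on NFS, and the function name
monitor_latency is made up for this sketch. It only shows that a
thread needing the lock waits for roughly as long as the I/O takes.

```python
import threading
import time


def monitor_latency(io_seconds):
    """Return how long a 'monitor' thread waits for a lock that a
    'vcpu' thread holds across one slow synchronous write."""
    global_lock = threading.Lock()      # stand-in for qemu_mutex
    vcpu_has_lock = threading.Event()

    def vcpu_thread():
        with global_lock:               # held while the guest I/O is handled
            vcpu_has_lock.set()
            time.sleep(io_seconds)      # stand-in for pwrite64() stuck on NFS

    worker = threading.Thread(target=vcpu_thread)
    worker.start()
    vcpu_has_lock.wait()                # make sure the vcpu side owns the lock

    start = time.monotonic()
    with global_lock:                   # monitor input needs the same lock
        waited = time.monotonic() - start
    worker.join()
    return waited
```

Calling monitor_latency(5.0) should return a little under 5 seconds,
which matches the 5-10s monitor stalls I see while pwrite64 is stuck.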

As Stefan says, it seems difficult to take monitor input out of that
protection, so for now I will stop the VM if the disk image cannot be
reached.
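
A rough sketch of how such a reachability check might look from a
management script outside QEMU, again in Python. The helper name
reachable and the idea of probing with os.stat are my assumptions, not
anything QEMU provides. The probe runs in a helper thread so that the
caller gets an answer after the timeout even if the probe itself hangs.

```python
import os
import threading


def reachable(probe, timeout):
    """Run probe() in a helper thread; return False if it raises
    OSError or has not finished within `timeout` seconds."""
    result = {"ok": False}

    def run():
        try:
            probe()
            result["ok"] = True
        except OSError:
            pass                        # e.g. EIO from a soft mount

    helper = threading.Thread(target=run, daemon=True)
    helper.start()
    helper.join(timeout)                # give up waiting after `timeout`
    return not helper.is_alive() and result["ok"]


# Hypothetical usage; image_path is a placeholder for the real image:
#   if not reachable(lambda: os.stat(image_path), 5.0):
#       ...pause the guest, e.g. with the monitor "stop" command...
```

Note that on a hard mount the helper thread cannot be cancelled; it
stays stuck in the kernel until the server comes back, so this only
keeps the caller responsive and does not fix the hang itself.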


2011/3/1 Avi Kivity <avi@xxxxxxxxxx>:
> On 03/01/2011 05:01 PM, Stefan Hajnoczi wrote:
>>
>> On Tue, Mar 1, 2011 at 12:39 PM, ya su<suya94335@xxxxxxxxx>  wrote:
>> >      How about moving kvm_handle_io/handle_mmio out of the kvm_run
>> >  function into kvm_main_loop, as these operations are I/O operations?
>> >  This will remove the qemu_mutex between the 2 threads. Is this a
>> >  reasonable thought?
>> >
>> >      In order to keep the monitor responding to the user quickly in
>> >  this situation, an easier way is to take monitor I/O out of qemu_mutex
>> >  protection. This includes vnc/serial/telnet I/O related to the
>> >  monitor; as this I/O will not affect the running of the VM itself, it
>> >  does not need such strict protection.
>>
>> The qemu_mutex protects all QEMU global state.  The monitor does some
>> I/O and parsing which is not necessarily global state but once it
>> begins actually performing the command you sent, access to global
>> state will be required (pretty much any monitor command will operate
>> on global state).
>>
>> I think there are two options for handling NFS hangs:
>> 1. Ensure that QEMU is never put to sleep by NFS for disk images.  The
>> guest continues executing, may time out and notice that storage is
>> unavailable.
>
> That's the NFS soft mount option.
>
>> 2. Pause the VM but keep the monitor running if a timeout error
>> occurs.  Not sure if there is a timeout from NFS that we can detect.
>
> The default setting (hard mount) will retry forever in the kernel.
>  Moreover, the other default setting (nointr) means we can't even signal the
> hung thread.
>
>> For I/O errors (e.g. running out of disk space on the host) there is a
>> configurable policy.  You can choose whether to return an error to the
>> guest or to pause the VM.  I think we should treat NFS hangs as an
>> extension to this and as a block layer problem rather than an io
>> thread problem.
>
> I agree.  Mount the share as a soft,intr mount and let the kernel time out
> and return an I/O error.
>
>> Can you get backtraces when KVM hangs (gdb command: thread apply all
>> bt)?  It would be interesting to see some of the blocking cases that
>> you are hitting.
>
> Won't work (at least under the default configuration) since those threads
> are uninterruptible.  At the very least you need an interruptible mount.
>
> --
> error compiling committee.c: too many arguments to function
>
>