Re: [Qemu-devel] Re: segmentation fault in qemu-kvm-0.14.0

Jan Kiszka <jan.kiszka@xxxxxx> · Wed, 09 Mar 2011 10:04:02 +0100

On 2011-03-09 09:50, Corentin Chary wrote:
> On Wed, Mar 9, 2011 at 7:37 AM, Jan Kiszka <jan.kiszka@xxxxxx> wrote:
>> On 2011-03-08 23:53, Peter Lieven wrote:
>>> Hi,
>>>
>>> during testing of qemu-kvm-0.14.0 i can reproduce the following segfault. i have seen similar crash already in 0.13.0, but had no time to debug.
>>> my guess is that this segfault is related to the threaded vnc server which was introduced in qemu 0.13.0. the bug is only triggerable if a vnc
>>> client is attached. it might also be connected to a resolution change in the guest. i have a backtrace attached. the debugger is still running if someone
>>> needs more output
>>>
>>
>> ...
>>
>>> Thread 1 (Thread 0x7ffff7ff0700 (LWP 29038)):
>>> #0  0x0000000000000000 in ?? ()
>>> No symbol table info available.
>>> #1  0x000000000041d669 in main_loop_wait (nonblocking=0)
>>>     at /usr/src/qemu-kvm-0.14.0/vl.c:1388
>>
>> So we are calling a IOHandlerRecord::fd_write handler that is NULL.
>> Looking at qemu_set_fd_handler2, this may happen if that function is
>> called for an existing io-handler entry with non-NULL write handler,
>> passing a NULL write and a non-NULL read handler. And all this without
>> the global mutex held.
>>
>> And there are actually calls in vnc_client_write_plain and
>> vnc_client_write_locked (in contrast to vnc_write) that may generate
>> this pattern. It's probably worth validating that the iothread lock is
>> always held when qemu_set_fd_handler2 is invoked to confirm this race
>> theory, adding something like
>>
>> assert(pthread_mutex_trylock(&qemu_mutex) != 0);
>> (that's for qemu-kvm only)
>>
>> BTW, qemu with just --enable-vnc-thread, ie. without io-thread support,
>> should always run into this race as it then definitely lacks a global mutex.
> 
> I'm not sure what mutex should be locked here (qemu_global_mutex,
> qemu_fair_mutex, lock_iothread).

In upstream qemu, the latter - if it exists (which is not the case in
non-io-thread mode).

In qemu-kvm, those locks are yet unused. Rather the code in qemu-kvm.c
implements the global lock.

> But here is where is should be locked (other vnc_write calls in this
> thread should never trigger qemu_set_fd_handler):
> 
> diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
> index 1d4c5e7..e02d891 100644
> --- a/ui/vnc-jobs-async.c
> +++ b/ui/vnc-jobs-async.c
> @@ -258,7 +258,9 @@ static int vnc_worker_thread_loop(VncJobQueue *queue)
>          goto disconnected;
>      }
> 
> +    /* lock */
>      vnc_write(job->vs, vs.output.buffer, vs.output.offset);
> +    /* unlock */
> 
>  disconnected:
>      /* Copy persistent encoding data */
> @@ -267,7 +269,9 @@ disconnected:
>      vnc_unlock_output(job->vs);
> 
>      if (flush) {
> +        /* lock */
>          vnc_flush(job->vs);
> +       /* unlock */
>      }

Particularly this call is critical as it will trigger
vnc_client_write_locked which may NULL'ify the write handler.

But I'm also not sure about vnc_send_framebuffer_update. Someone has to
go through the threaded vnc code again, very thoroughly. It looks fragile.

Jan

Attachment:
signature.asc

Description: OpenPGP digital signature