[RFC PATCH] fix select(2) race between main_loop_wait and qemu_aio_wait

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Mon, 5 Mar 2012 09:34:15 +0100

This is quite ugly.  Two threads, one running main_loop_wait and
one running qemu_aio_wait, can race to run the same iohandler.  As a
result, an iohandler can run when the underlying socket is no longer
readable or writable, with possible ill effects.

This shows up as a failure to boot from an IDE disk that uses the NBD
device.  It can be considered a bug either in NBD or in the main loop.
The patch fixes it in main_loop_wait, which always loses the race: it
drops the global lock around select, whereas qemu_aio_wait runs select
with the global lock held.

Reported-by: Laurent Vivier <laurent@xxxxxxxxx>
Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---
	Anthony, if you think this is too ugly tell me and I can
	post an NBD fix too.
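
For illustration only, here is a standalone sketch of the stale-fdset
problem and of the zero-timeout re-poll.  It is not QEMU code: the
helper names (wait_readable, repoll, demo_stale_fdset) are made up,
and a pipe plus a plain read() stand in for the NBD socket and for the
iohandler that the other thread runs.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: block until fd is readable, like the first select(). */
static int wait_readable(int fd, fd_set *rfds)
{
    FD_ZERO(rfds);
    FD_SET(fd, rfds);
    return select(fd + 1, rfds, NULL, NULL, NULL);
}

/* Hypothetical helper: poll again with a zero timeout, so that *rfds is
 * narrowed down to the descriptors that are still readable right now. */
static int repoll(int maxfd, fd_set *rfds)
{
    struct timeval tv = { 0, 0 };
    return select(maxfd + 1, rfds, NULL, NULL, &tv);
}

/* Returns 1 if the fdset was stale before the re-poll and clean after it,
 * 0 if not, and -1 on a setup error. */
static int demo_stale_fdset(void)
{
    int p[2];
    char c = 'x';
    fd_set rfds;
    int stale_before, ready_after, set_after;

    if (pipe(p) < 0) {
        return -1;
    }
    if (write(p[1], &c, 1) != 1 || wait_readable(p[0], &rfds) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Stand-in for the other thread: its handler consumes the data
     * after our select() has already returned. */
    if (read(p[0], &c, 1) != 1) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    stale_before = FD_ISSET(p[0], &rfds) != 0;  /* still set, but stale */
    ready_after = repoll(p[0], &rfds);          /* nothing is readable now */
    set_after = FD_ISSET(p[0], &rfds) != 0;

    close(p[0]);
    close(p[1]);
    return stale_before && ready_after == 0 && !set_after;
}
```

Calling demo_stale_fdset() from a test program should return 1: the
bit for the read end stays set after the data has been consumed, and
the second select() with a zeroed timeval clears it.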

 main-loop.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/main-loop.c b/main-loop.c
index db23de0..3beccff 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)
 
     if (timeout > 0) {
         qemu_mutex_lock_iothread();
+
+        /* Poll again.  A qemu_aio_wait() on another thread
+         * could have made the fdsets stale.
+         */
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     }
 
     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
-- 
1.7.7.6
