2009/11/26 Daniel P. Berrange <berrange@xxxxxxxxxx>:
> If QEMU shuts down while we're in the middle of processing a
> monitor command, the monitor will be freed, and upon cleaning
> up we attempt to do qemuMonitorUnlock(priv->mon) when priv->mon
> is NULL.
>
> To address this we introduce proper reference counting into
> the qemuMonitorPtr object, and hold an extra reference whenever
> executing a command.
>
> * src/qemu/qemu_driver.c: Hold a reference on the monitor while
>   executing commands, and only NULL-ify the priv->mon field when
>   the last reference is released
> * src/qemu/qemu_monitor.h, src/qemu/qemu_monitor.c: Add reference
>   counting to handle safe deletion of monitor objects

The locking pattern below results in destroying a locked mutex. Is
this intended?

    qemuMonitorLock(mon);
    [...]
    if (qemuMonitorUnref(mon) > 0)
        qemuMonitorUnlock(mon);

Well, this patch makes the TCK deadlock for me. It seems to be a lock
ordering issue combined with a race condition; it doesn't happen on
every run.

I don't understand all the details of the locking and refcounting
scheme of the QEMU monitor yet; it's quite complex and is getting even
more complex.

I attached some GDB and Valgrind traces. Debugging is hindered by
libvirtd often blocking on poll() in virEventRunOnce(), and I haven't
found out why yet.

Matthias
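To illustrate the concern about the quoted pattern: POSIX leaves destroying a locked mutex undefined, so if the last unref happens while the lock is held, something has to release the lock before the destroy. Below is a minimal sketch of one way to arrange that. The Monitor type and the monitor_* functions are hypothetical stand-ins written for this example; they are not the qemuMonitor code from the patch, and I'm not claiming the patch does or should work this way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted, lockable object.
 * This is NOT libvirt's qemuMonitor structure. */
typedef struct {
    pthread_mutex_t lock;
    int refs;
} Monitor;

/* Allocate an object holding one reference; returns NULL on failure. */
Monitor *monitor_new(void)
{
    Monitor *mon = malloc(sizeof(*mon));
    if (!mon)
        return NULL;
    if (pthread_mutex_init(&mon->lock, NULL) != 0) {
        free(mon);
        return NULL;
    }
    mon->refs = 1;
    return mon;
}

/* Caller must hold mon->lock. Returns the new reference count. */
int monitor_ref(Monitor *mon)
{
    return ++mon->refs;
}

/* Caller must hold mon->lock. Returns the remaining reference count.
 * When the count reaches 0, the lock is released here, before the
 * mutex is destroyed and the object freed, so the mutex is never
 * destroyed while locked. The caller must not touch mon afterwards. */
int monitor_unref(Monitor *mon)
{
    assert(mon->refs > 0);
    int remaining = --mon->refs;
    if (remaining == 0) {
        pthread_mutex_unlock(&mon->lock);
        pthread_mutex_destroy(&mon->lock);
        free(mon);
    }
    return remaining;
}
```

With an unref shaped like this, the caller-side pattern "unlock only if unref returned > 0" is consistent, because the zero case has already dropped the lock internally. It still relies on no other thread being blocked on the lock at that moment, which the reference count is supposed to guarantee.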
==8990== Thread #2: lock order "0x9AB9030 before 0x9ABAA80" violated
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x529CABE: virDomainObjLock (domain_conf.c:5344)
==8990==    by 0x45CCB8: qemuMonitorIO (qemu_monitor.c:440)
==8990==    by 0x4132AE: virEventDispatchHandles (event.c:473)
==8990==    by 0x4138B6: virEventRunOnce (event.c:601)
==8990==    by 0x4188F6: qemudOneLoop (libvirtd.c:2165)
==8990==    by 0x418E25: qemudRunLoop (libvirtd.c:2274)
==8990==    by 0x4C2B528: mythread_wrapper (hg_intercepts.c:201)
==8990==    by 0x81C63B9: start_thread (in /lib/libpthread-2.9.so)
==8990==    by 0x84C0FCC: clone (in /lib/libc-2.9.so)
==8990==   Required order was established by acquisition of lock at 0x9AB9030
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x529CABE: virDomainObjLock (domain_conf.c:5344)
==8990==    by 0x43AA71: qemuReconnectDomain (qemu_driver.c:684)
==8990==    by 0x52791F8: virHashForEach (hash.c:495)
==8990==    by 0x43ABD0: qemuReconnectDomains (qemu_driver.c:728)
==8990==    by 0x43B8BC: qemudStartup (qemu_driver.c:987)
==8990==    by 0x52B280F: virStateInitialize (libvirt.c:829)
==8990==    by 0x41BA2B: main (libvirtd.c:3154)
==8990==   followed by a later acquisition of lock at 0x9ABAA80
==8990==    at 0x4C27ADC: pthread_mutex_lock (hg_intercepts.c:464)
==8990==    by 0x432079: virMutexLock (threads-pthread.c:52)
==8990==    by 0x45BE28: qemuMonitorLock (qemu_monitor.c:82)
==8990==    by 0x45CE64: qemuMonitorOpen (qemu_monitor.c:475)
==8990==    by 0x43A9B0: qemuConnectMonitor (qemu_driver.c:663)
==8990==    by 0x43AAC2: qemuReconnectDomain (qemu_driver.c:689)
==8990==    by 0x52791F8: virHashForEach (hash.c:495)
==8990==    by 0x43ABD0: qemuReconnectDomains (qemu_driver.c:728)
==8990==    by 0x43B8BC: qemudStartup (qemu_driver.c:987)
==8990==    by 0x52B280F: virStateInitialize (libvirt.c:829)
==8990==    by 0x41BA2B: main (libvirtd.c:3154)

(gdb) thread apply all bt

Thread 7 (Thread 0x7f2b0827d950 (LWP 8179)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa320) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f2b08a7e950 (LWP 8176)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa308) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f2b0927f950 (LWP 8175)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa2f0) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f2b09a80950 (LWP 8174)):
#0  0x00007f2b0dd1da94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f2b0dd19190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f2b0dd18a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1bc3960) at util/threads-pthread.c:52
#4  0x000000000045be29 in qemuMonitorLock (mon=0x1bc3960) at qemu/qemu_monitor.c:82
#5  0x000000000045d168 in qemuMonitorClose (mon=0x1bc3960) at qemu/qemu_monitor.c:541
#6  0x000000000043f59d in qemudShutdownVMDaemon (conn=0x1bb18f0, driver=0x1baa9e0, vm=0x1bc1500) at qemu/qemu_driver.c:2410
#7  0x000000000044103c in qemudDomainDestroy (dom=0x1bb5f00) at qemu/qemu_driver.c:3097
#8  0x00007f2b10b7fbce in virDomainDestroy (domain=0x1bb5f00) at libvirt.c:1978
#9  0x000000000041d708 in remoteDispatchDomainDestroy (server=0x1ba5fa0, client=0x1bb16e0, conn=0x1bb18f0, hdr=0x1c03fb0, rerr=0x7f2b09a7fe30, args=0x7f2b09a7fed0, ret=0x7f2b09a7ff20) at remote.c:925
#10 0x000000000042619f in remoteDispatchClientCall (server=0x1ba5fa0, client=0x1bb16e0, msg=0x1bc3fa0) at dispatch.c:506
#11 0x0000000000425d74 in remoteDispatchClientRequest (server=0x1ba5fa0, client=0x1bb16e0, msg=0x1bc3fa0) at dispatch.c:388
#12 0x00000000004172af in qemudWorker (data=0x1baa2d8) at libvirtd.c:1518
#13 0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#14 0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#15 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f2b0a281950 (LWP 8173)):
#0  0x00007f2b0dd1b2e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x1ba5fc8, m=0x1ba5fa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1baa2c0) at libvirtd.c:1496
#3  0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f2b0aa82950 (LWP 8172)):
#0  0x00007f2b0dd1da94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f2b0dd19190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f2b0dd18a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1bc1500) at util/threads-pthread.c:52
#4  0x00007f2b10b67abf in virDomainObjLock (obj=0x1bc1500) at conf/domain_conf.c:5344
#5  0x000000000045ccb9 in qemuMonitorIO (watch=9, fd=18, events=0, opaque=0x1bc3960) at qemu/qemu_monitor.c:440
#6  0x00000000004132af in virEventDispatchHandles (nfds=7, fds=0x1bb1bf0) at event.c:473
#7  0x00000000004138b7 in virEventRunOnce () at event.c:601
#8  0x00000000004188f7 in qemudOneLoop () at libvirtd.c:2165
#9  0x0000000000418e26 in qemudRunLoop (opaque=0x1ba5fa0) at libvirtd.c:2274
#10 0x00007f2b0dd173ba in start_thread () from /lib/libpthread.so.0
#11 0x00007f2b0da83fcd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f2b114a2780 (LWP 8169)):
#0  0x00007f2b0dd17c95 in pthread_join () from /lib/libpthread.so.0
#1  0x000000000041bb39 in main (argc=1, argv=0x7fff194db738) at libvirtd.c:3183
(gdb)

(gdb) thread apply all bt

Thread 7 (Thread 0x7f1081ba7950 (LWP 26832)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1631320) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f10823a8950 (LWP 26831)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x1631308) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f1082ba9950 (LWP 26830)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x16312f0) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f10833aa950 (LWP 26829)):
#0  0x00007f1087647a94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f1087643190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f1087642a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x164a0f0) at util/threads-pthread.c:52
#4  0x000000000045be29 in qemuMonitorLock (mon=0x164a0f0) at qemu/qemu_monitor.c:82
#5  0x0000000000439a24 in qemuDomainObjEnterMonitorWithDriver (driver=0x1631720, obj=0x1648da0) at qemu/qemu_driver.c:309
#6  0x000000000043c8d0 in qemudInitCpus (conn=0x1638530, driver=0x1631720, vm=0x1648da0, migrateFrom=0x0) at qemu/qemu_driver.c:1427
#7  0x000000000043f14e in qemudStartVMDaemon (conn=0x1638530, driver=0x1631720, vm=0x1648da0, migrateFrom=0x0, stdin_fd=-1) at qemu/qemu_driver.c:2327
#8  0x00000000004449ad in qemudDomainStart (dom=0x1648a80) at qemu/qemu_driver.c:4384
#9  0x00007f108a4ae2b3 in virDomainCreate (domain=0x1648a80) at libvirt.c:4509
#10 0x000000000041d567 in remoteDispatchDomainCreate (server=0x162cfa0, client=0x1638690, conn=0x1638530, hdr=0x168b150, rerr=0x7f10833a9e30, args=0x7f10833a9ed0, ret=0x7f10833a9f20) at remote.c:853
#11 0x000000000042619f in remoteDispatchClientCall (server=0x162cfa0, client=0x1638690, msg=0x164b140) at dispatch.c:506
#12 0x0000000000425d74 in remoteDispatchClientRequest (server=0x162cfa0, client=0x1638690, msg=0x164b140) at dispatch.c:388
#13 0x00000000004172af in qemudWorker (data=0x16312d8) at libvirtd.c:1518
#14 0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#15 0x00007f10873adfcd in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f1083bab950 (LWP 26828)):
#0  0x00007f10876452e9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1  0x0000000000432135 in virCondWait (c=0x162cfc8, m=0x162cfa0) at util/threads-pthread.c:84
#2  0x00000000004171ea in qemudWorker (data=0x16312c0) at libvirtd.c:1496
#3  0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#4  0x00007f10873adfcd in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f10843ac950 (LWP 26826)):
#0  0x00007f1087647a94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f1087643190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f1087642a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000043207a in virMutexLock (m=0x1648da0) at util/threads-pthread.c:52
#4  0x00007f108a491abf in virDomainObjLock (obj=0x1648da0) at conf/domain_conf.c:5344
#5  0x000000000045ccb9 in qemuMonitorIO (watch=8, fd=16, events=0, opaque=0x164a0f0) at qemu/qemu_monitor.c:440
#6  0x00000000004132af in virEventDispatchHandles (nfds=6, fds=0x7f107c0008f0) at event.c:473
#7  0x00000000004138b7 in virEventRunOnce () at event.c:601
#8  0x00000000004188f7 in qemudOneLoop () at libvirtd.c:2165
#9  0x0000000000418e26 in qemudRunLoop (opaque=0x162cfa0) at libvirtd.c:2274
#10 0x00007f10876413ba in start_thread () from /lib/libpthread.so.0
#11 0x00007f10873adfcd in clone () from /lib/libc.so.6
#12 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f108adcc780 (LWP 26823)):
#0  0x00007f1087641c95 in pthread_join () from /lib/libpthread.so.0
#1  0x000000000041bb39 in main (argc=1, argv=0x7fff92e04068) at libvirtd.c:3183