Daniel, thanks for the help. I was able to fix the problem (see my post in a
new thread).

On Fri, 2010-03-05 at 09:32 +0000, Daniel P. Berrange wrote:
> On Thu, Mar 04, 2010 at 02:22:35PM -0600, Adam Litke wrote:
> > I have a multi-threaded Python program that shares a single libvirt
> > connection object among several threads (one thread per active domain on
> > the system plus a management thread). On a heavily loaded host with 8
> > running domains I am getting a consistent libvirtd segfault in the qemu
> > monitor handling code. This happens with libvirt-0.7.6 and git.
> >
> > Mar 4 12:23:13 bc1cn7-mgmt kernel: [ 3947.836151] libvirtd[7716]:
> > segfault at 24 ip 000000000045de5c sp 00007fe5aa7d2b20 error 4 in
> > libvirtd[400000+b3000]
> >
> > Using addr2line, this translates to: libvirt/src/qemu/qemu_monitor.c:698
> >
> > Which is in qemuMonitorSend():
> >
> > -->   while (!mon->msg->finished) {
> >           if (virCondWait(&mon->notify, &mon->lock) < 0)
> >               goto cleanup;
> >       }
> >
> > It seems that mon->msg is being reset to NULL in the middle of this loop
> > execution. I suspect that is because qemuMonitorSend() is not reentrant
> > and multiple threads in my program are racing here. I would guess the
> > 'mon->msg = NULL;' on line 707 causes the NULL that trips up the other
> > racer.
> >
> > I presume the Monitor interface has some locking protection around it to
> > ensure that only one thread can use it at a time?
>
> You are correct that qemuMonitorSend() is not re-entrant. qemuMonitorSend()
> is invoked by any of the qemuMonitorXXXX() APIs. For all these APIs, the
> QEMU driver code is required to first take the lock by calling
> qemuDomainObjEnterMonitor() and to release it when done by calling
> qemuDomainObjExitMonitor().
>
> e.g.,
>
>     qemuDomainObjEnterMonitor(obj);
>     naddrs = qemuMonitorGetAllPCIAddresses(priv->mon,
>                                            &addrs);
>     qemuDomainObjExitMonitor(obj);
>
> > Is there an easy way to fix this? I am not familiar with the measures
> > employed to make libvirt thread-safe. Thanks!
>
> The first step is to try to identify which functions were run concurrently.
>
> Try running libvirtd with
>
>     LIBVIRT_LOG_FILTERS=1:qemu
>     LIBVIRT_LOG_OUTPUTS=1:stderr
>
> You'll get quite a lot of data printed out for all monitor calls, which
> might let you see which ones overlap. You might want to add further log
> messages in the qemuMonitorSend() method itself to help with this.
>
> There is a small chance that using GDB 'thread apply all backtrace' when
> it crashes will show you info, but that's fairly unlikely.
>
> The other possibility is buffer corruption in the qemuMonitor struct, but
> that seems less likely.
>
> Regards,
> Daniel

--
Thanks,
Adam
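
For anyone reading this thread later, here is a minimal Python sketch of the
enter/exit pattern Daniel describes above. It is only a toy model and not
libvirt code: the names Monitor, entered, and run_workers are hypothetical,
and a plain threading.Lock stands in for whatever
qemuDomainObjEnterMonitor()/qemuDomainObjExitMonitor() actually hold. The
point is just that a send() routine with a single in-flight message slot is
safe only if every caller takes the same lock around it.

```python
import threading
import time
from contextlib import contextmanager


class Monitor:
    """Toy stand-in for a monitor with one in-flight message slot (hypothetical)."""

    def __init__(self):
        self.msg = None                    # the single message currently in flight
        self.job_lock = threading.Lock()   # plays the role of the enter/exit lock
        self.overlaps = 0                  # sends that found another message in flight

    def send(self, text):
        # Not reentrant: assumes nobody else touches self.msg until we return.
        if self.msg is not None:
            self.overlaps += 1
        self.msg = {"text": text, "finished": False}
        time.sleep(0.001)                  # pretend to wait for the reply
        # If another thread had cleared self.msg meanwhile, the next line
        # would raise TypeError, the Python analogue of the NULL dereference.
        self.msg["finished"] = True
        reply = "ok: " + self.msg["text"]
        self.msg = None
        return reply


@contextmanager
def entered(mon):
    """Hold the monitor for exactly one call: enter, use, exit."""
    mon.job_lock.acquire()
    try:
        yield mon
    finally:
        mon.job_lock.release()


def run_workers(mon, n_threads=8, calls_each=5):
    """Start n_threads workers that each issue calls_each guarded sends."""
    replies = []
    replies_lock = threading.Lock()

    def worker(ident):
        for i in range(calls_each):
            with entered(mon) as m:
                reply = m.send(f"domain{ident}-cmd{i}")
            with replies_lock:
                replies.append(reply)

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replies


if __name__ == "__main__":
    mon = Monitor()
    replies = run_workers(mon)
    print(len(replies), "replies,", mon.overlaps, "overlapping sends")
```

With every call wrapped in entered(), the overlap counter should stay at
zero. Calling send() directly from the workers instead removes that
guarantee, which is the kind of race suspected in the original report.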