Re: [libvirt] Libvirt segfault in qemuMonitorSend() with multi-threaded API use

Adam Litke <agl@xxxxxxxxxx> · Fri, 05 Mar 2010 14:14:07 -0600

Daniel, thanks for the help.  I was able to fix the problem (see my post
in a new thread).

On Fri, 2010-03-05 at 09:32 +0000, Daniel P. Berrange wrote:
> On Thu, Mar 04, 2010 at 02:22:35PM -0600, Adam Litke wrote:
> > I have a multi-threaded Python program that shares a single libvirt
> > connection object among several threads (one thread per active domain on
> > the system plus a management thread).  On a heavily loaded host with 8
> > running domains I am getting a consistent libvirtd segfault in the qemu
> > monitor handling code.  This happens with libvirt-0.7.6 and git.
> > 
> > Mar  4 12:23:13 bc1cn7-mgmt kernel: [ 3947.836151] libvirtd[7716]:
> > segfault at 24 ip 000000000045de5c sp 00007fe5aa7d2b20 error 4 in
> > libvirtd[400000+b3000]
> > 
> > Using addr2line, this translates to: libvirt/src/qemu/qemu_monitor.c:698
> > 
> > Which is in qemuMonitorSend():
> > 
> > --> while (!mon->msg->finished) { 
> >         if (virCondWait(&mon->notify, &mon->lock) < 0)
> >             goto cleanup;
> >     }
> > 
> > It seems that mon->msg is being reset to NULL in the middle of this loop
> > execution.  I suspect that is because qemuMonitorSend() is not reentrant
> > and multiple threads in my program are racing here.  I would guess the
> > 'mon->msg = NULL;' on line 707 causes the NULL that trips up the other
> > racer.
> 
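
For illustration, the suspected interleaving can be replayed one step at a
time in a small Python model. This is only a sketch and not libvirt code:
Monitor, Message, begin_send, end_send and still_waiting are hypothetical
names, and Python's None stands in for the NULL pointer. A fault address of
24 would be consistent with reading a field at a small offset from a NULL
mon->msg, although that is an inference from the log line rather than
something confirmed here.

```python
class Message:
    """Stand-in for the message a sender attaches to the monitor."""
    def __init__(self):
        self.finished = False


class Monitor:
    """Stand-in for the monitor struct: one shared 'current message' slot."""
    def __init__(self):
        self.msg = None


def begin_send(mon, msg):
    # Models the start of a send: publish the message in the shared slot.
    mon.msg = msg


def end_send(mon):
    # Models the end of a send: the equivalent of 'mon->msg = NULL;'.
    mon.msg = None


def still_waiting(mon):
    # Models the loop condition 'while (!mon->msg->finished)'.
    return not mon.msg.finished


def replay_interleaving():
    """Replay the suspected ordering of two unserialized senders, A and B."""
    mon = Monitor()
    begin_send(mon, Message())   # A publishes its message, then sleeps
    b_msg = Message()
    begin_send(mon, b_msg)       # B overwrites the shared slot
    b_msg.finished = True        # B's reply arrives
    end_send(mon)                # B clears the slot on its way out
    try:
        still_waiting(mon)       # A wakes up and re-checks its condition
    except AttributeError:
        return "crash"           # Python's analogue of the NULL dereference
    return "ok"
```

Calling replay_interleaving() takes the "crash" path, because sender A
re-evaluates its loop condition after sender B has already cleared the
shared slot.
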
> > I presume the Monitor interface has some locking protection around it to
> > ensure that only one thread can use it at a time?
> 
> You are correct that qemuMonitorSend() is not re-entrant. qemuMonitorSend()
> is invoked by any of the qemuMonitorXXXX() APIs. For all these APIs, the
> QEMU driver code is required to first take the lock by calling
> qemuDomainObjEnterMonitor() and to release it when done by calling
> qemuDomainObjExitMonitor().
> 
> e.g.,
> 
>   qemuDomainObjEnterMonitor(obj);
>   naddrs = qemuMonitorGetAllPCIAddresses(priv->mon,
>                                          &addrs);
>   qemuDomainObjExitMonitor(obj);
> 
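
The enter/exit bracket above can be modelled with Python threads as a rough
sketch. None of this is libvirt code: enter_monitor, job_lock, the Monitor
class and the upper-casing "reply" are assumptions made for illustration,
and the real qemuDomainObjEnterMonitor()/qemuDomainObjExitMonitor() may do
more than take and release a lock. The point is only that an outer lock
held across the whole call keeps a second sender from touching the shared
message slot while the first is still waiting.

```python
import threading
from contextlib import contextmanager


class Message:
    def __init__(self, text):
        self.text = text
        self.finished = False
        self.reply = None


class Monitor:
    def __init__(self):
        self.lock = threading.Lock()
        self.notify = threading.Condition(self.lock)
        self.msg = None                   # single shared slot, as in the C code
        self.job_lock = threading.Lock()  # hypothetical outer "enter monitor" lock
        io = threading.Thread(target=self._io_loop, daemon=True)
        io.start()

    def _io_loop(self):
        # Stand-in for the I/O side: completes whatever message is pending.
        with self.notify:
            while True:
                while self.msg is None or self.msg.finished:
                    self.notify.wait()
                self.msg.reply = self.msg.text.upper()
                self.msg.finished = True
                self.notify.notify_all()

    def send(self, msg):
        # Not re-entrant: safe only if the caller holds job_lock.
        with self.notify:
            self.msg = msg
            self.notify.notify_all()
            while not self.msg.finished:
                self.notify.wait()        # releases self.lock while sleeping
            self.msg = None
        return msg.reply


@contextmanager
def enter_monitor(mon):
    # Models the enter/exit pair: hold the outer lock for the whole call.
    mon.job_lock.acquire()
    try:
        yield mon
    finally:
        mon.job_lock.release()


def run_workers(n_threads, per_thread):
    """Send per_thread messages from each of n_threads; return all replies."""
    mon = Monitor()
    replies = []

    def worker(i):
        for j in range(per_thread):
            with enter_monitor(mon):
                replies.append(mon.send(Message(f"t{i}-{j}")))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return replies
```

In this model, calling mon.send() directly from several threads without
enter_monitor() would let a second sender acquire mon.lock while the first
is asleep in wait(), which is the overlap described earlier in the thread.
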
> > Is there an easy way to fix this?  I am not familiar with the measures
> > employed to make libvirt thread-safe.  Thanks!
> 
> The first step is to try to identify which functions were run concurrently.
> 
> Try running libvirtd with:
> 
>   LIBVIRT_LOG_FILTERS=1:qemu LIBVIRT_LOG_OUTPUTS=1:stderr
> 
> 
> You'll get quite a lot of data printed out for all monitor calls, which might
> let you see which ones overlap. You might want to add further log messages in
> the qemuMonitorSend() method itself to help with this.
> 
> There is a small chance that running 'thread apply all backtrace' in GDB
> when it crashes will show you useful info, but that's fairly unlikely.
> 
> The other possibility is buffer corruption in the qemuMonitor struct, but
> that seems less likely.
> 
> Regards,
> Daniel

-- 
Thanks,
Adam
