On Mon, Mar 21, 2011 at 06:15:36PM +0800, Hu Tao wrote: > On Fri, Mar 18, 2011 at 11:20:56AM +0000, Daniel P. Berrange wrote: > > On Wed, Mar 16, 2011 at 06:30:20PM +0800, Hu Tao wrote: > > > --- > > > src/qemu/qemu_domain.c | 26 +----------- > > > src/qemu/qemu_monitor.c | 106 ++++++++++++++++++++++++++-------------------- > > > src/qemu/qemu_monitor.h | 4 +- > > > 3 files changed, 64 insertions(+), 72 deletions(-) > > > > > > diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c > > > index 8a2b9cc..e056ef0 100644 > > > --- a/src/qemu/qemu_domain.c > > > +++ b/src/qemu/qemu_domain.c > > > @@ -566,7 +566,6 @@ void qemuDomainObjEnterMonitor(virDomainObjPtr obj) > > > qemuDomainObjPrivatePtr priv = obj->privateData; > > > > > > qemuMonitorLock(priv->mon); > > > - qemuMonitorRef(priv->mon); > > > virDomainObjUnlock(obj); > > > } > > > > > > @@ -578,19 +577,9 @@ void qemuDomainObjEnterMonitor(virDomainObjPtr obj) > > > void qemuDomainObjExitMonitor(virDomainObjPtr obj) > > > { > > > qemuDomainObjPrivatePtr priv = obj->privateData; > > > - int refs; > > > - > > > - refs = qemuMonitorUnref(priv->mon); > > > - > > > - if (refs > 0) > > > - qemuMonitorUnlock(priv->mon); > > > > > > + qemuMonitorUnlock(priv->mon); > > > virDomainObjLock(obj); > > > - > > > - if (refs == 0) { > > > - virDomainObjUnref(obj); > > > - priv->mon = NULL; > > > - } > > > } > > > > > > > > > @@ -608,7 +597,6 @@ void qemuDomainObjEnterMonitorWithDriver(struct qemud_driver *driver, > > > qemuDomainObjPrivatePtr priv = obj->privateData; > > > > > > qemuMonitorLock(priv->mon); > > > - qemuMonitorRef(priv->mon); > > > virDomainObjUnlock(obj); > > > qemuDriverUnlock(driver); > > > } > > > @@ -623,20 +611,10 @@ void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, > > > virDomainObjPtr obj) > > > { > > > qemuDomainObjPrivatePtr priv = obj->privateData; > > > - int refs; > > > - > > > - refs = qemuMonitorUnref(priv->mon); > > > - > > > - if (refs > 0) > > > - qemuMonitorUnlock(priv->mon); > > > > > > + qemuMonitorUnlock(priv->mon); > > > qemuDriverLock(driver); > > > virDomainObjLock(obj); > > > - > > > - if (refs == 0) { > > > - virDomainObjUnref(obj); > > > - priv->mon = NULL; > > > - } > > > } > > > > I don't think these changes aren't right. The EnterMonitor() method > > is releasing the lock on the virDomainObjPtr. This means that other > > threads (specifically the main I/O thread) can use the virDomainObjPtr > > and its qemuMonitorPtr instance. Incrementing reference count on the > > qemuMonitorPtr is the only thing that prevents it being free()d while > > it is unlocked by this thread. > > > > eg, consider this > > > > In the main thread making a monitor call > > > > qemuDomainObjEnterMonitor(obj) > > > > qemuMonitorSetCPU(mon, 1, 1); > > > > qemuDomainObjExitMonitor(obj) > > > > > > So, at step 2 there, qemuMonitorSetCPU() is asleep on the condition > > variable, waiting for the reply from QEMU. > > > > > > Now QEMU dies for some reason, so in the I/O thread, we have > > qemuMonitorIO() called. This sees the EOF on the socket, so > > it invokes the callback "(eofNotify)(mon, vm, failed);". > > > > This callback is qemuMonitorClose(), which calls qemuMonitorFree() > > which releases the last reference count, which deletes the condition > > variable. The other thread is now waiting on a condition variable > > which has been deleted. It'll either hang forever, or crash > > > > We *must* prevent the monitor being free'd while a thread is running > > a command, hence qemuDomainObjEnterMonitor() must call qemuMonitorRef() > > and qemuDomainObjExitMonitor() must call qemuMonitorUnref(). > > You're all right except on when to ref the monitor. The monitor may be > freed by another thread before call of qemuDomainObjEnterMonitor(), so > call of qemuMonitorRef() at this time doesn't help. I think the monitor > must have already been refed before a thread starts operating it. No, that scenario should not be possible. As per src/qemu/THREADS.txt, the 'virDomainObjPtr' lock *must* be held when making any change to that object, so if 'vm' is locked, and you have checked virDomainObjIsActive, then it should not be poissible for ',mon' to be NULL in the call to qemuDomainObjEnterMonitor. Consider this sequence: virDomainObjPtr vm; qemuDriverLock(driver); vm = virDomainFindByUUID(&driver->domains, dom->uuid); qemuDriverUnlock(driver); /* The mutex on 'vm' is held at this point */ if (qemuDomainObjBeginJob(vm) < 0) goto cleanup; /* We know that no other thread has a monitor command active at this point. */ if (virDomainObjIsActive(vm)) { qemuReportError(VIR_ERR_OPERATION_INVALID, "%s", _("domain is not running")); goto cleanup; } /* We know that priv->mon is non-NULL at this point */ qemuDomainObjPrivatePtr priv = vm->privateData; qemuDomainObjEnterMonitor(vm); /* 'vm' is now unlocked, but priv->mon cannot be free()d by the I/O thread, because EnterMonitor() acquired an extra reference on it */ ret = qemuMonitorGetMemoryStats(priv->mon, stats, nr_stats); qemuDomainObjExitMonitor(vm); /* 'vm' is now locked again, but priv->mon may now be NULL, since ExitMonitor released a reference */ if (qemuDomainObjEndJob(vm) == 0) vm = NULL; /* vm may be NULL, becuase EndJob released a reference on it, that was acquired by BeginJob */ Provided the call to virDomainObjIsActive() is not forgotten, we can be sure that 'mon' exists in qemuDomainObjEnterMonitor. The qemuDomainObjEnterMonitor() call, acquires an extra reference to guarentee that the monitor is never free'd while the qemuMonitorGetMemoryStats() API is running, even if the I/O thread releases the primary reference on the monitor. The qemuDomainObjExitMonitor() call releases the reference that was acquired by EnterMonitor, so, it is possible that the monitor is now freed. If a second monitor call is to be made now, the virDomainObjIsActive API *must* be checked again, before calling EnterMonitor again. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list