On Wed, Aug 08, 2012 at 02:26:26PM +0800, Alex Jia wrote: > On 08/07/2012 07:34 PM, Daniel P. Berrange wrote: > >On Tue, Aug 07, 2012 at 03:18:38PM +0800, Alex Jia wrote: > >>* src/qemu/qemu_domain.c (qemuDomainObjExitAgentInternal): fix crashing > >> libvirtd due to derefing a NULL pointer. > >> > >>For details, please see bug: > >>RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=845966 > >> > >>Signed-off-by: Alex Jia<ajia@xxxxxxxxxx> > >>--- > >> src/qemu/qemu_domain.c | 10 ++++++---- > >> 1 files changed, 6 insertions(+), 4 deletions(-) > >> > >>diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c > >>index 86f0265..8667b6c 100644 > >>--- a/src/qemu/qemu_domain.c > >>+++ b/src/qemu/qemu_domain.c > >>@@ -1136,12 +1136,14 @@ qemuDomainObjExitAgentInternal(struct qemud_driver *driver, > >> virDomainObjPtr obj) > >> { > >> qemuDomainObjPrivatePtr priv = obj->privateData; > >>- int refs; > >>+ int refs = -1; > >> > >>- refs = qemuAgentUnref(priv->agent); > >>+ if (priv->agent) { > >>+ refs = qemuAgentUnref(priv->agent); > >> > >>- if (refs> 0) > >>- qemuAgentUnlock(priv->agent); > >>+ if (refs> 0) > >>+ qemuAgentUnlock(priv->agent); > >>+ } > >> > >> if (driver_locked) > >> qemuDriverLock(driver); > >I'm not convinced this is the right fix. The whole point of the Enter/ExitAgent > >methods is to hold an extra reference on priv->agent, so that it is *not* > >deleted while a agent command is run. > > > >What is setting priv->agent to NULL while the command is still active ? > > In fact, the command 'guest-suspend-disk' is freed by > virJSONValueFree() in qemuAgentSuspend() after the command is > successfully sent via 'qemuAgentCommand()': > > (gdb) s > qemuDomainPMSuspendForDuration (dom=<value optimized out>, target=1, > duration=<value optimized out>, flags=<value optimized out>) at > qemu/qemu_driver.c:13123 > 13123 qemuDomainObjExitAgent(driver, vm); > (gdb) p *vm > $68 = {object = {magic = 3405643788, refs = 4, klass = > 0x7f4ce815a9b0}, lock = {lock = {__data = {__lock = 1, __count = 0, > __owner = 20285, __nusers = 1, __kind = 0, __spins = 0, __list = > {__prev = 0x0, > __next = 0x0}}, __size = > "\001\000\000\000\000\000\000\000=O\000\000\001", '\000' <repeats 26 > times>, __align = 1}}, pid = 20379, *state = {state = 4, reason = > 0}*, autostart = 0, persistent = 1, > updated = 0, def = 0x7f4ce815e500, newDef = 0x7f4ce8069b80, > snapshots = {objs = 0x7f4ce815e240, metaroot = {def = 0x0, parent = > 0x0, sibling = 0x0, nchildren = 0, first_child = 0x0}}, > current_snapshot = 0x0, hasManagedSave = false, privateData = > 0x7f4ce8154660, privateDataFreeFunc = 0x7f4cefada190 > <qemuDomainObjPrivateFree>, taint = 4} > (gdb) s > 13122 ret = qemuAgentSuspend(priv->agent, target); > > (gdb) p *priv > $70 = {job = {cond = {cond = {__data = {__lock = 0, __futex = 0, > __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, > __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 > times>, > __align = 0}}, active = QEMU_JOB_MODIFY, owner = 20286, > asyncCond = {cond = {__data = {__lock = 0, __futex = 0, __total_seq > = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = > 0, > __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, > __align = 0}}, asyncJob = QEMU_ASYNC_JOB_NONE, asyncOwner = 0, phase > = 0, mask = 0, start = 0, dump_memory_only = false, info = { > type = 0, timeElapsed = 0, timeRemaining = 0, dataTotal = 0, > dataProcessed = 0, dataRemaining = 0, memTotal = 0, memProcessed = > 0, memRemaining = 0, fileTotal = 0, fileProcessed = 0, > fileRemaining = 0}}, mon = 0x7f4ce80a3570, monConfig = 0x0, > monJSON = 1, monError = false, monStart = 0, *agent = 0x0*, > agentError = false, agentStart = 1344402957193, gotShutdown = true, > beingDestroyed = false, pidfile = 0x7f4ce816ba90 > "/var/run/libvirt/qemu/myRHEL6.pid", nvcpupids = 1, vcpupids = > 0x7f4ce8146be0, pciaddrs = 0x7f4ce8171b30, persistentAddrs = 1, > qemuCaps = 0x7f4ce80b4030, > lockState = 0x0, fakeReboot = false, jobs_queued = 1, > migMaxBandwidth = 32, origname = 0x0, cons = 0x7f4ce8165ce0, > cleanupCallbacks = 0x0, ncleanupCallbacks = 0, ncleanupCallbacks_max > = 0} > (gdb) p priv->agent > $71 =*(qemuAgentPtr) 0x0* > (gdb) s > 1138 refs = qemuAgentUnref(priv->agent); > (gdb) s > qemuAgentUnref (mon=0x0) at qemu/qemu_agent.c:168 > (gdb) s > 170 VIR_DEBUG("%d", mon->refs); > (gdb) s > 168 { > (gdb) s > 169 mon->refs--; > (gdb) s > > Program received signal SIGSEGV, Segmentation fault. > qemuAgentUnref (*mon=0x0*) at qemu/qemu_agent.c:169 > 169 *mon->refs--*; > (gdb) s > virNetServerFatalSignal (sig=11, siginfo=0x7f4cf748f630, > context=0x7f4cf748f500) at rpc/virnetserver.c:296 > > > In addition, the old qemuAgentUnref(mon) hasn't judge whether its > parameter is NULL then will deref a NULL pointer, I should simply > fix it in qemuAgentUnref(), for example, if 'mon' is NULL then > directly return. > > Fortunately, your commit b57ee09 potentially fix this issue via > using virObjectUnref() instead of qemuAgentUnref(), if the parameter > 'priv->agent' is NULL then the virObjectUnref(priv->agent) will > directly return false: That is just lucky and we should not rely on that. There is still a bug here. The 'priv->agent' field should *not* be set to NULL in the first place, while a monitor command is active. The GDB trace above does not show *what* is setting priv->agent to NULL, and that is what we need to find out. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list