On 08/01/2014 03:12 AM, Sam Bobroff wrote: > On 30/07/14 01:52, Ján Tomko wrote: >> On 07/29/2014 02:41 AM, Sam Bobroff wrote: >>> During a QEMU live migration several warning messages about job >>> handling could be written to syslog on the destination host: >>> >>> "entering monitor without asking for a nested job is dangerous" >>> >>> The messages are written because the job handling during migration >>> uses hard coded asyncJob values in several places that are incorrect. >>> >>> This patch passes the required asyncJob value around and prevents >>> the warnings as well as any issues that the warnings may be referring >>> to. >>> >>> Signed-off-by: Sam Bobroff <sam.bobroff@xxxxxxxxxxx> >>> -- >> >> This patch seems to fix the deadlock that can happen if the migrated domain is >> destroyed at the destination reported here: >> https://www.redhat.com/archives/libvir-list/2014-May/msg00236.html >> >> It looks good to me, but it seems there are more functions calling >> qemuDomainObjEnterMonitor that can be called from qemuProcessStart, >> for example qemuDomainChangeGraphicsPasswords. > > Yes, I was fairly sure there would be other cases; I just fixed all the > ones that actually occurred during my tests. > > In fact it seems like for the cases I'm looking at here, where it's the > async job owner thread that's using the EnterMonitor functions, passing > asyncJob around is a waste of time anyway because we know the correct > value of asyncJob to use: it's stored in priv->job.asyncJob. > > Why not have qemuDomainObjEnterMonitorInternal() automatically switch to > creating a nested job in this case? > > It seems easy to do and would simplify some code as well; what do you think? We've had it that way before - it didn't work that well: http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=193cd0f3 > >>> @@ -2505,7 +2506,7 @@ qemuDomainUpdateDeviceList(virQEMUDriverPtr driver, >>> if (!virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_DEVICE_DEL_EVENT)) >>> return 0; >>> >>> - qemuDomainObjEnterMonitor(driver, vm); >>> + ignore_value(qemuDomainObjEnterMonitorAsync(driver, vm, asyncJob)); >> >> Also, the return value of this call could be dangerous to ignore if asyncJob >> != NONE. > > True, but the patch hasn't introduced this, and the full story is even > worse ;-) > > void qemuDomainObjEnterMonitor(virQEMUDriverPtr driver, > virDomainObjPtr obj) > { > ignore_value(qemuDomainObjEnterMonitorInternal(driver, obj, > QEMU_ASYNC_JOB_NONE)); > } > qemuDomainObjEnterMonitorInternal is called with QEMU_ASYNC_JOB_NONE here. It always returns 0 in that case and it's safe to ignore. The problem is when you use other async jobs: static int qemuDomainObjEnterMonitorInternal(virQEMUDriverPtr driver, virDomainObjPtr obj, qemuDomainAsyncJob asyncJob) { qemuDomainObjPrivatePtr priv = obj->privateData; if (asyncJob != QEMU_ASYNC_JOB_NONE) { int ret; if ((ret = qemuDomainObjBeginNestedJob(driver, obj, asyncJob)) < 0) return ret; if (!virDomainObjIsActive(obj)) { virReportError(VIR_ERR_OPERATION_FAILED, "%s", _("domain is no longer running")); /* Still referenced by the containing async job. */ ignore_value(qemuDomainObjEndJob(driver, obj)); return -1; } } ... Jan
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list