On 07/28/2011 03:13 AM, Wen Congyang wrote:
Current, we support run sync job and async job at the same time. It means that the monitor commands for two jobs can be run in any order. In the function qemuDomainObjEnterMonitorInternal(): if (priv->job.active == QEMU_JOB_NONE&& priv->job.asyncJob) { if (qemuDomainObjBeginNestedJob(driver, obj)< 0) We check whether the caller is an async job by priv->job.active and priv->job.asynJob. But when an async job is running, priv->job.active is QEMU_JOB_NONE if no sync job is running, or priv->job.active is not QEMU_JOB_NONE if a sync job is running. So we cannot check whether the caller is an async job in the function qemuDomainObjEnterMonitorInternal(). Unfortunately, if sync job and async job are running at the same time, we may try to send monitor command at the same time in two threads. It's very dangerous, and it will cause libvirtd crashed.
Thanks for the analysis. I think you are spot on as to the problem's root cause - we are not properly serializing access to the monitor when mixing a sync and an async job.
However, I'm not yet convinced that adding yet another new condvar is the solution. Rather, your idea of adding qemuDomainObjEnterMonitorWithDriverAsync() might be the way to go. And as for existing functions that can be used either by the async job or by a sync job (qemuProcessStopCPUs), I think that merely means that qemuDomainObjEnterMonitorWithDriverAsync is passed either QEMU_ASYNC_JOB_NONE (sync job request) or the current async job, and that qemuDomainObjEnterMonitorWithDriver becomes shorthand for qemuDomainObjEnterMonitorWithDriver(,QEMU_ASYNC_JOB_NONE).
I'll try a patch along those lines, but may end up going with yours after all.
-- Eric Blake eblake@xxxxxxxxxx +1-801-349-2682 Libvirt virtualization library http://libvirt.org -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list