On 12/14/2011 03:29 AM, Jiri Denemark wrote: > If an async job run on a domain will stop the domain at the end of the > job, a concurrently run query job can hang in qemu monitor nothing can > be done with that domain from this point on. An attempt to start such > domain results in "Timed out during operation: cannot acquire state > change lock" error. I think I've actually hit this once or twice, even as hard as it is to line up the pre-conditions :) > > However, quite a few things have to happen at the right time... There > must be an async job running, which stops a domain at the end. This race > was with dump --crash but other similar jobs, such as (managed)save and > migration, should be able to trigger this bug as well. While this async > job is processing its last monitor command, that is a query-migrate to > which qemu replies with status "completed", a new libvirt API that > results in a query job must arrive and stay waiting until the > query-migrate command finishes. Once query-migrate is done but before > the async job closes qemu monitor while stopping the domain, the other > thread needs to wake up and call qemuMonitorSend to send its command to > qemu. Before qemu gets a chance to respond to this command, the async > job needs to close the monitor. At this point, the query job thread is > waiting for a condition that no-one will ever signal so it never > finishes the job. The explanation's longer than the patch! But that's the key to understanding things, and you did a good job. > --- > src/qemu/qemu_monitor.c | 14 ++++++++++++++ > 1 files changed, 14 insertions(+), 0 deletions(-) > > diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c > index 4141fb7..35648ae 100644 > --- a/src/qemu/qemu_monitor.c > +++ b/src/qemu/qemu_monitor.c > @@ -750,6 +750,20 @@ void qemuMonitorClose(qemuMonitorPtr mon) > VIR_FORCE_CLOSE(mon->fd); > } > > + /* In case another thread is waiting for its monitor command to be > + * processed, we need to wake it up with appropriate error set. > + */ > + if (mon->msg) { > + if (mon->lastError.code == VIR_ERR_OK) { > + qemuReportError(VIR_ERR_OPERATION_FAILED, > + _("Qemu monitor was closed")); > + virCopyLastError(&mon->lastError); > + virResetLastError(); > + } > + mon->msg->finished = 1; > + virCondSignal(&mon->notify); > + } ACK. -- Eric Blake eblake@xxxxxxxxxx +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list