On 11/01/2012 10:22 AM, Peter Krempa wrote: > When pausing the guest while migration is running (to speed up > convergence) the virDomainSuspend API checks if the migration job is > active before entering the job. This could cause a possible race if the > virDomainSuspend is called while the job is active but ends before the > Suspend API enters the job (this would require that the migration is > aborted). This would cause a incorrect event to be emitted. Not just aborted, but also completed normally. At any rate, I agree that we shouldn't be relying on the state of job.asyncJob until we have obtained the sync job. > --- > src/qemu/qemu_driver.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 7dedb96..98dc441 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -1658,14 +1658,6 @@ static int qemudDomainSuspend(virDomainPtr dom) { > > priv = vm->privateData; > > - if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT) { > - reason = VIR_DOMAIN_PAUSED_MIGRATION; > - eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED; > - } else { > - reason = VIR_DOMAIN_PAUSED_USER; > - eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_PAUSED; > - } > - > if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_SUSPEND) < 0) > goto cleanup; > > @@ -1675,6 +1667,14 @@ static int qemudDomainSuspend(virDomainPtr dom) { > goto endjob; > } > > + if (priv->job.asyncJob == QEMU_ASYNC_JOB_MIGRATION_OUT) { > + reason = VIR_DOMAIN_PAUSED_MIGRATION; > + eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_MIGRATED; > + } else { > + reason = VIR_DOMAIN_PAUSED_USER; > + eventDetail = VIR_DOMAIN_EVENT_SUSPENDED_PAUSED; > + } > + I think we're missing something. The idea here is that if the user starts an async job, then they can request pause but not resume until that async job has completed. How do we make sure resume is blocked? By setting reason to something other than VIR_DOMAIN_PAUSED_USER. The existing code worked for outgoing live migration by setting up the async job, then if the user requests pause, sets things to the same state as if the user had requested paused migration in the first place. So in a similar vein - if the user requests live snapshot, but then changes their mind and pauses, we should set things to the same state as if they had requested paused snapshot in the first place. [time passes, as well as an IRC chat...] Oh, you did exactly that in patch 10/20! In that case, this code is fixing a pre-existing bug, in preparation for the work in patch 10. ACK as-is. -- Eric Blake eblake@xxxxxxxxxx +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list