On 08.11.2012 11:49, Jiri Denemark wrote: > On Thu, Nov 08, 2012 at 11:00:40 +0100, Michal Privoznik wrote: >> On 07.11.2012 23:23, Jiri Denemark wrote: >>> >>> AbortJob API was never a synchronous one and I don't think we need to change >>> it. Not to mention that this code unlocks and locks vm without holding an >>> extra reference to it. And even if it did so, it cannot guarantee that the job >>> it's checking in the loop is the same one it tried to cancel. It's quite >>> unlikely but the original job could have finished and another one could have >>> been started while this code was sleeping. >> >> However, AbortJob is documented as synchronous. If current implementation doesn't >> follow docs it is a bug. On the other hand, I don't recall anybody screaming about >> it so far. But that means nothing, right? > > IIRC the implementation was never synchronous in which case I think we may > just fix the documentation to match reality :-) > >>>> @@ -1172,6 +1172,12 @@ qemuMigrationUpdateJobStatus(struct qemud_driver *driver, >>>> } >>>> priv->job.info.timeElapsed -= priv->job.start; >>>> >>>> + if (qemuDomainObjAbortAsyncJobRequested(vm)) { >>>> + VIR_DEBUG("Migration abort requested. Translating " >>>> + "status to MIGRATION_STATUS_CANCELLED"); >>>> + status = QEMU_MONITOR_MIGRATION_STATUS_CANCELLED; >>>> + } >>>> + >>> >>> This check seems to be to late and that complicates the code later. If we keep >>> qemuDomainAbortJob calling qemuMonitorMigrateCancel in qemuDomainAbortJob, we >>> may count on that and check priv.job.asyncAbort before entering the monitor >>> to tell QEMU to start migrating in qemuMigrationRun. If the abort is not >>> requested by then, it may only happen after we call qemuMonitorMigrateTo* and >>> in that case the migration will be properly aborted by >>> qemuMonitorMigrateCancel. >>> >> >> Not really. The issue I've seen was like this: >> >> Thread A Thread B >> 1. start async migration out job >> 2. Since job is set, abort it by calling 'migrate_cancel' >> 3. return to user >> 4. issue 'migrate' on the monitor >> >> And I think your suggestion makes it just harder to hit this race and not really avoid it. >> However with my code, we are guaranteed that 'migrate_cancel' will have an effect. > > Oh right, we actually need to check priv.job.asyncAbort after entering the > monitor (since we may be preempted while waiting for entering the monitor) but > still before calling qemuMonitorMigrateTo* that actually starts the monitor. > When we entered the monitor, the AbortJob must have already been there or it > will come after we leave the monitor. > > Jirka > Yeah, now that I've traced all possibilities on a paper I see my picture of situation wasn't quite correct as well. I'll post v2 shortly. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list