Re: An implementation bug in qemuMigrationWaitForCompletion that introduces an unnecessary sleep of 50 ms when a migration completes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10.04.2015 00:32, Xing Lin wrote:
> Hi,
> 
> It seems that I found a bug in the qemuMigrationWaitForCompletion(). It
> will do a sleep of 50 ms, even when qemuMigrationUpdateJobStatus() detects
> that a migration has completed.
> 
> 
> Here is the original implementation of the while loop.
> 
> 
>    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
>         /* Poll every 50ms for progress & to allow cancellation */
>         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> };
> 
>         if (qemuMigrationUpdateJobStatus(driver, vm, job, asyncJob) == -1)
>             break;
> 
>         /* cancel migration if disk I/O error is emitted while migrating */
>         if (abort_on_error &&
>             virDomainObjGetState(vm, &pauseReason) == VIR_DOMAIN_PAUSED &&
>             pauseReason == VIR_DOMAIN_PAUSED_IOERROR) {
>             virReportError(VIR_ERR_OPERATION_FAILED,
>                            _("%s: %s"), job, _("failed due to I/O error"));
>             break;
>         }
> 
>         if (dconn && virConnectIsAlive(dconn) <= 0) {
>             virReportError(VIR_ERR_OPERATION_FAILED, "%s",
>                            _("Lost connection to destination host"));
>             break;
>         }
> 
>         virObjectUnlock(vm);
> 
>         nanosleep(&ts, NULL);
> 
>         virObjectLock(vm);
>     }
> 
> 
> Even when qemuMigrationUpdateJobStatus() detects a migration has completed,
> the nanosleep() will be called again. When the while condition is checked,
> then the program breaks from the while loop. This introduces an unnecessary
> sleep of 50 ms which can be avoided by doing a sleep first. The code should
> likely be structured as following.
> 
>    while (jobInfo->type == VIR_DOMAIN_JOB_UNBOUNDED) {
>         /* Poll every 50ms for progress & to allow cancellation */
>         struct timespec ts = { .tv_sec = 0, .tv_nsec = 50 * 1000 * 1000ull
> };
> 
>         // do sleep here.
>         virObjectUnlock(vm);
> 
>         nanosleep(&ts, NULL);
> 
>         virObjectLock(vm);

Yeah, I guess it's very unlikely that migration will complete within
first 50ms after issuing the command on the monitor (in which case we
would again wait pointlessly). The other approach would be to leave the
sleep() where it is, but enclose it in if (jobInfo->type ==
VIR_DOMAIN_JOB_UNBOUNDED) {}. I have no preference over these two
versions though. So ACK to the patch of yours. However, I'll postpone
merging the patch for a while and let others express their feelings.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]