Re: [PATCH] Allow destroying QEMU VM even if a job is active

Eric Blake <eblake@xxxxxxxxxx> · Fri, 06 May 2011 08:34:43 -0600

On 05/06/2011 05:57 AM, Daniel P. Berrange wrote:
> Introduce a virProcessKill function that can be safely called
> even when the job mutex is held. This allows virDomainDestroy
> to kill any VM even if it is asleep in a monitor job. The PID
> will die and the thread asleep on the monitor will then wake
> up releasing the job mutex.

Very nice.  This will help issues where if you kill a migration at the
source, then the destination is left with a stale VM with the job lock
held waiting for the incoming migration to complete.  With this, you now
have the means to forcibly kill the leftover VM on the destination,
rather than waiting for a network timeout.

> +++ b/src/qemu/qemu_driver.c
> @@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) {
>          goto cleanup;
>      }
>  
> +    /* Although qemuProcessStop does this already, there may
> +     * be an outstanding job active. We want to make sure we
> +     * can kill the process even if a job is active. Killing
> +     * it now, means the job will be released

s/now,/now/

> +     */
> +    qemuProcessKill(vm);
> +
>      if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0)
>          goto cleanup;
>  
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index a6c0dc8..c60c51f 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2369,6 +2369,36 @@ cleanup:
>  }
>  
>  
> +void qemuProcessKill(virDomainObjPtr vm)

Do you want this to return -1 on failure to kill, such as if waitpid()
fails?  On the other hand, I guess that best effort is okay; you already
did VIR_DEBUG logging of failures, and the caller can't do much even if
they were to check for a failure.

> +{
> +    int i;
> +    int rc;
> +    VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid);
> +
> +    if (!virDomainObjIsActive(vm)) {
> +        VIR_DEBUG("VM '%s' not active", vm->def->name);
> +        return;
> +    }
> +
> +    for (i = 0 ; i < 15 ; i++) {
> +        int signum;
> +        if (i == 0)
> +            signum = SIGTERM;
> +        else if (i == 8)
> +            signum = SIGKILL;
> +        else
> +            signum = 0; /* Just check for existance */

s/existance/existence/

It took me three reads to figure out what this loop is doing; a comment
would be helpful:

/* Try SIGTERM, then wait 1.6 seconds, then try SIGKILL, all while
probing every 200ms to see if the process is gone */

> +
> +        rc = virKillProcess(vm->pid, signum);
> +        VIR_DEBUG("Iteration %d rc=%d", i, rc);
> +        if (rc < 0)
> +            break;

When errno==ESRCH, this breaks the loop because the child is dead.  But
for any other errno, should you log (at least at VIR_DEBUG) the errno
that causes failure, since that means we didn't exit the loop normally?

> +
> +        usleep(200 * 1000);
> +    }
> +}
> +

ACK with those nits addressed.

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

Attachment:
signature.asc

Description: OpenPGP digital signature
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list