Re: [PATCH] Fix race condition when destroying guests

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 18, 2013 at 02:39:11PM +0000, Daniel P. Berrange wrote:
> From: "Daniel P. Berrange" <berrange@xxxxxxxxxx>
> 
> When running virDomainDestroy, we need to make sure that no other
> background thread cleans up the domain while we're doing our work.
> This can happen if we release the domain object while in the
> middle of work, because the monitor might detect EOF in this window.
> For this reason we have a 'beingDestroyed' flag to stop the monitor
> from doing its normal cleanup. Unfortunately this flag was only
> being used to protect qemuDomainBeginJob, and not qemuProcessKill
> 
> This left open a race condition where either libvirtd could crash,
> or alternatively report bogus error messages about the domain already
> having been destroyed to the caller
> 
> Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
> ---
>  src/qemu/qemu_driver.c | 19 ++++++++++++-------
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
> index c28c223..2a6f381 100644
> --- a/src/qemu/qemu_driver.c
> +++ b/src/qemu/qemu_driver.c
> @@ -2121,24 +2121,29 @@ qemuDomainDestroyFlags(virDomainPtr dom,
>  
>      qemuDomainSetFakeReboot(driver, vm, false);
>  
> +
> +    /* We need to prevent monitor EOF callback from doing our work (and sending
> +     * misleading events) while the vm is unlocked inside BeginJob/ProcessKill API
> +     */
> +    priv->beingDestroyed = true;
> +
>      /* Although qemuProcessStop does this already, there may
>       * be an outstanding job active. We want to make sure we
>       * can kill the process even if a job is active. Killing
>       * it now means the job will be released
>       */
>      if (flags & VIR_DOMAIN_DESTROY_GRACEFUL) {
> -        if (qemuProcessKill(driver, vm, 0) < 0)
> +        if (qemuProcessKill(driver, vm, 0) < 0) {
> +            priv->beingDestroyed = false;
>              goto cleanup;
> +        }
>      } else {
> -        if (qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0)
> +        if (qemuProcessKill(driver, vm, VIR_QEMU_PROCESS_KILL_FORCE) < 0) {
> +            priv->beingDestroyed = false;
>              goto cleanup;
> +        }
>      }
>  
> -    /* We need to prevent monitor EOF callback from doing our work (and sending
> -     * misleading events) while the vm is unlocked inside BeginJob API
> -     */
> -    priv->beingDestroyed = true;
> -
>      if (qemuDomainObjBeginJobWithDriver(driver, vm, QEMU_JOB_DESTROY) < 0)
>          goto cleanup;
>  
> -- 
> 1.8.0.2

This is a really difficult bug to reproduce using libguestfs.  I've
only reproduced it a handful of times in about 3 months.

I would say, push it, based on:

(1) Rationale and patch looks sane.

(2) It fixes your minimal test case (which is a strong enough reason in
itself).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list


[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]